1 Introduction and research questions

A growing body of research deals with the representational aspects of developing learners’ conceptual learning and change, pointing out that students need to develop and understand multiple representations to improve their understanding of basic scientific concepts (Plötzner and Spada 1998; Wilhelm 2005; Botzer and Reiner 2005; Sell et al. 2006; Waldrip et al. 2006; Mortimer and Buty 2009; Taber 2009; Hubber et al. 2010; Tsui and Treagust 2013).

Here, we understand multiple representations (MRs) as various perceptually and/or conceptually different forms of external and internal mental representations, which serve as cognitive tools and support understanding as well as the use of abstract concepts (Ainsworth 2008; Gilbert and Treagust 2009; Tsui and Treagust 2013).

An essential point in science learning is that a proper understanding of the relationship between scientific experiments and phenomena, on the one hand, and of their conceptual basis, on the other hand, requires the learner to use MRs at different levels of abstraction. Examples of their use are describing experimental setups, procedures and observations with oral or written language in terms of the appropriate concepts, using related schematic or logical images such as ray or vector diagrams, or expressing experimental results with functional graphs or mathematical relations containing formal representations of the concepts in question.

Throughout various fields of science education, the importance of MRs for conceptual learning has been well established, notably in chemistry (Taber 2009), in particular for the fundamental topic of micro-macro level connection (Cheng and Gilbert 2009), in biology (Tsui and Treagust 2013) and in geo-sciences (Sell et al. 2006). In physics, Hubber et al. (2010) demonstrated the potential of conceptual learning through multiple representations for the area of mechanics.

With regard to conceptual learning from and about experiments, cognitive conflict with discrepant experience is considered a standard approach for fostering conceptual understanding, both for a scientific discipline (Thagard 1991) and for individual learners (Kim et al. 2002; Lee et al. 2003; Zimrot and Ashkenazi 1991). While the theoretical foundation of conceptual change is still under discussion (Galili and Hazan 2000; Schnotz 2006; Özdemir and Clark 2007), there is a strong consensus that a crucial element for conceptual change is that learners deal actively with discrepant information. In other words, learners need to use their own cognitive and representational resources, as also emphasized by Hubber et al. (2010).

One strategy for engaging pupils to learn with MRs is the use of cognitively activating tasks (Lipowsky 2009, p. 93). In the present context, “cognitive activation” means that learners think more often, more explicitly, and more deeply about experiment-related representations, express and draw conclusions from them. One might say this is the case in a typical learning setting as well. However, the type of learning activity for achieving cognitive activation is new in the sense that specific tasks have been developed that require explicit reasoning about various experiment-related representations.

In the context of representation-related cognitive activation aiming at conceptual change, we investigated the following questions:

  1. Do tasks aiming at fostering conceptual change based on cognitive activation through MR learning (referred to as A) help pupils to develop a deeper conceptual understanding compared with tasks dealing with the same variety and number of intuitive concepts, but without specific MR-based cognitive activation (referred to as B)?

  2. In order to have a baseline value, we asked additionally: Are both of these kinds of tasks A) and B) more effective than learning with conventional tasks C)?

2 Theoretical and empirical background

In order to understand how representation-related cognitive activation might help learners to develop conceptual understanding, we have to analyze how learners’ cognitive processes are related to the information presented at school.

From a cognitive point of view, learners in physics classrooms deal mainly with two kinds of information sources: first, demonstrations or students’ experiments that allow them to observe or experience phenomena in real life and second, representations. The latter are objects or events that stand for the phenomenon in physics, or for a related model explaining the observed phenomenon.

With regard to processes of teaching and learning, we can distinguish between representations from at least three perspectives (see Ainsworth 2006). First, representations can differ in terms of their format; there are descriptive representations consisting of symbols and there are depictive representations consisting of icons (Schnotz 2005, p. 52). Depictive representations represent an object or a process either in a realistic manner, including correspondence in terms of physical properties, or in a more abstract manner, displaying a structure that is analogous to the subject matter. An example of a realistic representation is a photo of an experimental set-up using a convex lens to obtain a real image of a shining object; an example of an abstract representation is the related ray diagram that represents the construction of the image and its size. The latter representation can also be characterized as a depictive schematic logic representation. Descriptive representations can be verbal, mathematical-numeric, or mathematical-symbolic (e.g., the equation for calculating the scale of magnification or reduction of a real image in ray optics). Second, representations differ in their role within teaching and learning: For example, they can be presented externally on paper or a screen or exist internally in teachers’ or learners’ minds (Cox 1999, p. 344; Schnotz and Bannert 2003, p. 143). Third, different kinds of representation are often ranked according to their level of abstraction (Leisen 1998, p. 9); e.g., the photo of the experimental setup is less abstract than a ray diagram.

According to the theoretical framework for analyzing text and picture comprehension (Schnotz and Bannert 2003; Schnotz 2005), when a learner reads a text about new learning content, for example in physics, listens to the teacher’s explanations or looks at an image, he or she first forms a mental representation of the surface structure (see Fig. 1). After that, a verbal or a visual pictorial filter selects the information and forwards it to working memory. Here the verbal information leads to a propositional representation and later to the construction of a mental model. Visual pictorial information, on the other hand, leads directly to the construction of a mental model. Mental models involve task-based selection of information obtained from the long-term memory in order to process information in the working memory (Dutke 1994, p. 76 f.) and help learners to understand an issue by simulating processes or envisioning information internally. During learning processes, mental models and propositional representations interact continuously with each other. On the one hand, the construction of mental models can be based on propositional representations guided by cognitive concepts from long-term memory. On the other hand, new propositional information can be read off from mental models in working memory and added to the former representations in long-term memory.

Learners’ concepts in long-term memory exert an influence on the construction of mental models and surface representations as well as on visual perception (Schnotz and Bannert 2003, p. 145; Schnotz 2006, p. 75 f.), as shown in Fig. 1. Within this process, intuitive concepts play a decisive role, as they can lead to incorrect conclusions when solving a task (due to inadequate mental models) and they can lead to (new) problematic concepts in the long-term memory derived from mental models.

Fig. 1 Integrated model of text and picture comprehension (according to Schnotz and Bannert 2003, p. 145; modified by Hettmannsperger, Müller, Scheid and Schnotz)

In addition, learners’ internal representations and mental models influence the external representations created while solving tasks (Gentner and Gentner 1983, p. 118; Cox 1999, p. 354). Especially in physics, problem-solving regularly requires learners to identify or develop at least one appropriate representation of a given problem or task. To solve the problem, they then continue to operate on the representation(s) they have identified or developed. If they need several representations, this process requires learners to be systematically engaged in mutually relating different kinds of MRs at different levels of abstraction or in different formats (Ainsworth 1999, p. 142, 2006, p. 6).

While taking into consideration the mutual relations between widespread intuitive pupils’ concepts, external representations, and the construction of mental models, we can refer to the empirical results of two recent strands of research: one is learning with MRs from a cognitive psychological point of view, as outlined above, and the other is fostering conceptual understanding while dealing with MRs in science classrooms, to be outlined in the following section.

In recent years, various researchers in physics education have pointed out that dealing with MRs helps learners to improve their understanding of basic scientific concepts.

Plötzner and Spada (1998, p. 81 f.) analyzed how qualitative representations of problems can be used to solve quantitative problems in the field of Newtonian mechanics at secondary level II. Their results indicated that the number of correctly solved tasks decreased significantly when students believed in the impetus concept [footnote 1].

Cheng and Shipstone (2003, p. 193 f.) investigated a new approach for teaching electric circuit theory at secondary level II using different kinds of diagrams illustrating how current, voltage, resistance, and power are distributed. The authors explained how these representations supported students’ learning of basic electrical concepts and provided them with useful strategies for solving both quantitative and qualitative problems.

Wilhelm (2005, p. 175‒216) compared a treatment group that dealt with representations of two-dimensional movements on different levels of abstraction with a control group that intensively learned to interpret one-dimensional movements represented in graphs. Students in the treatment group outperformed students in the control group in tasks about the direction of acceleration and two-dimensional movements and understood concepts about directions more often than students in the control group.

Botzer and Reiner (2005) focused on analyzing the representational requirements for acquiring scientific concepts in the field of magnetism at secondary level I about magnetic fields. Students completed a number of predict-observe-explain sequences in teams, while using laboratory materials such as magnets, compasses and nails (Botzer and Reiner 2005, p. 156). They were asked to describe the phenomena of magnetism verbally as well as to illustrate their explanations in sketches. During the predict sequence students were asked to develop a model that described what would happen in the different experimental conditions. In the comment field, students developed an explanatory model of the observed results by drawing schematic representations. Students’ interactions were videotaped and all observations were complemented by field notes. Additionally, the authors collected students’ descriptions and their sketches. Using these different data, the authors analyzed the number of categories representing students’ understanding of magnetic phenomena, the relationship between visual representation and context as well as systematic patterns in the development of students’ visual representations during the learning activity. Their results indicated that students’ qualitative understanding can be supported by integrating different kinds of visual representations.

Mortimer and Buty (2009) analyzed learning difficulties while dealing with representations of the “infinite” in a teaching unit of ray optics. Representation of the infinite means that in an experimental setup for creating real images via a convex lens either the position of the image or the position of the object can approach the infinite (Mortimer and Buty 2009, p. 231). The authors videotaped episodes of 15 learning activities in a secondary II class, in which students created graphical representations of the course of light rays through a converging lens in teams. The authors considered the following learning mechanisms to be effective for explaining learning progress: first, an exchange between the world of theory and models and the world of concrete objects and events, second, switching between different forms of representation and, third, communicative exchanges which triggered learners to deal with different perspectives.

In a qualitative study of pupils’ representations in the domain of particle models about solids, liquids, and gases, Waldrip et al. (2010) showed that pupil-generated representations can support conceptual learning if the teacher fosters the clarity, coherence and adequacy of pupils’ concepts (Waldrip et al. 2010, p. 71 f.). Hubber et al. (2010, p. 23) confirmed the efficacy of using MRs in mechanics while teaching or learning the concept of force in a qualitative video study based on the observation of 12 lessons given by three teachers.

In conclusion, working with MRs explicitly fosters the development of conceptual understanding in different age groups and in various fields of physics. First, it is helpful to connect several representations that are either presented in different formats (Mortimer and Buty 2009; Waldrip et al. 2010) or can be assigned to different levels of abstraction (Wilhelm 2005; Mortimer and Buty 2009). Second, it is helpful to encourage pupils to generate their own representations and to discuss them in the classroom (Botzer and Reiner 2005; Mortimer and Buty 2009; Waldrip et al. 2010; Hubber et al. 2010).

The strategies mentioned above can be understood as a contribution to cognitive activation (Lipowsky 2009, p. 93; Baumert and Kunter 2011, p. 13). Various mainly quantitative, quasi-experimental studies in the domain of mathematics have also provided clear evidence for the effectiveness of these strategies (Hiebert and Wearne 1993; Stein and Lane 1996; Shayer and Adhami 2007; Baumert and Kunter 2011).

However, in the domain of physics, most of the studies outlined above are based on small sample sizes and mainly qualitative analyses. For this reason, further research is needed to clarify the effectiveness of these strategies. This article attempts to help close this gap.

In this context, the present study compares the conceptual understanding achieved with an approach based on cognitive activation through MR learning tasks (referred to as treatment group TG A) with the conceptual understanding of learners working on learning tasks about the same content, dealing with the same variety and number of intuitive concepts during a similar time-on-task, but without the specific MR-based cognitive activation measures (treatment group TG B). Furthermore, we compare both treatment groups A) and B) with a control group C) from a related second study (study II), in which pupils learn with conventional tasks without addressing conceptual obstacles. In this setting we want to explore the following research questions:

  1. Which kind of treatment A) or B) is best suited for fostering pupils’ conceptual understanding in ray optics?

  2. Are both kinds of treatment A) and B) more efficient for fostering pupils’ conceptual understanding in ray optics than learning with the conventional tasks C) of study II?

The sample, instructional materials, and analysis techniques are described in the following section.

3 Methods

3.1 Sample

Pupils in both studies were in their first year of physics lessons. This means that they were at the end of grade seven or at the beginning of grade eight depending on the school (M = 13.45 years, SD = 0.67). In both studies, the intervention was embedded in the regular curriculum and started when the topic of image formation by a convex lens was regularly taught in the school year. Pupils participated voluntarily in the data collection process, provided that their parents had given their permission beforehand. To ensure that measurements could be related to each other, all data were pseudonymized.

The participating pupils were attending either German “Gymnasium” [footnote 2] (91 %) or comprehensive school [footnote 3] (9 %). The number of girls was slightly greater than the number of boys (n_girls = 371, n_boys = 358). As the study was embedded in the regular curriculum, all school classes were assigned as a whole to the different conditions. For this reason, we have a multilevel structure in both studies: measurements are nested in pupils; pupils are nested in classes, and classes are nested in schools (see Table 1). In study I, each teacher taught at least one treatment class A) and one treatment class B). As we only took the control group of study II into account, we had eight different teachers in study II. Teachers’ professional experience varied between 2–30 years in study I and 4–25 years in study II. Higher and lower achieving classes were balanced in both studies based on the average level of former school grades per class before the intervention began. This approach was applied to ensure that pupils in both studies were equal in terms of knowledge and achievement when comparing the different groups within study I and study II. However, for study II, we only took into account the control group C) in this analysis.

Table 1 Structure and size of the sample

3.2 Design and Instructional Material

For both studies, a quasi-experimental, repeated-measures design was used. The same version of the concept test (see 3.3) was applied three times. The measuring points were directly before and after the intervention (pre-post: short term), as well as 6–8 weeks later (follow-up: medium term). The covariates were the relevant school grades in math, physics, and German taken from the most recent school report. Furthermore, we investigated whether gender or class size had an influence on the development of pupils’ conceptual understanding.

Pupils in all conditions worked on tasks about forming real images using a convex lens during a sequence of six lessons (6 × 45 min) based on physics experiments. In study I, 63 % of the total intervention time was spent on learning tasks, which were different in TG A) and TG B). During the remaining 37 % of the intervention time, pupils in both treatment groups did the same activities. During the same overall intervention time (6 × 45 min), pupils in CG C) worked on conventional tasks and carried out the same physics experiment as the pupils in study I. The teachers in all conditions implemented a detailed lesson plan, which the investigators handed out and discussed with the teachers before the intervention began.

Study I involved a treatment condition TG A) in which pupils learnt with representational tasks focusing on widespread intuitive concepts, enhanced by four different measures of cognitive activation (explained below). Pupils in TG B) worked on learning tasks about the same content and dealt with the same variety and number of intuitive concepts, but without the specific cognitive activation measures. Altogether TG A) and TG B) both addressed seven intuitive concepts, which relate to (1) problems understanding the relation between light propagation and perception as well as scattering and perception, (2) problems understanding light propagation in terms of rays as a scientific model, e.g., confusing light rays (model) and light beams (phenomenon), (3) problems understanding the emergence of virtual images using a convex lens (4) as well as problems understanding the emergence of real images using a convex lens, such as the image passing the lens as a whole, (5) if part of the lens is covered, only half of the real image appears (6) or a real image cannot be formed without a screen (7). All intuitive concepts were taken from existing research in this domain (Goldberg and McDermott 1987, p. 111 f.; Wiesner 1992, p. 16 f.; Reiner et al. 2000, p. 14).

Study II provided a control group CG C) that did not work on conceptual difficulties at all (see Scheid 2013, p. 124 f.). CG C) worked on the same content but dealt with only one representational format at a time. There were two reasons why we took CG C) as a second comparison. First, we wanted to have a baseline that would enable us to see whether both learning strategies employed in TG A) and TG B) are helpful for pupils compared with conventional tasks used in classrooms. Second, we wanted to analyze how large possible differences in learning gains might be to determine whether it is worth using these kinds of tasks in classroom practice.

Tables 2 and 3 give an overview of the different representational task formats used in study I (TG A and TG B). Table 2 shows how often different representations in different formats were used; these frequencies were largely balanced between TG A) and TG B). The small differences result from the different learning activities required (see Table 3). There were three kinds of essentially different learning activities with representations: generating own representations, dealing with provided representations, or completing representations. Table 3 shows how often these different activity formats were required of pupils. Pupils in TG A) mainly worked with self-generated representations (58 % of the time) and only 9 % of the time with representations provided by the teacher, whereas TG B) worked more with representations provided by the teacher than their own representations (44 % vs. 14 %, respectively). The quantity of representations that had to be completed (e.g., to complete a ray diagram) was 33 % in TG A) and 42 % in TG B). All instructional material was prepared by the authors in the form of worksheets and overhead transparencies, and validated in a feedback loop with the participant teachers.

Table 2 Percentage of time spent on generating own representations, dealing with provided representations, or completing representations. 100 % refers to the amount of time that TG A) and B) spent on different tasks (63 % of the total intervention time)
Table 3 Kinds of representations per condition: 100 % refers to the number of representations that play a role in the time spent on working on different tasks in TG A) and TG B) (63 % of the total intervention time)

One main difference between TG A) and TG B) was the extent to which pupils generated representations themselves, which in fact was one of the cognitive activation strategies to enhance the representational learning activities. We now turn to a more detailed description of these strategies. As explained before, seven different learning difficulties related to common intuitive concepts were addressed; however, not all of the tasks dealt with an intuitive concept at the same time. In fact, this was the case for 56 % of the tasks that differed between TG A) and TG B).

Type 1: Pupils in TG A) were explicitly asked to create their own representations and to reflect on them; Hiebert and Wearne (1993, p. 419), Stein and Lane (1996, p. 71) as well as Shayer and Adhami (2007, p. 274 f.) reported on the effectiveness of this strategy in the field of mathematics. For example, in one task, pupils in TG A) were asked to create a sketch of the experimental setup used to create real images via a convex lens, whereas pupils in TG B) worked on a given sketch of this set-up and were asked to name the objects and the related optical quantities.

Type 2: For another strategy, pupils compared their own representations with a given solution in order to reflect on their own view and to adapt their own reasoning when necessary (see Hiebert and Wearne 1993; Baumert and Kunter 2011); these steps were supported by a whole-class discussion. For instance, in order to learn how to construct a ray diagram, TG A) was asked to complete a partly given solution. In the next step, they compared their own ray diagram with a worked-out solution. In order to provide a scaffold for self-generation, pupils worked in three steps (with a different image size in each case) on a kind of representational fading-out task: First, 50 % of the ray diagram was presented and 50 % had to be completed; second, only 25 % of the ray diagram was presented and 75 % had to be completed; third, the ray diagram had to be constructed entirely. In each case, pupils went through the reflection process and compared their solution with the correctly worked-out solution, which was handed out to the pupils afterwards. Pupils in TG B) were first informed about how to construct a ray diagram before they had to construct a ray diagram themselves. Furthermore, they measured all optical quantities, which is a conventional standard task. In another task of this type, the pupils dealt with the intuitive concept that only part of the image would appear if the lens was covered; see intuitive concept (6). Whereas pupils in TG A) constructed a ray diagram themselves, TG B) worked with a given ray diagram that showed that the entire image appeared on the screen.

Type 3: This cognitive activation strategy consisted of asking pupils to connect different kinds of representations to each other and to translate one representation into another (Hiebert and Wearne 1993; Stein and Lane 1996; Baumert and Kunter 2011). For example, in another task related to the example for type 2 (construction of a ray diagram), pupils had to complete a verbal formulation of one of the underlying rules for this construction (relating focal length as well as image, object sizes and distances). This process required them to relate the underlying experiment, the ray diagram and a descriptive representation as a third representational format in TG A).

Type 4: This type of task focused on dealing flexibly with a given representational format (Hiebert and Wearne 1993; Stein and Lane 1996; Baumert and Kunter 2011) and required pupils to create an internal mental model of the situation (Gentner and Gentner 1983; Mortimer and Buty 2009). For example, TG A) was asked which observer could see an image of a candle in an experimental setup when a candle (here represented by an arrow) was placed in front of a convex lens (see Fig. 2) (a) if someone put an opaque screen in position S, (b) if the opaque screen was replaced by a transparent one, and (c) if the screen was removed (referring to observers A, B, and C, respectively, in Fig. 2) [footnote 4]. Many pupils believed that the screen was necessary to “capture” the image, which is a widespread intuitive concept (Goldberg and McDermott 1987, p. 114). They did not consider that it would still be possible to see the so-called aerial image from an observer position where the light bundle emanating from the image points would hit the eyes. Moreover, TG A) was asked to work with the ray diagram in Fig. 2 in order to explain which of the observers (A–C) could see the optical image on the screen. Projecting themselves into the observer’s perspective required pupils to develop a mental model of the experimental situation to answer the question as to what the observer could see. At the same time, TG B) was asked where the image would form and from where it would be visible. Hence, the same conceptual difficulty, see intuitive concept (7), was addressed, but pupils were not required to operate on the ray diagram and the observers (A–C) were not shown in the provided representation (see Fig. 3).

Fig. 2 Schematic representation of a real image formed by using a convex lens in TG A) (F = focal point, G = “Gegenstand” (object, here a candle))

Fig. 3 Schematic representation of a real image formed by using a convex lens in TG B) (F = focal point, G = “Gegenstand” (object, here a candle))

In another task of type 4, pupils in TG A) were asked to imagine what happens to the size and the position of a real image formed by a convex lens if the object, a candle, is moved towards or away from the lens. This task required them to create a mental model of the situation for a “mental simulation”. Pupils in TG A) were told to display their reasoning graphically. In the next step, they were asked to imagine an extreme case in which the candle was placed at a very far distance and to outline the effects on the image distance. At the same time, TG B) worked on two ray diagrams; they were asked to construct the image of a magnified candle light in a first setting (strengthening routine procedures) and of the focal point in another setting (a more challenging task). Thus, TG B) dealt with a problem-solving task as well and also worked on ray diagrams. However, there was no requirement to simulate changes and dependencies in a mental model.
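The dependencies that pupils were asked to simulate mentally can be made explicit with a short numerical sketch. It assumes the thin-lens equation 1/f = 1/g + 1/b and the magnification relation B/G = b/g, which are standard in ray optics but are not quoted in the task description; the function name, the focal length, and the object distances are illustrative choices, not values from the study.

```python
def real_image(f, g):
    """Return (b, m): image distance and magnification B/G for a thin lens.

    Assumes the thin-lens equation 1/f = 1/g + 1/b and B/G = b/g.
    A real image only exists if the object is farther away than f.
    """
    if g <= f:
        raise ValueError("no real image: object must be farther away than f")
    b = 1.0 / (1.0 / f - 1.0 / g)
    return b, b / g


f = 10.0  # hypothetical focal length in cm
for g in (15.0, 20.0, 40.0, 1000.0):  # candle moved away from the lens
    b, m = real_image(f, g)
    print(f"g = {g:7.1f} cm   b = {b:6.2f} cm   B/G = {m:.3f}")
```

Under these assumptions, the image distance b approaches the focal length f and the image shrinks as the object distance g grows, which corresponds to the extreme case of a very distant candle described in the task.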

During the identical intervention time, the pupils in CG C) were also exposed to the same representations (such as the photo of the experimental setup, ray diagrams or the related equation, etc.). Their lessons included the same experiment, but they neither worked simultaneously with more than one representation nor dealt with intuitive concepts. Instead, CG C) solved tasks related to image construction with principal rays (see Fig. 4) or calculated a missing quantity when three quantities were known (see Fig. 5) (both conventional tasks). A detailed description of all tasks in CG C) can be found in Scheid (2013, p. 96 f.).

Fig. 4 Conventional tasks in CG C) demanding the construction of a ray diagram (depictive logic, schematic representation) (f = focal length, G = object size, B = image size, g = distance between object and lens, b = distance between image and lens)

Fig. 5 Conventional tasks in CG C) requiring pupils to deal with mathematical, descriptive, symbolic, and numeric representation(s) (f = focal length, G = object size, B = image size, g = distance between object and lens, b = distance between image and lens)
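A conventional calculation task of the kind described for CG C), in which three quantities are known and one is missing, might look like the following sketch. It assumes that the underlying relation is the magnification equation B/G = b/g, using the symbols from the figure captions; the function name and the numbers are hypothetical and are not taken from the worksheets.

```python
def solve_magnification(G=None, B=None, g=None, b=None):
    """Solve B/G = b/g for the single quantity that is passed as None."""
    values = {"G": G, "B": B, "g": g, "b": b}
    missing = [name for name, value in values.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly one quantity must be left out")
    if missing[0] == "B":
        return G * b / g
    if missing[0] == "G":
        return B * g / b
    if missing[0] == "b":
        return B * g / G
    return G * b / B  # the missing quantity is g


# Hypothetical example: object size 2 cm, object distance 15 cm, image distance 30 cm
image_size = solve_magnification(G=2.0, g=15.0, b=30.0)
print(f"B = {image_size} cm")
```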

3.3 Analysis techniques

In order to assess pupils’ conceptual understanding, a concept test that aimed at capturing conceptual knowledge in this domain was developed and used in both studies. Conceptual knowledge is defined as the knowledge of core concepts of a domain as well as an understanding of the mutual relationships among these core concepts (Byrnes and Wasik 1991, p. 777). The test was designed as a multiple-choice test and included 11 items, each of which had a scientifically correct answer and three distractors as answer options (see Table 4 for example items). These distractors were based on widespread intuitive concepts of pupils reported in the literature (Goldberg and McDermott 1987, p. 111 f.; Wiesner 1992, p. 16 f.; Reiner et al. 2000, p. 14). The testing time was 15 min.

Table 4 Items used in the concept test

To draw a conclusion from the comparison, we had to fulfill the following prerequisites:

  1. We had to verify whether the concept test outlined above was a reliable and valid testing instrument for the evaluation of pupils’ learning outcomes.

  2. Pupils in all conditions had to start with the same level of prior knowledge.

In addition, when carrying out the comparison, the hierarchical structure of the data had to be taken into account: Measurement times were nested in pupils; pupils were nested in classes; classes were nested in schools. For this reason, the data were analyzed by multilevel analysis. All statistical calculations were conducted with the statistics software R, especially the package nlme (Pinheiro and Bates 2013). In the following analysis, only a 3-level model is considered, as the sample size of N = 29 on the class level is almost appropriate for multilevel modeling, but not the sample size on school level (N = 18); see Eid et al. (2011, p. 705).
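For orientation, a generic three-level random-intercept model for this kind of nesting can be written as follows. This is only an illustrative sketch: the fixed effects shown (time, condition, and their interaction) and the restriction to random intercepts are assumptions, since the exact specification fitted with nlme is not stated at this point.

```latex
% Generic three-level random-intercept model (illustrative assumption):
% t = measurement point, i = pupil, j = class
\begin{align}
  Y_{tij} &= \beta_0 + \beta_1\,\mathrm{time}_{tij} + \beta_2\,\mathrm{condition}_{j}
             + \beta_3\,\mathrm{time}_{tij}\,\mathrm{condition}_{j}
             + v_{j} + u_{ij} + e_{tij}, \\
  v_{j} &\sim N(0, \sigma_v^2), \quad
  u_{ij} \sim N(0, \sigma_u^2), \quad
  e_{tij} \sim N(0, \sigma_e^2)
\end{align}
```

Here v_j is the class-level deviation, u_ij the pupil-level deviation, and e_tij the residual at the level of the individual measurement.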

The effect size Δ described by Tymms (2004) was used to judge the relevance of significant effects. In analogy to Cohen’s d, Δ = 0.20 is considered a small effect, Δ = 0.50 a medium effect, and Δ = 0.80 a large effect (see Bortz and Döring 2005, p. 568). According to Tymms (2004, p. 56 f.), Δ can be calculated by dividing the difference between the means being compared by the pooled standard deviation.
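
As an illustration only, the calculation described above can be sketched in Python. The pooled standard deviation is computed with the common two-sample formula, which is an assumption here, and the function names, the "negligible" label, and the gain scores are hypothetical and not taken from the study.

```python
import math
from statistics import mean, stdev


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent samples (assumed formula)."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    return math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))


def effect_size_delta(group_a, group_b):
    """Difference of the group means divided by the pooled standard deviation."""
    return (mean(group_a) - mean(group_b)) / pooled_sd(group_a, group_b)


def label_effect(delta):
    """Verbal label using the thresholds quoted in the text (0.20, 0.50, 0.80)."""
    size = abs(delta)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # label below 0.20 is our own choice


# Hypothetical gain scores, invented purely for illustration.
treatment_gains = [3, 4, 2, 5, 4, 3]
control_gains = [2, 3, 1, 4, 3, 2]
delta = effect_size_delta(treatment_gains, control_gains)
print(round(delta, 2), label_effect(delta))
```

In a multilevel model, the mean difference would typically be taken from the estimated coefficient rather than from raw group means; the sketch only shows the arithmetic of the ratio.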

Missing data were handled as follows: (i) In terms of missing values on the level of individuals and measurement points, data of pupils who were present for at least one measurement were taken into account, provided that all covariates were available. For instance, if all grades were known, but the pupil did not attend the pretest, data from the post- and follow-up tests could be included in the statistical model. For this reason, the sample size varied between the pre-, post-, and follow-up tests. (ii) In terms of missing values on the level of items in the concept test, unanswered questions were scored as 0 as long as pupils had attended the respective pre-, post-, or follow-up test.
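
The two rules can be made concrete with a short Python sketch. It is a hypothetical illustration, not the code used in the study: the answer key, the field names, and the pupil record are invented, and a 4-item key stands in for the 11-item test.

```python
# Hypothetical 4-item answer key (the actual concept test had 11 items).
ANSWER_KEY = ["b", "a", "d", "c"]


def score_test(responses, answer_key=ANSWER_KEY):
    """Score one test; None means the pupil did not attend this measurement."""
    if responses is None:
        return None  # absent: no score for this measurement point
    # Rule (ii): an unanswered item (None) never matches the key, so it scores 0.
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


def usable_scores(pupil):
    """Rule (i): keep every attended test, provided all covariates are known."""
    if any(value is None for value in pupil["covariates"].values()):
        return {}
    scores = {name: score_test(resp) for name, resp in pupil["tests"].items()}
    return {name: s for name, s in scores.items() if s is not None}


# Invented record: pretest missed, two items left unanswered later on.
pupil = {
    "covariates": {"math_grade": 2, "physics_grade": 3, "german_grade": 2},
    "tests": {
        "pre": None,
        "post": ["b", None, "d", "a"],
        "follow_up": ["b", "a", None, "c"],
    },
}
print(usable_scores(pupil))
```

This pupil contributes post- and follow-up scores but no pretest score, which is why the sample size differs between measurement points.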

School grades and class size were grand-mean centered. After centering, the intercept can be interpreted as the expected value of the dependent variable for an average student in the group in which all dummy-coded categorical variables assume the value 0 (here sex = female, school type = comprehensive school, and condition = CG C). Another advantage of centering is that in multilevel models, the variance of intercepts can be interpreted (Hox 2010, p. 62).
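
Grand-mean centering simply subtracts the overall sample mean from every value. A minimal Python sketch with invented grades (the variable names are hypothetical):

```python
from statistics import mean


def grand_mean_center(values):
    """Subtract the mean over the whole sample from every value."""
    grand_mean = mean(values)
    return [v - grand_mean for v in values]


# Invented physics grades for six pupils.
physics_grades = [1, 2, 2, 3, 4, 6]
centered = grand_mean_center(physics_grades)
print(centered)
```

A centered value of 0 then corresponds to a pupil with an average grade, which is what makes the intercept interpretable as described above.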

4 Results

To ensure that the first prerequisite was fulfilled, the concept test instrument was analyzed using data from all pupils in study I and study II (N = 988). The internal consistency as an estimator for reliability was satisfactory (Cronbach’s α = 0.75). A further validation analysis was carried out (e.g., cross-validation of exploratory and confirmatory factor analysis, see Hettmannsperger 2015, p. 207 f.), leading to three clearly interpretable dimensions: “understanding image formation while a part of the lens is covered” (referring to one of the most widespread intuitive concepts), “understanding basic concepts of light propagation and scattering” as well as “interpreting light according to the ray model and understanding ray diagrams” (see Table 5).
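
For readers unfamiliar with internal consistency, the following Python sketch shows how Cronbach's α is commonly computed from item scores. It assumes the standard formula; the function name and the 0/1 score matrix are hypothetical and unrelated to the study data.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per pupil, one column per item."""
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented scores of five pupils on three items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

The same type of variance (population or sample) must be used for the items and for the total score; the ratio, and hence α, is identical in both cases.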

Table 5 Subscales derived from the cross-validation of exploratory and confirmatory factor analyses

With regard to the second prerequisite, results in Fig. 6 show that pupils in the three conditions did not enter into the quasi-experimental activities with different levels of knowledge; we did not find any significant differences in the pretest.

Figure 6 as well as Figs. 8 and 9 should be interpreted as follows. Each arrow describes the difference between the group at the starting point (tail) of the arrow and the group at the head of the arrow. Positive coefficients show that the group at the arrow head outperformed the group at the starting point of the arrow; this means that the difference from the starting point to the end point (arrow head) has to be added. Negative coefficients indicate the opposite effect: The group at the arrow head improved less than the group at the starting point of the arrow.

Fig. 6 Comparison between all three conditions TG A), TG B), and CG C) in the pretest, TG (treatment group), CG (control group)

In the first step of the actual analysis, the descriptive results (see Table 6) indicated that the pupils in both studies might have improved their conceptual understanding (main effect of time).

Table 6 Descriptive statistics of the concept test results

Descriptive results shown as box plots (see Fig. 7) indicate that pupils in both treatment groups in study I might have improved more than those in study II. Moreover, the descriptive results and the box plots suggest that this effect was stable with regard to the follow-up test.

Fig. 7 Box plots comparing the results of the concept test between all three conditions TG A), TG B), and CG C), TG (treatment group), CG (control group). (*outliers)

In the second step, a multilevel analysis was performed to find out whether the trends in step 1 were significant. The values of the intra-class correlation (ICC = 0.21 at level 3 and ICC = 0.53 at level 2) indicated that a multilevel approach was indeed appropriate for analyzing the data (see Eid et al. 2011, p. 705).
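
To illustrate what these values express, the Python sketch below computes intra-class correlations from the variance components of a 3-level model. It assumes one common definition (the share of the total variance located at each level); the text does not state which definition was used, and the variance components are invented.

```python
def intraclass_correlations(var_time, var_pupil, var_class):
    """Share of total variance at level 2 (pupils) and level 3 (classes).

    Assumed definition: variance at the level divided by the total variance.
    Other definitions of the level-2 ICC exist.
    """
    total = var_time + var_pupil + var_class
    return {"level_2": var_pupil / total, "level_3": var_class / total}


# Invented variance components (level 1 = measurement times within pupils).
icc = intraclass_correlations(var_time=2.0, var_pupil=1.5, var_class=0.5)
print(icc)
```

The larger these shares, the more the observations within a pupil or a class resemble each other, and the less appropriate an analysis that treats all observations as independent.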

The results in Table 7 show that grades in mathematics and physics were significant covariates. German grades, class size, and gender were not found to have any significant influence on conceptual understanding. Furthermore, there was a significant main effect of time: pupils improved their understanding overall in both the pre-post and the pre-follow-up comparison (see Table 7).

Table 7 Results of the Multilevel Analysis, β (standardized coefficient), SE (Standard Error)

To find out which of the three conditions was best, we had to analyze the interaction between time and condition in step 3.

With regard to our first research question, which involved determining which kind of treatment was the most effective (addressing pupils’ intuitive concepts via representational cognitive activation in TG A) versus addressing them with learning tasks without this instructional measure in TG B)), we did not find any significant differences. Pupils in TG A) gained neither more nor less knowledge than pupils in TG B), either pre-post (β = −0.10, SE = 0.09, F(1,249) = 1.42, p = 0.233) or pre-follow-up (β = −0.07, SE = 0.08, F(1,129) = 0.71, p = 0.400) (see Fig. 8 and Fig. 9).

With regard to the second research question, the assumption that learning in TG A) and TG B) was more effective than learning with conventional tasks in CG C), we found significant effects in the expected direction. Pupils in TG A) improved more than CG C) pre-post (β = 0.57, SE = 0.10, F(1,1249) = 31.67, p < 0.001, Δ = 0.68) as well as pre-follow-up (β = 0.53, SE = 0.10, F(1,1249) = 29.61, p < 0.001, Δ = 0.63). The same was true for pupils in TG B) pre-post (β = 0.67, SE = 0.10, F(1,1249) = 43.90, p < 0.001, Δ = 0.80) as well as pre-follow-up (β = 0.60, SE = 0.10, F(1,1249) = 36.86, p < 0.001, Δ = 0.72). The values of Δ even suggest that the lessons in TG B) were more effective than in TG A). However, this effect was not significant (see Fig. 8 and Fig. 9).

Fig. 8 Pre-post comparison between all three conditions TG A), TG B), and CG C), TG (treatment group), CG (control group)

Fig. 9 Pre-follow-up comparison between all three conditions TG A), TG B), and CG C), TG (treatment group), CG (control group)

In summary, the results in Fig. 8 indicate that with regard to the pre-post learning increase, the pupils in study I (TG A) and TG B)) improved significantly more than pupils learning with conventional tasks (CG C)). The same was true for the pre-follow-up comparison (Fig. 9). In each case, we found a statistically highly significant increase of around half a standard deviation and a medium effect size Δ. However, we did not find any significant difference between treatments A) and B) in study I. The box plots in Fig. 7 and the values of the coefficients β (see Fig. 8 and Fig. 9) show a slight trend for the pupils in TG B) to improve more pre-post and pre-follow-up than the pupils in TG A). As mentioned above, this effect was not significant.

5 Discussion

The comparison between TG A) and B) (research question 1) revealed no significant difference: addressing conceptual difficulties via cognitive activation through learning with MRs was about as effective as addressing them with learning tasks without this instructional measure. On the one hand, this is good news: multi-representational reasoning requires an additional cognitive activity and thus creates additional cognitive load, which can be potentially harmful for learning, yet no such detrimental effect was observed. On the other hand, in view of the general potential of representational reasoning for science learning, one might have expected TG A) to attain a better conceptual understanding than TG B). There are at least two possible explanations for this finding.

First, the difference between both kinds of treatment A) and B) might have been too “small” in terms of cognitive activation, as addressing pupils’ intuitive concepts can be seen as a kind of cognitive activation itself (Lipowsky 2009, p. 94; Baumert and Kunter 2011, p. 13).

Second, some of the tasks for TG B) required pupils to develop a mental model of the situation in order to answer the question. Developing a correct mental model of the situation implicitly also requires pupils to develop conceptual understanding and to deal with representations to a certain extent. The difference between TG A) and TG B) is therefore that representational reasoning on intuitive concepts is explicitly required in the former case, while it might be implicitly needed in the latter. Again, this would make the difference between the treatments too small.

The comparison of both treatments A) and B) with the CG C) (research question 2) indicated that pupils learning with tasks addressing widespread intuitive concepts gained a better conceptual understanding than pupils in the control group. These differences were in each case highly significant and of practical relevance (medium effect size). Moreover, they remained temporally stable beyond a short-term effect (6–8 weeks).

In summary, the question regarding the possible difference between the two treatment forms cannot be answered in the present study. However, we may conclude that addressing intuitive concepts with (multi-)representational reasoning performed well as an instructional measure for fostering conceptual learning in a regular physics classroom. First, the comparison to addressing the same intuitive concepts without explicit representational reasoning showed that the extra requirement did not constitute a harmful cognitive load, and second, the pre-post (and pre-follow-up) comparison showed reasonable practical relevance (medium effect size). Even though the finding that an explicit strategy improved conceptual understanding might not be very surprising, we have to keep in mind that overcoming widespread intuitive concepts is notoriously difficult and that classical strategies such as inducing cognitive conflict by showing demonstration experiments do not automatically lead to success (Limón 2001; Vosniadou 2013). Compared to the range of results about conceptual change strategies known from meta-analysis (d ≤ 0.6; Hattie 2009), the result of the present approach turned out to be at the upper end. Thus, if an educational objective such as conceptual change is rather hard to achieve, even moderate steps towards improvement are welcome. As reasoning with MRs also seems essential for a series of other objectives in science education, it is promising that positive effects on conceptual understanding are among the benefits of this instructional approach and can possibly be combined with this kind of cognitive activation measure.

Future investigations could try to combine conceptual learning with other objectives of representational learning in science such as developing a coherent understanding of an experiment and the various representational formats related to it. Moreover, it would be interesting to compare the use of self-generated representations and representations provided by teachers in more detail. In view of the multiple functions of MRs, such attempts might be of interest for classroom practice and might help to clarify which kind of cognitive processes can be enhanced by which kind of representational learning activity.