Reading skills are important in almost all school subjects. Especially in science subjects, the reading material most commonly consists of texts with instructional pictures (Mayer 2001), that is, pictures with a mainly informative rather than illustrative function. This text-picture reading material (TPRM) can be easier to understand than text-only material (e.g., Carney and Levin 2002; Leopold et al. 2015), but it can also be challenging for students, as they have to relate information from both sources, text and picture, to each other and integrate it into coherent knowledge structures (e.g., Ainsworth 1999; Schnotz et al. 2014).

As teachers are responsible for their students’ learning, the question is how teachers can support students in overcoming problems in reading TPRM and whether they are successful in doing so. Giving good instruction for text-picture reading is difficult for teachers because they often do not learn in teacher education programs how to support students’ understanding of TPRM (McElvany et al. 2012). A second question is whether different groups of students react differently to this support and thus need different help. For example, low-ability students in particular tend to process pictorial information in TPRM superficially and spend too little time on relevant parts of the picture (Hannus and Hyäno 1999; Schnotz et al. 2014). Further, it has been shown to be more difficult for novices, that is, persons with low prior knowledge, than for experts to relate different forms of representation to each other (e.g., Kozma and Russell 1997).

Concerning the first question, it has been shown that explicit picture-oriented instruction in TPRM or cognitive aids (Mautone and Mayer 2007) as well as practice (e.g., Philipp 2008) can improve knowledge about instructional pictures such as diagrams and foster better use of pictorial information in TPRM. The same should be true for the ability to relate text and pictorial information to each other.

Thus, aspects of instructional quantity and quality can affect students’ text-picture reading skills. Important aspects of this type of teaching are the frequency with which teachers give students opportunities to practice interpreting texts with pictures, whether they explicitly discuss the pictures instead of avoiding this, and whether they make efforts to help all students understand the TPRM. Another aspect of instructional quality is the adaptation to heterogeneous student groups that may react differently to their instructional support.

However, there is hardly any research on the effects of common teaching practice on the improvement of students’ ability to integrate text and pictures. Using teachers’ self-reports, the present study analyzes whether the frequency of using TPRM in class (an aspect of teaching quantity), the tendency to explicitly discuss the picture, and teachers’ special engagement for their students’ learning (the latter two being aspects of teaching quality) predict the development of their students’ text-picture integration competence.

Additionally, we investigate whether low-ability students, who need the help of their teachers most, profit in the same way from the teachers’ support. According to the Matthew effect (Rigney 2010), students with better preconditions should show the larger gain in competence. This study followed an ecological approach with a quasi-experimental design in which the main independent variable was not manipulated but only observed in real classrooms, in order to maximize the external validity of the research results.

Theoretical framework and state of research

Learning with TPRM

Schnotz and Bannert (2003, see also Schnotz 2002) suggest two different branches for text and picture processing. They assume two fundamentally different classes of representations: descriptive ones consisting of symbols and depictive ones being iconic signs (see Schnotz 2002). In the descriptive branch, an internal mental representation of the text surface structure is built, representing syntactic and morphologic characteristics of the text, from which a propositional text base is constructed that represents the semantic content. In the depictive branch, an internal visual perception or image of the picture is built, from which a mental model of the subject matter is constructed (Schnotz 2002). Both representations include prior knowledge and interact with each other.

To integrate text and picture information, mapping processes relate verbal and pictorial elements to each other both on the surface structure level and on the level of mental model and propositional representations.

Text-picture reading is a cultural skill that needs to be learned (see Ullrich et al. 2012). It is typically a metacognitive process. Metacognition means “thinking about thinking” or “cognition that reflects on, monitors or regulates first order cognition” (Kuhn 2000, p. 178). It was first described by Flavell (1979) and can be divided into metacognitive knowledge (e.g., about persons, tasks, and strategies) and metacognitive control or regulation, including planning (to select appropriate strategies), monitoring (to be aware of comprehension), and evaluating (the appraisal of “products and regulatory processes”; Schraw et al. 2006, p. 114). According to the described theories of learning with TPRM, students need to be aware of, control, and improve the following cognitive processes to improve their text-picture integration skills:

  • Students need to read the text and inspect the picture. As they do this, they need to organize the interleaving of reading the text and inspecting the picture. This process is reflected in eye movements from the text to the picture and vice versa. Hegarty et al. (1991) found that in a 17-clause technical text describing a complex pulley system, subjects interrupted their reading to inspect the picture about once every three or four clauses. Problems in text-picture integration may arise if students do not pay attention to relevant parts of the text and the picture or if they consider the picture too briefly to extract essential information (Hannus and Hyäno 1999).

  • Students need to construct referential links between verbal and visual representations. When texts and pictures are physically separate, learners need to “hold segments of text in working memory” as they search for the matching picture entity (Kalyuga et al. 1999, p. 352). This split in attention increases cognitive load and reduces comprehension of the TPRM (split-attention effect; Sweller 1999). Common color coding, numbers, and labels can help to reduce the cognitive load. If students do not consider these factors, they may not be able to map surface structure representations onto each other (Schnotz et al. 2014).

  • Learners need to mentally integrate the information extracted from the two media. For example, they have to create a mental model. This representation includes prior knowledge, and its development is based on activating cognitive schemata. For this reason, for example, a lack of knowledge about specific graph conventions may lead to a model that is inconsistent with the picture, which may cause the deep structure mapping of the text to the picture to fail. Furthermore, if identical information is presented in a text and a picture, it can cause cognitive load and make integration of redundant information difficult (redundancy effect; see Mayer 2005).

Thus, although instructional pictures can improve learning and understanding, comprehension can be “effortful and error prone” (Shah and Hoeffner 2002, p. 48). The split-attention effect and the redundancy effect mentioned above are examples of possible negative effects of multimedia learning on comprehension (Mayer 2005).

To conclude, students need the support of their teachers to be able to use instructional pictures effectively (Bartholomé and Bromme 2009; Coleman et al. 2011).

Teaching with TPRM

There are several ways to support students’ text-picture integration skills. Many studies have focused on how to design TPRM to improve students’ comprehension. Criteria that were found to improve students’ understanding were reviewed by Schnotz (2002) and Shah and Hoeffner (2002). Scaffolding and cueing techniques in TPRM that can direct learners’ actions (such as when to consider a specific graphic or how to integrate cues in TPRM) were also found to be useful for students’ understanding (see Hannus and Hyäno 1999; Jamet 2014; Mautone and Mayer 2007; Seufert 2003). In this study, we focus on the support of students through given learning materials such as TPRMs in school books, which are the most frequently used materials in many instructional situations. We analyzed good teaching practices concerning learning with TPRM. These practices should be measured by their effects on the development of text-picture reading skills. For this reason, the development of these skills comprises the main content of this article. As previously mentioned, teachers often have not learned in their teacher education how to foster text-picture reading, so it is unclear whether they are successful in developing their students’ text-picture reading skills.

Although very few studies have focused on strategies for text-picture integration instruction, some strategies can be derived from the theory about fostering metacognition and from empirical research concerning teaching with TPRM. Generally, quantity and quality of instruction can be differentiated (Lipowsky 2006). Quantity of instruction has been seen as important, as it entails providing students learning opportunities in the sense of time-on-task (Clausen 2002). Indirect teaching of metacognitive skills is related to this concept, as it implies the arrangement of a supportive learning environment in which students are prompted to practice metacognitive strategies (Vacca 2002). It also contains frequent opportunities to practice routines of cultural conventions for pictures (Koerber 2011) as well as the learning processes presented in the “Learning with TPRM” section.

There is not much research about instruction regarding text-picture reading, but it seems that current teaching practices rely on rather indirect methods of learning with TPRM. In US elementary school science instruction, Coleman et al. (2011) found that pointing to graphics in books was the most frequently reported instructional practice, with more than 90% of primary school teachers using this method frequently or at least sometimes.

However, providing frequent learning opportunities alone is not sufficient. Instructional quality, in the sense of direct instruction toward successful development of students’ text-picture reading skills, is needed. In order to foster metacognition, it is possible to directly teach learning strategies or cognitive knowledge (Kistner et al. 2010). This method entails implicitly prompting or modeling the use of a strategy (Dignath-van Ewijk and Van der Werf 2012; Kistner et al. 2010) or explicitly providing students with knowledge about which strategies to use and pointing out, explaining, or discussing the benefits of metacognition. Kistner et al. (2010) found, however, that a large share of strategy teaching was done implicitly, whereas explicit strategy teaching was rare.

To develop text-picture reading, direct teaching can guide students’ processes of information selection and referential link construction and foster students’ mental integration of information from texts and pictures into a coherent mental model. This may imply prompting specific strategies or behaviors to draw learners’ attention to specific parts of pictures such as labels or colors or to encourage students to “do something with the picture” (Peeck 1993). For example, students could integrate words and pictures mentally or externally by labeling or completing graphs, or they could create their own pictures or graphs. In German elementary schools, the most prominent teaching strategies were not explicit direct teaching strategies but rather implicit ones, such as selecting relevant information from text and picture (Ohle and McElvany 2016).

Because pointing to pictures and selecting relevant information were the most frequent methods of teaching with TPRM, explicit discussion of pictures in TPRM as a precondition of supporting the construction of referential links or of a mental model did not seem to be obvious and could be regarded as an aspect of teaching quality (Oerke et al. 2018).

If pictures are discussed, the questions are how intensively teachers do so and how much effort they invest to support their students’ understanding of TPRM. According to Stylianidou et al. (2002, p. 257; see also Coleman et al. 2011), teachers should “spend time and effort” talking through the meanings of images. Because caring teacher behavior is part of a supportive classroom climate, which is a general aspect of teaching quality (e.g., Klieme et al. 2009), making efforts toward student learning can be regarded as a second aspect of teaching quality. An indication of positive effects of teachers’ efforts on students’ learning with TPRM was revealed by Schroeder et al. (2010), who reported a positive relationship between adaptive explanations by the teacher as well as their belief in teaching clear strategies and students’ self-reported engagement in text-picture integration.

To our knowledge, quantity and quality of current teaching practices toward TPRM have only been analyzed in elementary schools but not in secondary schools. The effects of these instructional practices on the development of learners’ text-picture reading skills are not known. In this study, we analyzed the relationships between teachers’ practices and the development of students’ text-picture integration skills. In doing so, we focused on the general skill of relating a picture to a text and integrating the information from both sources. To measure teaching quantity, we analyzed the frequency of TPRM use in class; to measure teaching quality, we examined whether pictures were skipped or discussed explicitly and the efforts that teachers reported investing in their students’ skills to interpret and understand TPRM. We deliberately chose an ecological approach, not manipulating teachers’ behavior in an experiment but asking teachers about their real behavior in class, in order to get an impression of the impact of teachers’ behavior on students’ development of text-picture reading skills in current educational practice in classrooms.

For our first research question, we asked (Research question 1): Is the development of students’ text-picture integration skills related to instructional quantity (frequency of discussing texts with pictures) and quality (explicit discussion of pictures and efforts toward student learning) in science and language arts pedagogic practice?

We expected instructional quantity and quality at the class level to positively predict the development of students’ text-picture integration ability (Hypothesis 1a).

Several authors have called for studies of the meanings of pictures in the context of science teaching (Shah and Hoeffner 2002). Because science textbooks include many instructional pictures (Mayer 2001), we analyzed the effects of instructional practice on students’ text-picture reading development especially for science teachers. We chose biology and geography teachers, as they tend to use instructional pictures for their lessons most frequently (56 and 59% of the lessons, respectively). We compared them with German language teachers as a reference group, who used TPRM much less frequently (about 22% of the lessons; Schroeder et al. 2010; McElvany et al. 2012) and should have much less impact on their students. So, we expected the effect of instructional quantity and quality at the class level to be larger in science teaching than in German language arts teaching (Hypothesis 1b).

Individual differences: prior knowledge and cognitive ability

Knowledge and cognitive ability of learners can influence the learning gain from texts with instructional pictures. In the field of animation, the ability-as-compensator hypothesis is the idea that visual information (e.g., animation) could support learners with low spatial ability “because they are provided with an external representation of a process that helps them to build an adequate mental model” (Höffler and Leutner 2011, p. 210). Höffler and Leutner (2011) found results that were in line with this hypothesis. The hypothesis can be transferred to learners who have low prior knowledge of the content and low verbal skills (although they need to have at least basic reading skills; see Carney and Levin 2002). These students can benefit more than others from the addition of pictures to texts; on the other hand, learners with high domain knowledge do not depend as much on pictorial support when constructing mental models (Mayer 1997; Mayer and Gallini 1990).

In contrast, the ability-as-enhancer hypothesis is the idea that learners with high spatial ability can profit from visual information (e.g., animation) and that learners with low spatial ability might not profit from it (Höffler and Leutner 2011; Huk 2006). The underlying idea is that visual information increases the cognitive load more for low-spatial-ability learners than for high-spatial-ability learners.

Plass et al. (2003) also reported support for this hypothesis, finding that visual annotations for vocabulary words increased cognitive load, especially for low-verbal-ability learners, while high-verbal-ability learners performed better. This phenomenon, in which the rich get richer and the poor get poorer, has been termed the Matthew effect (Rigney 2010) and has been described as applying to the development of reading skills, among other areas (Stanovich 1986).

The focus of our second research question is on whether, in text-picture integration, low- or high-ability learners benefit more from teachers’ help. This topic has not previously been analyzed, although Schnotz et al. (2014) hinted at it, reporting that students with better learning skills were better able to adapt their picture usage to item difficulty. This supports the assumption that low-ability children need help from their teachers if they are to benefit from texts with pictures. Teachers can contribute to the learners’ text-picture integration strategies so that these learners can profit from the potential ability-as-compensator effect.

Some researchers have corroborated the ability-as-enhancer hypothesis, however. Seufert (2003) analyzed the effects of directive help in TPRM and found that learners with low prior knowledge did not profit from directive help, perhaps because that help increased their cognitive loads. Learners with medium prior knowledge, however, increased their comprehension performance when help was offered, perhaps because their cognitive loads were lower even though they still had a need for help.

Thus, support for both hypotheses has been found, and there is a research gap concerning the extent to which students with various skill levels benefit from teachers’ support. We try to reduce this gap by determining whether a Matthew effect can be found for verbal cognitive ability and prior skills in text-picture integration. Thus, we try to find out if students with higher cognitive ability and those with higher prior text-picture integration skills profit more from teachers’ efforts when compared to students with lower abilities (i.e., those who need teachers’ help the most). We focus on verbal cognitive ability, as Philipp (2008) found that spatial cognitive ability has only a small effect (or no effect) on text-picture integration skills.

From the perspective of statistics, we ask the following (as Research question 2): Does instructional quantity or quality (at the class level) interact significantly with students’ preconditions (verbal cognitive ability and prior text-picture integration skills)? We expect that, per the Matthew effect, both verbal cognitive ability and prior text-picture integration skills have significant interaction effects with instructional quantity and quality, such that more skilled students profit more from teacher instruction than less skilled students do (Hypothesis 2).

Methods

Design and samples

The data we used in this study came from the longitudinal study BiTe (Development and assessment of competence models for an integrative processing of texts and images), which was funded by the German Research Foundation (DFG). The purpose of the project was to analyze students’ text-picture integration skills, teachers’ competencies, and instructional quantity and quality with regard to instruction with TPRM. We took a random sample from the total population of all secondary schools in Rhineland-Palatinate, Germany. This produced a sample of 48 schools of the three most common secondary school tracks in the German school system: basic or lowest track (Hauptschule), more extensive or middle track (Realschule), and intensified or highest track (Gymnasium).

For each school, we randomly drew two classes, including two student cohorts of grades 5 and 6 and their teachers, and analyzed them at three measurement points in grades 5–6–7 and 6–7–8, respectively (see Ohle et al. 2017). Of this sample, we chose the students of grade 5 (first cohort) and grade 7 (second cohort), using data from two successive years to analyze the impact on students’ skill development (Footnote 1). Because it was feasible in several classes to analyze the impact of science teachers’ and German language teachers’ instruction on the development of students’ text-picture integration skills, we included one German language teacher and one science teacher for each class when possible. For science, we contacted both the biology and the geography teacher in the first step. This was not always possible, however, so that in the end, one to three teachers per class were included. To ensure better contrast between the subjects, we took into account only those German teachers who currently taught the respective class in German language, had studied German language (or no specific subject) but not biology or geography, and had taught German language arts as a main subject in the previous 3 years. This reduced the number of German teachers to 26 (see Table 1).

Table 1 Teacher samples (total and separate for taught school subject) and number of classes and students taught by them

As science teachers, we included biology and geography teachers who had either studied biology or geography or had taught the subject in the previous 3 years. If two science teachers taught biology and geography in the same class (Footnote 2), we selected one of them at random. Table 1 shows the number of teachers and respective students who completed questionnaires or tests in this study. Because it was possible in several but not in all classes to include one German teacher and one science teacher, the total number of classes and students was smaller than the sum of the German and the science sample (i.e., the total of the classes was 37, which was less than the sum of 26 and 30), but larger than half of this sum (28 classes). So, the students of the German language sample and the science sample were only partly identical. However, statistical tests showed that the two student samples did not deviate significantly from each other concerning important sample characteristics (Footnote 3). After excluding some students with missing values for verbal cognitive ability (German sample n = 3, science sample n = 4) and one extreme value with a distance of more than three standard deviations from the mean, N = 504 students remained who were taught German and N = 580 who were taught biology or geography by the respective teachers. About half of them were female (both subjects 53%), and their mean age was about 11 and a half years (German SD = 0.85, science SD = 0.84). About two thirds were grade 5 students (German 64%, science 61%), and 36% (German) and 39% (science) were grade 7 students at measurement time 1.

The sample of German language teachers consisted of five teachers for the basic school track Hauptschule (19%), 11 teachers for the middle school track Realschule (42%), and ten teachers for the highest school track Gymnasium (39%). Most of them (81%) were female, 23 of them had studied German language, and three of them had studied no specific subject. Their mean age was about 41 years (M = 41.13, SD = 10.59), and they had been teaching for about 13 years on average (M = 13.15, SD = 10.76). The sample of biology and geography teachers did not differ significantly from the German teacher sample concerning school type (χ2 = 1.53, ns) and percentage of female teachers (73%, χ2 = 0.43, ns). The mean age (M = 41.45, SD = 12.13) and the average teaching experience (M = 13.97, SD = 11.97) also did not differ significantly according to t tests (age: t(51) = −0.10, ns; teaching experience: t(54) = 0.27, ns).

Measures

Outcomes

To measure text-picture integration skills, students completed a test that was generated and validated in a preliminary study, based on the theoretical model of Schnotz et al. (2010; for more details, see, e.g., Ullrich et al. 2012; Hochpöchler et al. 2013). It was developed for secondary school students of grades 5 to 8, with higher item difficulty in higher grades. In this test, students received eight of 48 tasks containing a text and a picture (including maps, pie charts, bar charts, flow charts, schematic drawings, and others). Both texts and pictures were necessary to answer six multiple-choice items with four response options each that corresponded to Wainer’s (1992) three different levels of deep structure mapping. An exemplar task for grades 5 and 6 is shown in Fig. 1. The items were analyzed based on item-response theory with a Rasch model. The internal consistency of the text-picture integration test for the total sample of grade 5 students was high, with α = 0.92 at the first and α = 0.90 at the second time of measurement. The same is true for the grade 7 students, with α = 0.89 at the first and α = 0.88 at the second time of measurement.
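For readers unfamiliar with the Rasch model, the following minimal sketch shows its standard dichotomous form, in which the probability of a correct answer depends only on the difference between a student’s ability and an item’s difficulty. The function name and the example values are hypothetical and are not taken from the study; guessing on four-option multiple-choice items is not modeled here.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are on the same logit scale. The names are illustrative
    assumptions, not identifiers used in the study.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


if __name__ == "__main__":
    # Hypothetical values: a student whose ability equals the item difficulty
    # has an even chance of answering correctly.
    print(rasch_probability(0.0, 0.0))
    # A more able student has a higher chance on the same item.
    print(rasch_probability(1.0, 0.0))
```

Person estimates such as the WLE scores reported later are located on this same logit scale.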

Fig. 1 Exemplar task: structure of a plant cell

Predictors

Students’ verbal cognitive ability was measured at the first measurement time in order to serve as a control variable as well as to determine the interaction with teacher behavior.

Students completed the verbal thinking subtest of the cognitive ability test by Heller and Perleth (2000; see Table 2), consisting of 20 items (Footnote 4).

Table 2 Descriptive statistics for instructional quantity and quality as well as student measures separate for biology or geography and German language

To determine instructional quantity and quality, teachers completed questionnaires at the first measurement time. Because we wanted to predict the development of students’ behavior, it was necessary to measure teachers’ instruction at the starting point of measures. We assumed that most of the teachers did not change their behaviors in meaningful ways within 1 year. All instructional behavior scales consisted of four items each. In the teaching quantity items, the teachers were asked how often several things happened in their class (e.g., “We discuss a text that also contains a picture.”). The scale ranged from 1 (never) to 6 (very often). In the two instructional quality scales, the teachers were asked about their behavior when a picture is integrated into a text. In the four explicit picture discussion items, they rated on scales from 1 (I do not agree) to 4 (I strongly agree) whether they discussed the picture or avoided it (e.g., “I avoid the discussion of the picture, if possible”; reverse polarity). The same answer format was used for the items for efforts toward student learning. In these items, teachers rated whether they promoted the understanding of all students (e.g., “I feel responsible that all students understand the pictures and their relationship to the texts.”). The internal consistency of the instructional behavior scales was acceptable to good for the biology and geography sample (see Table 2) and very good for the German teacher sample (except explicit picture discussion with α = 0.63).
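The handling of reverse-polarity items on the 1-to-4 agreement scale can be sketched as follows. This is an illustration under stated assumptions, not the scoring procedure documented in the study: the function names, the choice of which item positions are reversed, the example responses, and the use of the item mean as the scale score are all hypothetical.

```python
from statistics import mean


def reverse_code(score: int, scale_min: int = 1, scale_max: int = 4) -> int:
    """Mirror a response on its scale, so that 1 becomes 4, 2 becomes 3, etc."""
    if not scale_min <= score <= scale_max:
        raise ValueError(f"score {score} outside {scale_min}..{scale_max}")
    return scale_min + scale_max - score


def scale_score(responses, reversed_items=(), scale_min=1, scale_max=4):
    """Mean of the item responses after recoding the reverse-polarity items.

    `reversed_items` holds zero-based item positions. Which items are
    reversed is an assumption made for this example.
    """
    recoded = [
        reverse_code(r, scale_min, scale_max) if i in reversed_items else r
        for i, r in enumerate(responses)
    ]
    return mean(recoded)


if __name__ == "__main__":
    # Hypothetical teacher: agrees with two "I discuss the picture" items and
    # disagrees with two "I avoid the picture" items (positions 2 and 3).
    print(scale_score([4, 3, 1, 2], reversed_items={2, 3}))
```

After recoding, a higher scale score consistently indicates more explicit discussion of pictures.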

Analysis

To answer Research question 1, we performed multilevel analyses in HLM 7 (Raudenbush et al. 2011). In these analyses, we predicted the text-picture integration competence of students at the second time of measurement by the verbal cognitive ability and text-picture integration competence 1 year before (at the first time of measurement). At the class level, we included the quantity of discussing TPRM, the explicit discussion of the picture, and teachers’ efforts toward students’ understanding (see Table 4 for science and Table 5 for German language). As further independent variables, we controlled for the school form by means of two dummy variables comparing the Hauptschule and the Realschule, respectively, with the highest school track Gymnasium. To make the results easier to interpret, especially the WLE scores for text-picture integration, we consistently standardized all student and teacher variables except school form before including them in the multilevel model. To understand how much the explained variance increased by adding the variables on level 1 and the school form to the model, we included these variables stepwise. However, only the final model is presented (model 1 in Tables 4 and 5).
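Model 1 can be written out as a two-level model. The notation below is introduced here for illustration only and does not appear in the study; in particular, it assumes a random intercept with level-1 slopes fixed across classes, which the description above does not state explicitly. TPI denotes text-picture integration skills, VCA verbal cognitive ability, and HS and RS the dummy variables for Hauptschule and Realschule (reference category: Gymnasium).

```latex
% Level 1: student i in class j (continuous variables z-standardized)
\begin{align}
\mathrm{TPI}^{(t_2)}_{ij} &= \beta_{0j}
  + \beta_{1}\,\mathrm{VCA}_{ij}
  + \beta_{2}\,\mathrm{TPI}^{(t_1)}_{ij}
  + r_{ij} \\
% Level 2: class j with its teacher's self-reported instruction
\beta_{0j} &= \gamma_{00}
  + \gamma_{01}\,\mathrm{Quantity}_{j}
  + \gamma_{02}\,\mathrm{Discussion}_{j}
  + \gamma_{03}\,\mathrm{Efforts}_{j}
  + \gamma_{04}\,\mathrm{HS}_{j}
  + \gamma_{05}\,\mathrm{RS}_{j}
  + u_{0j}
\end{align}
```

Because prior skills are controlled at level 1, the class-level coefficients describe relations with the change in skills between the two measurement times.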

To answer the second research question, concerning a possible Matthew effect, we specified three additional models including interaction effects between one teacher variable (either instructional quantity, explicit discussion of the picture, or engagement) and students’ characteristics at the first time of measurement, namely cognitive ability and text-picture integration skills (see models 2, 3, and 4). In these models, unlike in model 1, we included only one teacher variable at a time to enable a clear interpretation of the outcomes. To analyze the effect size of the variables included in the model, we calculated the difference between the variance on the second level in the intercept-only model (without explanatory variables) and in the comparison model and divided the difference by the variance in the intercept-only model (see Hox 2002, p. 64ff).
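The effect size calculation described here is a proportional reduction in level-2 variance and can be sketched in a few lines. The function name and the variance components in the example are hypothetical and chosen only to show the arithmetic; they are not values from the study.

```python
def explained_level2_variance(var_baseline: float, var_model: float) -> float:
    """Proportional reduction in class-level (level-2) variance.

    var_baseline: level-2 variance in the intercept-only model
    var_model:    level-2 variance in the comparison model
    """
    if var_baseline <= 0:
        raise ValueError("baseline level-2 variance must be positive")
    return (var_baseline - var_model) / var_baseline


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    baseline = 0.50
    with_predictors = 0.40
    share = explained_level2_variance(baseline, with_predictors)
    print(f"{share:.1%} of the class-level variance is explained")
```

The same function can be applied with a different baseline model, for example one that already controls for prior skills, when the share of remaining variance is of interest.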

Results

In Table 2, we show the unstandardized descriptive statistics of our measures. As expected, science teachers discussed texts with pictures more frequently (rather often to pretty often versus rather seldom to rather often) than German language teachers (t = −4.32, p < 0.001). Both groups tended to discuss pictures explicitly and to engage in fostering all students’ understanding. The verbal cognitive ability in both student groups was similar, with standard T values of 48.7 and 48.0, which was close to the mean of 50 (SD = 10). On average, the students’ text-picture integration skills increased by about the same amount in both groups within 1 year (0.35 WLE points in the science group and 0.37 WLE points in the German language group). This corresponded to about one third (29 and 30%) of a standard deviation at measurement time 1. The correlations between verbal cognitive ability and text-picture integration skills at time 1 (r = 0.56) and time 2 (r = 0.53) were high and positive both for the science and for the German language student group (see Table 3). The same was true for the text-picture integration skills at time 1 and time 2 (stability), with r = 0.79 (science) and r = 0.77 (German language). The correlations between the instructional behavior variables in Table 3 show that the frequency of discussing texts with pictures was independent of the teaching quality variables in the sample of science teachers, and the two quality variables were only marginally significantly correlated with r = 0.35 (p < 0.10). In the sample of German language teachers, the two quality variables were negatively correlated (r = −0.52), showing that the more they tended to discuss pictures, the less effort they made to ensure that all students understood the picture. However, the quantity of using texts with pictures in class and explicit picture discussion showed a marginal positive correlation (r = 0.33, p < 0.10).

Table 3 Summary of intercorrelations as a function of school subject (biology and geography teachers and students versus German language teachers and students)

In the following, we present the results of the multilevel models that were specified to answer the two research questions. Table 4 includes the findings for science teachers and Table 5 contains the results for German language arts teachers.

Table 4 Multilevel models: testing the effect of teachers’ instructional behavior for biology and geography teachers (standardized scores)
Table 5 Multilevel models: testing the effect of teachers’ instructional behavior for German language teachers (standardized scores)

Does teachers’ instructional behavior relate to the development of students’ text-picture integration skills?

First, we considered the results for the students and their science teachers (see Table 4). In the science student sample, 50% of the variance in text-picture integration skills at time 2 was located at the class level, meaning that it was related to variables shared by the students of one class, such as their teacher or school type. After controlling for text-picture integration skills at time 1, 80% of this class-level variance was explained; the remaining 20% was not accounted for by prior text-picture integration skills and could therefore potentially be explained by teacher behavior.
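The share of variance "located at the class level" is conventionally expressed as an intraclass correlation. The sketch below uses symbols that are assumptions for illustration (\sigma^2_{u0} for the between-class variance, \sigma^2_{e} for the within-class student-level variance); the source reports only the resulting percentage, not the formula it used.

```latex
\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \sigma^2_{e}} \approx 0.50
```

A value of about 0.50 means that roughly half of the differences in time-2 skills lie between classes rather than between students within the same class.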

The results of model 1 show a significant positive effect of quantity and explicit picture discussion on the development of text-picture integration skills on the individual level, after also controlling for verbal cognitive ability and school form. This means that students’ text-picture integration skills increased more if teachers and students talked more frequently about texts that also contain pictures in class and if the teacher explicitly discussed the picture. Altogether, the teacher instruction variables on level 2 explained 8.5% of the variance in the text-picture integration skills at time 2 that remains after controlling for text-picture integration skills 1 year before (8.5% of the change of skills).

In the German language sample (see Table 5), the outcomes were similar: About 52% of the variance in text-picture integration skills at the second measurement time was located at the class level. Of this, about 86% was explained by text-picture integration skills 1 year before, leaving 14% of the class-level variance that could potentially be explained by teacher behavior. However, after controlling for cognitive ability and school form, we did not find a significant effect of any of the teaching quantity and quality variables. This corroborates Hypothesis H1b, which assumed larger effects of teacher instruction for science than for German language teachers.

Are there interactions with verbal cognitive ability and prior text-picture integration skills giving evidence for a Matthew effect?

In the multilevel models 2 to 4, we analyzed the second research question, assuming a Matthew effect for the teaching quantity and quality, that is, a larger effect for students with higher text-picture integration skills and a higher verbal cognitive ability at measurement time 1.

In the science teacher sample, the existence of a Matthew effect was supported for the explicit discussion of the picture, as the significant interaction effect with text-picture reading skills at time 1 shows (see Table 4, model 4, TePi1 × pict. discuss.). This means that students with higher text-picture integration skills at the first measurement time increased their skills within 1 year more than students with a lower starting value, if teachers did not avoid discussing the picture. For teaching quantity or efforts toward student learning, text-picture integration skills increased independently of prior skills and prior verbal cognitive ability.
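A cross-level interaction of this kind can be sketched as a two-level model. This is an illustrative simplification, not the authors' specification: the covariates (verbal cognitive ability, school form) are omitted, the variable names TePi1, TePi2, and Discuss are shorthand chosen here, and whether the slope of prior skills had a random component is not stated in the source.

```latex
\begin{aligned}
\text{TePi2}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{TePi1}_{ij} + e_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{Discuss}_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{Discuss}_{j}
\end{aligned}
```

Here i indexes students and j indexes classes. In this sketch, a positive \gamma_{11} corresponds to a Matthew effect: the more explicitly teacher j discusses the picture, the more strongly prior skills predict skills 1 year later.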

In the sample of German language teachers, no Matthew effect appeared for any instructional behavior variable (see Table 5). Thus, the second hypothesis was supported for only one of the three teaching variables, in interaction with text-picture integration skills at time 1, and only in the science teacher sample.

Summary and discussion

In the present study, we analyzed whether teachers’ self-reported practices in text-picture integration instruction were positively related to the development of students’ text-picture integration skills in secondary schools. The results showed that, as expected in Hypothesis H1a, students’ ability to understand texts with instructional pictures improved more within 1 year if teachers used TPRM in class more frequently and if they explicitly discussed the picture instead of avoiding this. No effect, however, was found for the self-reported efforts of teachers to guarantee the learning of all students. Moreover, these effects occurred only for biology and geography instruction; no relationship between the reported instructional behaviors and skill development was found for German language instruction, thus supporting Hypothesis H1b. Furthermore, we evaluated whether all students benefited to the same degree from the teachers’ instruction, or whether students with higher verbal cognitive ability and text-picture integration skills benefited more from the teachers’ help. The results suggested that, with one exception, students did not profit differently from the analyzed instructional behavior. A small Matthew effect appeared, indicating that students with higher text-picture integration skills at the first measurement point improved their skills slightly more if teachers discussed the picture explicitly instead of avoiding this. This outcome points toward the importance of prior knowledge and skills for further learning and to the problem that those learners who need their teachers’ support most may have difficulties benefiting from it. Again, there was no effect for German language teachers, and no increase in explained variance was measurable, suggesting that in this sample there was neither a general effect of teacher instruction nor an effect for specific subgroups of students.

To our knowledge, this is the first study that did not focus on the optimal design of TPRM or on the learning progress caused by this design. Instead, we focused on what secondary school teachers can do in regular school instruction to contribute to their students’ understanding of the material and on whether their instructional behavior can indeed help to improve students’ text-picture integration skills in practice. The outcomes supported the assumption that at least science teachers in secondary school, who use texts with instructional pictures more often than language teachers, may have an impact on their students’ text-picture integration skills. That the impact was related not only to the frequency of using TPRM but also to the explicit discussion of the picture suggests that teachers matter and that they are doing something right. At the same time, this demonstrates that, when teachers use TPRM in class, even the basic quality aspect of discussing the picture instead of focusing just on the text is not self-evident and is not mastered by every teacher.

One reason that no effect was found for German language teaching may have been that these teachers used TPRM less often than science teachers, as this study showed again. Additionally, these teachers may have known less about how to support students’ text-picture reading skills, as indicated by the lower scores for explicit picture discussion and by the negative correlation between explicit discussion and efforts toward student understanding.

The outcome for the science teachers is consistent with the large impact of regular and conscious practicing on learning gains that was found in the meta-analysis of Hattie (2009). The effect size of 8.5% of variance at the class level being explained by means of analyzed instructional variables after text-picture integration skills at the first measurement time have been controlled corresponds to a medium effect size according to Cohen (1988). This is comparable to the general effect of teaching reported by Hattie (2009), being also of medium size according to Cohen (1988) with d = 0.47. Application as a measure of learning and practicing opportunities (“practice a skill or procedure,” “apply a formula,” or “transfer knowledge”, p. 146) according to Kyriakides et al. (2013) shows a small effect on learning with r = 0.18 and about 3% of explained variance.
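The link between the correlation and the explained variance reported for Kyriakides et al. (2013) follows from squaring the correlation coefficient, shown here as a short worked step:

```latex
r^2 = 0.18^2 = 0.0324 \approx 3\%
```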

The measured effect size of self-reported teacher behaviors for students’ text-picture integration skills may have been a realistic estimation of the effect of teachers’ instruction on students’ skill development. First, this seemed realistic because variables at the individual level (verbal cognitive ability and school form) explained most of the variance at the class level, leaving little unexplained variance. Second, we analyzed the impact of only one German language and one science teacher. The latter taught a subject (biology or geography) that probably had the greatest effect on students’ improvement in text-picture reading. Still, analyzing more teachers’ instructional behavior would likely increase the effect size. Third, teachers do not systematically and successfully support student abilities unless these abilities are graded, and such abilities are not methodically addressed in teacher education. In these cases, research can be done to determine how teachers can improve their support of text-picture integration.

On the other hand, it may be possible to account for the variance explained through teacher behavior by differentiating between further forms of instructional quality when pictures are discussed explicitly. However, although “doing more” is recommended, it is not always the better instructional behavior (see Leopold et al. 2015). Thus, it may not be clear to the teachers themselves what behaviors are good instructional methods. This may be one reason why teachers reported investing great efforts toward student understanding; however, if teachers do not know how to teach, only high-skilled learners can benefit.

Limitations

The quasi-experimental design of this study did not allow for clear causal conclusions, because confounding variables other than those we controlled (cognitive ability, prior text-picture integration skills, and school type) may have played a role. However, because we used longitudinal data and the teachers’ behaviors were measured before the dependent variable, it is unlikely that the development of learners’ text-picture reading skills influenced the teachers’ estimation of their instructional behaviors.

A second limitation was the small teacher sample. To corroborate our outcomes, the study should be repeated with larger samples of German language and science teachers, allowing teacher effects to be compared on shared student samples. Third, we used teachers’ self-reports, because observations can cover only a short time period, probably too short to record the frequency of using TPRM in class and the tendency to discuss these pictures explicitly (see also Clausen 2002; Praetorius et al. 2012). Furthermore, teachers’ decisions concerning their instructional process may not be visible to students, so that teachers may be the best source of information in this case. We did not manipulate teachers’ behavior in an experiment but asked them about their actually implemented behavior, because we wanted to learn about their impact on students’ development in an ecologically valid educational environment. Finally, Kunter et al. (2008) found that students’ and teachers’ estimations of instructional behavior can be comparable, which speaks for the use of teacher self-reports.

Conclusions and future prospects

The present study is a good starting point for more intensive studies of the effect of current teaching practices on the development of students’ text-picture integration skills. Further studies should, first, use larger teacher samples; second, combine teachers’ self-reports with students’ reports or observer ratings to avoid self-serving bias; and third, analyze in detail how teachers discuss instructional pictures with their students and which strategies are most promising. A good example is the work by Ohle and McElvany (2016), who used video analyses to learn more about teaching quality in elementary school, based on the three quality characteristics cognitive activation, structuring the learning process, and motivation quality. Further teaching strategies that might be considered and have been discussed earlier are, for example, the discussion of common misconceptions or graphic conventions (e.g., Leinhardt et al. 1990). Peeck (1993) differentiates between five forms of instructional interventions with increasing effects on learning from text illustrations, which may also be helpful for further research on the development of text-picture integration skills. For example, telling students to pay attention to labels or color, in general or in a specific picture, to promote surface structure mapping (see Schnotz and Bannert 2003) may be a promising strategy. According to Peeck (1993), however, having students actively engage with TPRM, or even create their own, may be a more effective instructional behavior for picture processing. According to Coleman et al. (2011), constructing activities are rather rare in primary school classes, so it would be interesting to learn whether they can not only improve the understanding of illustrations compared to pure diagram interpretation (Philipp 2008; Van Meter 2001; Van Meter and Garner 2005) but also help to improve the text-picture integration skills of students, and how much support by the teacher is needed to make them a successful learning strategy.

Further research is also required to understand why teachers avoid discussions of pictures. Possible reasons are teachers’ lack of science knowledge and understanding (Weiss et al. 2001) or bad experiences with the discussion of pictures because of mental overload, especially among low-ability students (Oerke et al. 2018). The study of students’ gain in text-picture integration skills may further profit from analyzing long-term effects over two or more years. If we know more about current teaching practices and their impact on the improvement of students’ text-picture integration skills, rules can be derived on how to improve current teacher education and further education concerning the support of understanding texts with instructional pictures.

It may be necessary to differentiate to a higher degree between different forms of instructional behaviors to explain the remaining amount of variance at the class level.

For example, do teachers just offer frequent opportunities to learn with TPRM, do they directly teach learners how to improve their text-picture reading ability, or, in the best-case scenario, do they explicitly tell learners which strategies are successful in text-picture integration and why they are successful? Do teachers just tell learners to pay attention to the picture, do they tell them to interpret it, or do they, as recommended by Peeck (1993), have students do something with the illustration that helps them to extract significant information, like labeling or comparing parts of the picture, using the picture to answer questions, etc.?

Previous analyses of teaching quality concerning text-picture integration, which observed naturalistic ways of teaching (albeit only in elementary school), show that teachers emphasize information selection and reciprocal use of text and picture as instructional strategies (Ohle and McElvany 2016). It seems expedient to use this approach in secondary school as well and to examine the effects of the observed teaching characteristics on student learning. In a second step, the causal effect of teacher behaviors on students’ text-picture reading development should be corroborated in an experimental study.