Introduction

In Germany, the predominant form of instruction in mathematics at university level usually involves a professor (in lectures) or a teaching assistant (in tutorials), who presents a correct solution path for a complex problem to a class of students (Beutelspacher and Danckwerts 2005). This very structured, direct form of instruction does not provide students with the opportunity to solve problems themselves and to engage with the learning content in a self-determined way. Such opportunities would help students to become aware of their own misconceptions and comprehension gaps. Without the awareness of comprehension gaps, students are not likely to pose questions or actively participate in class. If students do not actively participate, they may, however, fail to construct deepened knowledge of concepts and procedures which would be needed for subsequent flexible application of problem-solving heuristics (Beutelspacher and Danckwerts 2005). This, in turn, is the prerequisite for successful problem-solving in mathematics at the university level, because problems at this level tend to be complex. The unfortunate result in the university context is a high failure rate in the end-of-term examinations in mathematics.

One promising idea to improve student learning might be to trigger awareness of comprehension gaps and to promote active knowledge construction earlier in the learning process by delaying instructional support. This strategy is followed by an approach known as Productive Failure (e.g. Kapur 2008, 2009). In this approach, “failure” refers to students grappling with the learning content and generating solution attempts in a self-determined fashion. More specifically, in Productive Failure, students first collaboratively solve problems without receiving instructional support. During this phase, students have been found to generate a variety of different solution approaches that usually differ from the canonical one. In a subsequent instruction phase, the self-generated solution approaches are compared and contrasted in a class discussion before the teacher (or in a university context a teaching assistant) finally presents the canonical solution (for more details on the design principles of Productive Failure see Kapur and Bielaczyc 2012).

The potential benefits of delaying instruction have been shown in multiple studies (e.g. Roll et al. 2009; Schwartz and Bransford 1998; Schwartz and Martin 2004). It appears that problem-solving prior to instruction prepares learners for the subsequent instruction (Schwartz and Bransford 1998). Moreover, the work on impasse-driven learning (VanLehn et al. 2003) also suggests that struggling during problem-solving (which is what is likely to occur when instruction is delayed) can pave the way for learning from the subsequent explanation. Most research on delaying instruction (including the studies described above) has been conducted at the high school level and in situations where students learn completely new content. Delaying instruction in a learning situation with completely new content requires learners to use their informal prior knowledge and to generate new ideas during problem-solving. University students, however, often have to relearn complex content about which they have previously learned in order to deepen their knowledge for final examinations. In this context, prior knowledge refers to already learned, but probably not yet fully understood and memorized, content. This difference might affect the processes resulting from a delay of instructional support: During collaborative problem-solving, university students are likely to generate canonical but incomplete solutions that are based on previously learned content, instead of inventing completely new solution approaches based on their informal ideas, as is done by high school students. It remains to be tested empirically whether delaying instruction in a situation where students already have some formal knowledge also prepares students for subsequent instruction. In the current study, we test this question in a university relearning setting in the domain of mathematics.

During the unsupported problem-solving phase in Productive Failure (Kapur 2009) and similar approaches (e.g. Schwartz and Martin 2004), students work in small groups. Generally, research has shown that collaborative learning can improve learners’ understanding by fostering active elaboration and awareness of comprehension gaps (Renkl 2008a; Slavin 1996). However, findings also suggest that collaboration does not automatically lead to beneficial interactions. For instance, learners tend to superficially build consensus and contribute to the problem-solving process unequally (Salomon and Globerson 1989). As a consequence, they may not benefit from collaborative learning opportunities. The literature on collaborative learning thus calls into question whether it is really advisable to give no support during the initial collaborative problem-solving phase. Based on the literature on scripted collaboration (e.g. O’Donnell 1999), we argue that it might be a promising idea to implement a collaboration script to support students’ interaction while content-related instruction is being delayed.

Typically, collaboration scripts structure the interaction in small groups of learners by prescribing a sequence of interaction phases and assigning roles. To fulfill their role, the learning partners have to engage in specific activities; thereby, the script prompts particular cognitive, metacognitive, and social processes (King 2007). Collaboration scripts thus promote fruitful interaction (Slavin 1996) and have been demonstrated to improve student learning (e.g. O’Donnell 1999). One well-known collaboration script is the MURDER script by O’Donnell and Dansereau, which was developed to help students when collaboratively learning from difficult texts (Hythecker et al. 1988; O’Donnell 1999). The script divides the reading activity into six steps corresponding to the acronym MURDER (mood, understand, recall, detect, elaborate, review; Hythecker et al. 1988). Across these six steps, learners alternate the roles of recaller and listener. Students in the role of recaller are requested to verbalize their understanding. In this way, a cognitive process is triggered: By making their understanding explicit, learners are expected to elaborate on their knowledge, which has been found to benefit learning (O’Donnell 1999; Teasley 1995). In turn, students in the listener role focus on monitoring their learning partner’s utterances and at the same time reflect on their own understanding. Kneser and Plötzner (2001) were able to show that metacognitive processes such as monitoring and reflecting increase the benefits of collaborative learning. As students switch roles, both learning partners can benefit from the different types of learning mechanisms. Studies have demonstrated that collaborating with the MURDER script increases the learning outcome compared to unscripted collaboration (e.g. O’Donnell 1999). A similar script was developed by Berg (1993) for the domain of mathematics.
In her script, the learners alternate between the role of explainer or solver and the role of the checker. First, the explainer self-explains a worked example from the textbook, while the checker monitors the explanation and at the same time reflects on his or her own understanding. Afterwards, the checker adopts the role of solver, that is, he or she solves a task similar to the previous example. The learning partner now takes on the role of checker. Finally, the two learners answer summary questions together. Similar to the MURDER script, the distribution of roles in the Berg script is expected to promote cognitive processes, such as elaborating on one’s knowledge, and metacognitive processes, such as questioning and monitoring one’s understanding. In an experiment conducted over the course of several weeks, the implementation of this collaboration script led to better learning results than teacher-centered instruction (Berg 1993). In summary, the literature suggests that the activities and processes prompted by collaboration scripts increase the likelihood that learners will benefit from collaborative learning opportunities.

At first glance, scripting collaboration and delaying instruction (as in Productive Failure) seem to constitute two opposing approaches: collaboration scripts advocate providing structure and support for collaborative learning particularly at the beginning (e.g. King 2007; O’Donnell 1999), whereas the Productive Failure approach emphasizes the benefits of an initial unstructured collaboration phase and delayed instruction (Kapur 2008, 2009). Thus, in the light of the assistance dilemma (e.g. Kapur and Rummel 2009; Koedinger and Aleven 2007), these approaches seem to represent the ends of the continuum between providing and withholding support features. However, upon closer inspection, the two approaches are not really contradictory: Although in the studies on Productive Failure, students’ collaboration is indeed mostly unsupported, the emphasis of the approach is on a delay of content-related instruction (e.g. teaching the correct solution method; Kapur 2010); whether or not to support students’ interaction is not discussed explicitly. Indeed, even Kapur and Bielaczyc (2012) concede that in a Productive Failure setting, some support of students’ interaction seems necessary: They describe the need to motivate students to persist in their problem-solving attempts. Similarly, research on guided discovery learning (Footnote 1) has shown that in self-determined learning situations, learners need support for their learning activities and their interaction (e.g. van Joolingen et al. 2005). Taking both lines of research into account, we argue that providing a collaboration script could be a promising way to further improve learning in a delayed instruction setting. We implemented a collaboration script during the initial collaborative problem-solving phase in order to promote fruitful collaboration: cognitive processes such as elaborating and metacognitive processes such as monitoring and questioning.

To sum up, with the current study, we aimed to extend the positive effects of delaying instruction to a university context in which complex content is relearned by students in preparation for their final examinations. Moreover, we implemented a collaboration script in the initial collaborative problem-solving phase in order to promote fruitful interaction and thereby increase the potential benefits of this phase.

Learning method Think Ask Understand (TAU)

We developed a learning method called TAU (Think Ask Understand) for a university relearning setting in the domain of mathematics. TAU is characterized by a delay of instruction, but at the same time provides a collaboration script to support students’ interaction during a problem-solving phase at the beginning. In analogy to Productive Failure, each tutorial session with TAU starts with a collaborative problem-solving phase, followed by an instruction phase led by a teaching assistant. In the collaborative problem-solving phase, students solve complex mathematics problems (i.e. problems that require the combination of several concepts and problem-solving heuristics) in pairs. Unlike in existing Productive Failure studies, the problems target contents about which students have already learned during their studies. Nevertheless, as professors and teaching assistants report, even in this relearning situation, students will most likely struggle with the problems. The collaborative problem-solving is expected to help students activate prior knowledge and realize their comprehension gaps, and to promote knowledge construction as students generate solution approaches. During this phase of TAU, students do not yet receive content-related instruction. In contrast to Productive Failure, however, students are supported by a role script, which promotes fruitful interaction as described in the “Introduction”: The role script instructs students to take turns in adopting the roles of questioner and thinker. The thinker is prompted to verbalize his or her thinking process during problem-solving. The role of the thinker is therefore expected to trigger cognitive processes such as elaboration. The questioner is asked to listen carefully to the thinker’s verbalization and to pose questions if he or she does not understand the thinker’s explanation or if the thinker is applying an incorrect solution strategy. 
The role of the questioner will thereby increase the likelihood of metacognitive processes such as questioning the partner’s problem-solving attempts and monitoring one’s own understanding. As described in the “Introduction”, these cognitive and metacognitive processes are held accountable for the benefits of collaborative learning. It should be emphasized once more that the role script only supports students’ interaction, without providing content-related instruction. As in Productive Failure, the instruction is delayed until a subsequent phase. In our setting, this phase is led by a teaching assistant. The instruction phase comprises a class discussion about open questions and about difficulties which students encountered during their collaborative problem-solving. The targeted concepts and correct solution strategies are also taught in this phase. In the class discussion, students are encouraged to pose questions which they have encountered during their collaborative problem-solving and have not yet been able to answer in the discussion with their learning partner. These questions enable the teaching assistant to adapt his or her instruction so as to build on students’ prior knowledge and target their comprehension gaps.

Method

Experimental design

We conducted a study comparing learning with TAU (TAU condition) to a direct instruction condition (DI condition) in which content-related instruction was given right at the beginning, which is the common form of instruction in German mathematics education at the university level. The study took place over the course of 4 weeks, with one tutorial session/week (i.e. four tutorial sessions altogether). We thus implemented a two-factorial design with a learning condition factor (TAU vs. DI condition) and a repeated measures factor (the four tutorial sessions). Learning outcomes were assessed by post-tests after each session. We hypothesized that students in the TAU condition would outperform students in the DI condition on the post-tests, because students in the TAU condition would actively solve and grapple with problems and thus likely activate their prior knowledge and realize comprehension gaps. Due to alternating the roles of thinker and questioner during problem-solving, as instructed by the role script within TAU, students were expected to engage in particular learning-relevant cognitive and metacognitive processes such as elaborating and monitoring, as described above. We hypothesized that the activities during the initial collaborative problem-solving phase within TAU would enable students to benefit from the subsequent instruction and thus learn more than their direct instruction counterparts.

Inspired by the in vivo research paradigm advocated by the Pittsburgh Science of Learning Center (PSLC 2011), we tested our hypotheses in an in vivo learning experiment (e.g. Koedinger et al. 2009). Thus, our experiment was conducted in the field with real learners, real learning contents, and over a realistic duration of time, which promotes the external validity of the study. At the same time, we tried to achieve a maximum of experimental control in order to improve the internal validity in spite of conducting the study in vivo: Participants were randomly assigned to the two conditions and all tutorials were instructed by the same teaching assistant of the Institute of Mathematics.

Experimental conditions and procedures

The study took place as an optional tutorial for third-term mathematics students at the Institute of Mathematics of a large German university. It consisted of four tutorial sessions over the duration of 4 consecutive weeks. In each session, a different topic from the area of mathematical analysis was addressed. Each session lasted for 1 h and 45 min, including a post-test. The learning time was held constant across conditions.

Students in the TAU condition alternated between collaborative problem-solving phases and instruction phases led by the teaching assistant. The cycle was repeated twice in each session, with two different mathematical problems, both on one topic. In each cycle, students initially collaborated on a problem in pairs. During this collaboration phase, their interaction was supported by the role script, which prompted students to alternate the roles of thinker and questioner. Meanwhile, content-related instruction was delayed until the subsequent instruction phase. Participants worked in the same dyads throughout the study. A small number of dyads were regrouped due to absences of one partner. In descriptive terms (Footnote 2), the post-test performance of students in constant dyads did not differ from that of students in dyads with a new partner. Therefore, the latter dyads were also included in the analyses.

In the DI condition, at the beginning of each session, the teaching assistant presented and explained a correct solution for the same two problems that students in the TAU condition attempted to solve. The implementation of the DI condition followed Slavin’s (2006) standards of good direct instruction: The teaching assistant stated the learning objectives, encouraged the students to pose questions throughout the session, presented the concepts and problem solutions with concrete examples, and asked relevant questions to probe students’ understanding. However, problem-solving was limited to short classroom questions and to the post-tests. In contrast to the TAU condition, students in the DI condition did not solve any problems collaboratively.

In both conditions, a post-test was administered after each of the four tutorial sessions.

Participants

Seventy-six third-term mathematics students of a large German university voluntarily participated in the study. The study was advertised as a tutorial in preparation for the intermediate diploma examination for which students could sign up. It was discernible from the results of the written end-of-term examination of the preceding term, which students indicated in a questionnaire at the beginning of the first session, that students of all competence levels participated in the study. Students were randomly assigned to the two learning conditions. The resulting groups did not differ in terms of prior knowledge as measured by their scores in the written end-of-term examination of the preceding term shown in Table 1.

Table 1 Descriptive statistics of the final sample

Only the data of students who participated in all four tutorial sessions were included in the analyses. As described above, this included students in stable dyads and students with regrouped partners. Thus, the final sample consisted of 59 students (34 students in the TAU condition and 25 students in the DI condition; see Table 1). The drop-out did not appear to be systematic, as the drop-out number was similar in both conditions (8 students in the TAU condition and 9 students in the DI condition), and in descriptive terms (Footnote 3), students who attended all sessions did not differ from students who missed sessions.

Learning materials

The learning material of our study (i.e. the tutorial) comprised four topics from the area of mathematical analysis that students had judged as particularly difficult in a discussion at the end of the preceding term: line integral, implicit function theorem, inverse function theorem, and the concept of submanifolds. These topics had already been covered as part of a lecture in the preceding term and were therefore a repetition for students in preparation for their upcoming intermediate diploma examination. As mentioned earlier, problems on these topics are typically complex, that is, they require flexible application of multiple problem-solving heuristics by combining several mathematical theorems. Students have to activate and elaborate on their prior knowledge so as to generate ideas for how to combine the theorems in order to solve a given problem. At the same time, students have to monitor their solving process in order to consider limitations of the theorems and problem-solving heuristics. The learning materials and tests were developed on the basis of past lecture notes. The study materials were therefore representative of the actual content of the standard tutorials. We pilot-tested all materials with two students from the same population as in the main study. Based on these pilots, we adjusted the degree of difficulty of the materials and tests, and the required completion time.

Dependent variables

After each tutorial session, a post-test of the corresponding topic was administered. The four post-tests each comprised one problem. Like the problems solved during the learning phase, post-test problems required a combination of several mathematical theorems. Moreover, students had to take into account their limitations. For example, in the first post-test, the students had to combine the path independence theorem and gradient assumptions in order to answer the question of why a given line integral does not contradict the path independence theorem. The two theorems had been covered in the first tutorial (topic: line integral); however, during the tutorial session, the theorems had only been discussed in isolation. Therefore, in order to solve the test problem, near transfer was necessary. The tests of both conditions were anonymized prior to the scoring. The teaching assistant then scored the tests without knowing the corresponding learning condition, based on standards used for the mathematics tutorials at this university. A maximum of four points could be achieved on each test. Half points were subtracted for calculation errors.

Furthermore, one dyad of the TAU condition was randomly chosen to be audio-recorded during all collaborative problem-solving phases. The audio recordings were analyzed regarding the script fidelity of the dyad’s interaction: We investigated whether the students interacted as prescribed by the role script, that is, whether they clearly adopted the roles of thinker and questioner. The coding distinguished between interaction in accordance with the role script (i.e. structured interaction) and unstructured interaction. Structured interaction was further divided into episodes that started with an explicit distribution of the roles, and episodes with implicit, spontaneous role taking. More details on the analysis are provided in the results section. As the analysis focused on demonstrating the impact of familiarity with the script on students’ interaction, the first and the last session were selected for the analysis.

Results

If not otherwise specified, the analyses included the 59 participants with complete datasets, and effects were tested at a significance level of α = 0.05.

Individuals as units of analysis

When investigating collaborative learning results, learning outcomes can be assessed on an individual level or on a group level. As individual knowledge acquisition is emphasized in many learning settings, including university education, we conducted the post-tests in our study on an individual level. However, as the collaboration partners mutually affect each other’s actions during learning, it could be possible that the learning results of the collaboration partners are related to each other. This potential interdependence of the learning results of the collaboration partners must be considered in the statistical analysis, as one of the assumptions of the ANOVA model is the independence of the data (Kenny et al. 1998; Kenny et al. 2006). Kenny et al. (1998) established two criteria to assess the independence of dyadic data by intraclass correlation analyses (ICC) (for a study in the learning sciences using these criteria see: Mullins et al. 2011). First, a significant ICC means nonindependence. Since the test power of ICC is low, Kenny and colleagues recommended testing against a level of significance of α = 0.20. Second, they introduced the concept of consequential nonindependence: If the ICC in dyadic analysis is higher than r = 0.45, results of the ANOVA are biased; however, if the ICC is below r = 0.45, the results of the ANOVA are rather robust even though the ICC might be significant. According to this recommendation, we computed ICCs of the post-tests. Although the ICCs were significant for all tests but the second (from p = 0.065 to 0.485), we were able to compute ANOVAs on an individual basis without corrections as the ICCs of all tests were below r = 0.45 (from r = 0.009 to 0.351).
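The second criterion can be illustrated with a minimal sketch. It assumes the ANOVA-based intraclass correlation for dyads with exchangeable members, (MSB − MSW) / (MSB + MSW), which may differ from the exact estimator used in the study. The function name `dyad_icc` and the scores are hypothetical, and the significance test against α = 0.20 is omitted.

```python
def dyad_icc(pairs):
    """ANOVA-based intraclass correlation for dyads with exchangeable members."""
    n = len(pairs)
    grand_mean = sum(a + b for a, b in pairs) / (2 * n)
    # Between-dyad mean square: spread of the dyad means around the grand mean.
    ms_between = 2 * sum(((a + b) / 2 - grand_mean) ** 2 for a, b in pairs) / (n - 1)
    # Within-dyad mean square: spread of the two partners around their dyad mean.
    ms_within = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
    return (ms_between - ms_within) / (ms_between + ms_within)


# Hypothetical post-test scores (0 to 4 points) of five dyads, not data from the study.
scores = [(2.0, 2.5), (3.5, 3.0), (1.0, 2.0), (4.0, 3.5), (2.5, 1.5)]
icc = dyad_icc(scores)

# Consequential nonindependence: an ICC above 0.45 would bias individual-level ANOVAs.
CONSEQUENTIAL_THRESHOLD = 0.45
print(f"ICC = {icc:.3f}, consequential: {icc > CONSEQUENTIAL_THRESHOLD}")
```

With the ICC range reported here (r = 0.009 to 0.351), the threshold check would be negative for every post-test, which is the basis for analyzing individuals as units.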

Learning outcomes

The four topics covered by the learning material of our study were not equal in difficulty. In consequence, the degree of difficulty of the four post-tests could not be held constant and thus the means of the post-tests differed. Therefore, z scores were used to compare the performance on the four post-tests.
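As a minimal sketch of this standardization, the snippet below converts the raw scores of one post-test into z scores. It assumes that each test is standardized across the participants of both conditions and that the sample standard deviation is used; the text specifies neither. The scores are hypothetical.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Standardize raw scores to mean 0 and (sample) standard deviation 1."""
    m = mean(raw_scores)
    s = stdev(raw_scores)
    return [(x - m) / s for x in raw_scores]


# Hypothetical raw scores on one post-test (maximum of four points).
post_test = [1.0, 2.0, 3.0]
print(z_scores(post_test))  # [-1.0, 0.0, 1.0]
```

After standardization, a score expresses a student's position relative to the whole sample on that test, so sessions with harder and easier tests become comparable.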

In an ANCOVA with prior knowledge as covariate, we found a significant interaction between the learning condition factor (TAU vs. DI condition) and the repeated measures factor (the four tutorial sessions) (F[3, 168] = 3.829, p = 0.011, f = 0.261). Table 2 shows the means and standard deviations, and the tests of significance as tested in subsequent separate ANCOVAs for each session. In descriptive terms, students in the TAU condition performed lower than students in the DI condition in week 1, albeit not statistically significantly. In week 2, students in the TAU condition outperformed students in the DI condition, but the difference did not yet reach statistical significance. In weeks 3 and 4, the differences between the conditions reached statistical significance, with students in the TAU condition outperforming students in the DI condition. With effect sizes of f = 0.509 and f = 0.337, the differences can be interpreted as large and medium effects, respectively (Cohen 1988).
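The reported effect size for the interaction is consistent with the standard conversion of an F statistic into Cohen's f via partial eta squared. The sketch below reproduces it; whether the authors computed f this way is an assumption, and the function name `cohens_f` is hypothetical.

```python
import math


def cohens_f(f_stat, df_effect, df_error):
    """Convert an F statistic into Cohen's f via partial eta squared."""
    partial_eta_sq = (f_stat * df_effect) / (f_stat * df_effect + df_error)
    return math.sqrt(partial_eta_sq / (1 - partial_eta_sq))


# Interaction of learning condition and session: F[3, 168] = 3.829.
print(round(cohens_f(3.829, 3, 168), 3))  # 0.261
```

The error degrees of freedom also agree with the design: 59 participants, two conditions, and one covariate leave 56 between-subject degrees of freedom, and 3 × 56 = 168 for the interaction test.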

Table 2 Results of the post-tests

Figure 1 illustrates the differences between the two learning conditions across the four tutorial sessions. The diagram clearly shows the interaction effect with the reversal of performance between conditions from session 1 to 2.

Fig. 1 Results of the post-tests

Process analysis

As described above, we audio-recorded the collaborative problem-solving phases of one dyad of the TAU condition and analyzed their interaction during the first and the fourth session. The analysis of these process data distinguished between interaction in accordance with the role script (i.e. structured interaction) and unstructured interaction. Episodes with a clear distribution of roles lasting more than 30 s were counted as structured interaction. Shorter phases and communication without clear roles were counted as unstructured interaction. Episodes of structured interaction were further divided into explicitly structured interaction and implicitly structured interaction. Explicitly structured interaction refers to episodes in which students explicitly distributed and clarified their roles as thinker and questioner before turning to problem-solving. In contrast, implicitly structured interaction refers to episodes with a clear distribution of roles, without students explicitly negotiating or acknowledging this role distribution. The process data were independently coded by two raters. The inter-rater agreement was excellent for session 1: Rater 1 coded the interaction of the first session as 81% explicitly structured, 10% implicitly structured, and 9% unstructured. Rater 2 coded the interaction in the first session as 81% explicitly structured, 11% implicitly structured, and 9% unstructured (Footnote 4). For session 4, the inter-rater agreement was lower than for session 1 (Footnote 5), but still good: Rater 1 coded the interaction in the fourth session as 50% explicitly structured, 46% implicitly structured, and 4% unstructured. Rater 2 coded the interaction in the fourth session as 44% explicitly structured, 51% implicitly structured, and 5% unstructured. Because only two sessions were coded, the inter-rater reliability was not calculated statistically. The ratings of rater 1 are shown in Fig. 2.
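The agreement between the two raters can be made concrete with a purely descriptive sketch that computes the largest gap, in percentage points, between their codings per session. This is not a chance-corrected reliability statistic, which was not calculated in the study. The data structure and the function name `max_gap` are hypothetical; the percentages are those reported above.

```python
# Coded time shares in percent: (explicitly structured, implicitly structured, unstructured).
ratings = {
    1: {"rater_1": (81, 10, 9), "rater_2": (81, 11, 9)},
    4: {"rater_1": (50, 46, 4), "rater_2": (44, 51, 5)},
}


def max_gap(session):
    """Largest absolute difference between the two raters, in percentage points."""
    first = ratings[session]["rater_1"]
    second = ratings[session]["rater_2"]
    return max(abs(a - b) for a, b in zip(first, second))


print(max_gap(1))  # 1
print(max_gap(4))  # 6
```

The raters differ by at most one percentage point in session 1 and by at most six in session 4, which matches the verbal description of excellent versus good agreement.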

Fig. 2 Time in percent that the students of the analyzed dyad interacted in an unstructured, explicitly structured, and implicitly structured manner

When comparing the interaction of the analyzed dyad during the first session to their interaction during the fourth session, the time that students spent on activities according to the role script increased. In Fig. 2, this is evident in changes in the activity distribution from the left bar (Session 1) to the right bar (Session 4). We can observe an increase of the grey and black areas taken together (both indicating structured activity) and a corresponding decrease of the white area (indicating unstructured activity).

Furthermore, the decrease of the grey and increase of the black area indicate that in the first session, students in this dyad distributed their roles much more explicitly than in the fourth session. We have translated the interaction at the beginning of the first session into English in order to illustrate the explicit role distribution (see Table 3): At the beginning of the interaction, the students explicitly decide who adopts which role (1 and 2), but they start with a role distribution contrary to their decision (3). They then realize that they did not understand the role script correctly (4–7), read the role instruction again (8), and continue with a role distribution that conforms to their decision. At some point, student 2, who has been the thinker so far, realizes a comprehension gap and explicitly suggests alternating the roles (10).

Table 3 Interaction of a dyad in the first session (translated from German into English)

In contrast, in the fourth session, one student often began with a question (i.e. the role of questioner) or with verbalizing his or her thoughts on the problem (i.e. the role of thinker) and the partner automatically took the complementary role without negotiation about the script. In other words, spontaneous role taking increased from the first to the fourth session. To illustrate this spontaneous role taking and implicitly structured interaction, we have also translated an excerpt of the fourth session into English (see Table 4): Student 1 elaborates on his statement (6 and 8) after student 2 indicated a lack of understanding (5 and 7) (i.e. student 1 adopts the role of the thinker and student 2 adopts the role of the questioner). Later the students switch roles without explicitly stating it: Student 1 poses a question (13 and 15) and student 2 automatically takes over the role of the thinker (14, 16 and 17).

Table 4 Interaction of a dyad in the fourth session (translated from German into English)

Discussion

As described in the introduction, the current study aimed to extend the positive effects that have been found for delaying instruction to a different learning situation. More specifically, we investigated the effects of delaying instruction in a university context in the area of mathematics, and with contents that had to be relearned by students for their examination. We developed a learning method called TAU. Following the Productive Failure approach, TAU consists of two learning phases: an initial collaborative problem-solving phase, which is characterized by a delay of content-related instruction, and a subsequent instruction phase. In contrast to previous Productive Failure studies, we implemented a role script during the initial collaborative problem-solving phase to support students’ interaction. Against the background of the literature on collaboration scripts (e.g. King 2007; O’Donnell 1999), we hoped that this script would increase the chances of students benefitting from the collaborative problem-solving phase by triggering learning-relevant cognitive and metacognitive processes. In an in vivo experiment spanning four tutorial sessions in 4 consecutive weeks, we compared TAU to a DI condition, in which students were provided with instruction right at the beginning of each of the four sessions.

The significant interaction effect for the learning outcome as measured by the post-tests demonstrates that the more familiar students became with TAU over the course of the four weeks, the better their learning outcomes became relative to the DI condition. Correspondingly, separate ANCOVAs for each of the four sessions showed that students in the TAU condition significantly outperformed students in the DI condition on the post-tests administered after the third and the fourth session (and descriptively also after the second session). An interesting, although not significant, result was that students in the TAU condition scored lower than students in the DI condition after the first session. These findings could be interpreted as showing that, as students became more familiar with the role script and internalized it, this internalization freed students’ mental capacity to focus on the learning content. In other words, TAU did not improve learning until the method itself was learned. We did not implement cognitive load measures and did not measure script learning (with the exception of the process analysis of one dyad described below), and are thus unable to test this interpretation directly. Other studies, however, provide indirect evidence in support of our hypothesis, as they have also shown that the introduction of new learning strategies can initially lead to worse learning outcomes (e.g. Friedrich 1992; Friedrich and Mandl 1997). Renkl (2008b) proposed the term temporary performance losses for the phenomenon that initial performance losses have to be overcome before a new learning strategy or a new learning method is successful. In line with this argumentation, our results might also provide an explanation for why script-supported collaborative learning has not always been found to improve domain-specific learning (e.g. Kollar et al. 2007). In many studies, the collaboration script was only implemented for a rather short time period and may thus not have unfolded its potential. Studying the impact of learning methods like TAU over a longer period, therefore, seems to be a fruitful research approach for future studies.

More support for the assumption that students in the TAU condition increasingly internalized the role script comes from our analysis of the audio-recorded interaction of one randomly selected dyad. In the analysis, we compared the dyad’s interaction during the first and the fourth session to investigate the way in which students implemented the role script over time. As reported in the “Results” section, the process analysis revealed that in the first session, students usually began their problem-solving by explicitly discussing the role distribution, while they adopted the roles more implicitly in the fourth session (see Fig. 2). This increase in implicit, spontaneous role taking supports the impression we had already gained from the post-test data: Students internalized the role script over time. Moreover, the notion that an increasing internalization of the script freed capacity for the learning content is also supported by the data of the analyzed dyad, as the increase in implicit role taking coincides with an improvement of the post-test results. In the first post-test, the two students scored around the mean (one student scored above and one student scored below the mean: z = 1.003 and z = −0.063), whereas in the fourth post-test, both students scored above the mean (z = 1.044 and z = 0.610).

Another result from the process analysis was that students increasingly collaborated according to the script, as was indicated by an increase in (explicitly and implicitly) structured interaction with a clear distribution of the roles thinker and questioner. Taking this result together with the improvement of the post-test results of the dyad, we can say that, for this dyad, more script-compliant activity coincided with more learning. As described in the introduction, we assumed that the activities of the thinker (i.e. verbalizing one’s thinking process during problem-solving) would trigger cognitive processes such as elaboration, and the activities of the questioner (i.e. listening to the partner’s verbalization and posing questions) would trigger metacognitive processes such as monitoring one’s own understanding. These activities have been found to support student learning in other studies. Against this background, the co-occurrence of an increase in script-compliant activity and learning which we found for the one analyzed dyad allows us to hypothesize that our script promoted processes that are fruitful for learning, and possibly those which we had aimed to trigger. It is important to note, however, that the current analysis provides only indirect support for this hypothesis, as it focused on a different aspect of the interaction: structured versus unstructured activity. A specifically focused analysis would be needed to investigate the types of student activities triggered by our script more directly. Moreover, we only analyzed the interaction of one dyad, which clearly limits the conclusions that can be drawn from our process analysis. It would be desirable to collect and analyze the process data of a larger number of dyads.

Our findings give rise to yet another interpretation: As described above, in TAU, content-related instruction is delayed (similar to Productive Failure), but students’ interaction is supported in the initial collaborative problem-solving phase. Despite this difference from the usual Productive Failure condition, we found increased learning outcomes for students in the TAU condition as compared to the DI condition. Moreover, in our case analysis of the interaction process, we found that increased script-compliant activity coincided with improved learning outcomes. Our results therefore call into question whether all support must be delayed, as they seem to indicate that supporting students’ interaction in a delayed instruction setting could, in fact, be beneficial. Similarly, Roll et al. (2012) found evidence that providing metacognitive support within a delayed instruction setting leads to improvements in the collaborative problem-solving process and the generated solution ideas. In contrast, Kapur (2010) showed that content-related support during the initial problem-solving phase hampered the Productive Failure effect. Hence, the relevant question may not be whether or not to provide any support from the beginning, but rather when to offer which kind of support. We therefore conclude that the discussion of the assistance dilemma should distinguish between different dimensions of support as proposed in the framework by Diziol and Rummel (2010). Particularly relevant for the study presented in this paper as well as for the studies discussed above are the dimensions timing and level of support in the framework. The former dimension concerns the question of when support is given, whereas the latter dimension concerns the processes targeted by the support (e.g. social interaction processes as in our study, metacognitive processes as in Roll et al. 2012, or cognitive processes as in the study by Kapur 2010). Distinguishing more clearly between those two dimensions will make it possible to further scrutinize the mechanism of delaying instruction.

Although our study has yielded promising results, we have to acknowledge some limitations. So far, our findings only give indirect support for the hypothesis that interaction support may foster learning in a delayed instruction setting. A comparison of our TAU condition to a delayed instruction condition without interaction support would be necessary to investigate the hypothesized incremental effect of the interaction support. We aim to target this question in a future study. Moreover, we compared TAU to the standard form of instruction in an in vivo experiment. The in vivo methodology enables high external validity (Koedinger et al. 2009), but to some extent at the cost of experimental control. Although we made an effort to improve the experimental control (random assignment to the conditions, same teaching assistant across conditions, blind scoring), the two learning conditions differed in more than just one aspect and therefore our interpretations are limited: In contrast to the DI condition, in the TAU condition the instruction was delayed, students worked collaboratively, and they solved problems by themselves (prior to instruction). In our current studies, we are trying to address these confounds by running controlled experiments which compare conditions that vary only in one aspect at a time. Nevertheless, we aim to maintain high external validity by working with a realistic learning setting; that is, in our studies, the learning content targets concepts which are currently relevant for the learners, for instance because they have just covered them in class and are about to write a test on them. Thus, our participants are “naturally motivated” to engage with the learning content, not just because they are participating in a study or are receiving monetary compensation. Moreover, in our studies, the learning phase spans a substantial period of time.

To conclude, our study contributes to the accumulating evidence (e.g. Kapur 2008, 2009; Roll et al. 2009; Schwartz and Bransford 1998; Schwartz and Martin 2004) that initial collaborative problem-solving followed by delayed instruction fosters learning. The results of our study extend previous findings to a relearning situation at the university level, and give rise to the hypothesis that supporting students’ interaction during initial problem-solving could further promote learning.

While the general effectiveness of delaying instruction has been established, we argue that more controlled studies would be desirable in order to uncover the underlying mechanisms that make this approach effective for learning. The relevance of investigating delayed instruction in more detail is also broadly acknowledged in the scientific community, as evidenced, for instance, by this Special Issue. One aspect that is currently under investigation is the process of generating one’s own problem solutions prior to instruction. It has been found that the number of different solution ideas that students generate and evaluate during their initial problem-solving correlates with the learning outcome (Kapur and Bielaczyc 2012; Kapur 2012; Wiedmann et al. 2012). Furthermore, Roll et al. (2011) found that students learned more when they generated and evaluated their own problem solutions compared to evaluating solutions that were given to them. Another aspect that receives a great deal of attention is the role that prior knowledge plays in the context of delayed instruction. For instance, Wiedmann et al. (2012) showed that the number of solution ideas generated during the collaborative problem-solving phase was influenced by the prior knowledge of the group members: In groups of three, one member with high prior knowledge was sufficient to increase the number of solution ideas generated and discussed during collaborative problem-solving and, in consequence, all three group members performed better on the post-test compared to students in groups without a member with high prior knowledge. The results of our most recent study (Westermann and Rummel 2012) also show the importance of prior knowledge in the learning process: We found that a crucial mechanism of delaying instruction is the possibility for teachers to build their instruction on students’ misconceptions and ideas, which they have externalized during the initial problem-solving phase. Our findings suggest that what matters is not only the delay of instruction, but perhaps even more whether students’ misconceptions are explicitly taken up by the teacher in the instruction phase (see also Hammann 2003). The importance of adaptively tailoring support to students’ needs and prior knowledge has already been acknowledged in the research on script-supported collaborative learning (Dillenbourg and Jermann 2007; Diziol et al. 2010). Current research ambitions in this area, therefore, target the development of highly adaptive collaboration scripts (e.g. Walker et al. 2011). Based on this line of research and on the results of our most recent study (Westermann and Rummel 2012), we believe that investigating ways to adapt delayed instruction settings to students’ prior knowledge and misconceptions is a promising direction for future research.