Introduction

When clinicians enter practice, they encounter an inevitable combination of problems: ones they have studied in school, and ones they have never seen before. It follows that, alongside exposing trainees to known, routine problems, education should also prepare trainees to generate solutions to new and unexpected problems in practice (Mylopoulos et al. 2016). This capacity to learn new information, to use resources effectively and innovatively, and to invent new strategies for learning while solving problems is defined in the literature as 'Preparation for Future Learning' (PFL) and is understood to be essential for adaptive expertise (Mylopoulos and Woods 2017). In practice, experts who have not developed PFL will still be able to solve routine problems, but will likely underperform in situations of ambiguity, novelty, or complexity (Mylopoulos and Scardamalia 2008). Unfortunately, many practicing clinicians incorrectly apply known solutions to new problems (Mamede et al. 2010; Saposnik et al. 2016). This is unsurprising, since current health professions education frequently focuses on instruction that optimizes replication of previous performance or application of known solutions, rather than assessing the ability to generate new solutions or to learn in the future (Mylopoulos et al. 2016). Given this discrepancy between professional expectations and what is currently taught, it is imperative for educators to design innovative systems of instruction and assessment that develop PFL (Mylopoulos et al. 2016).

Recent research in human cognition and learning suggests that educators can design learning experiences that allow students to efficiently master a core selection of knowledge and skills while also supporting the development of PFL (Bransford and Schwartz 1999; Kapur 2014; Mylopoulos and Woods 2014). Approaches that support the development of PFL provide trainees with cycles of instruction and assessment that offer opportunities both for acquiring and applying knowledge and for using existing knowledge to learn new concepts or solve novel problems (Woods and Mylopoulos 2015). For example, studies of classroom instruction have found that, given a combination of discovery learning experiences (e.g., contrasting cases, invention activities) and subsequent instruction, students demonstrate improved learning of related yet new material when next placed in knowledge-rich environments (Bransford and Schwartz 1999; Mylopoulos et al. 2016). These types of instructional interventions have been termed 'guided discovery' (Bruner 1960). In one study, students who explored and charted data sets on memory experiments were better able to predict the results of a novel experiment, but only if they heard a lecture on the topic afterwards (Bransford and Schwartz 1999). Struggling through the problem first prepares learners to grasp the significance of the expert solution (Schwartz and Bransford 1998). Notably, the design of these interventions differs from hybrid problem-based learning, flipped classroom, or case-based learning designs, because students are placed in a knowledge-rich environment with a new problem only after they have struggled, been shown an expert solution, and learned the concept. Assessment then focuses on whether the student is able to take action to acquire the knowledge they need to solve a new problem.

Critically, instruction that encourages guided discovery may initially appear unproductive because student performance on standard assessments, which emphasize replication and application of knowledge, does not improve (Schwartz and Martin 2004). Students engaged in guided discovery may generate atypical or incorrect solutions and, not surprisingly, several studies have demonstrated that these students' performance on standard assessments is poor (Kapur and Rummel 2012; Schwartz et al. 2009). Therefore, assessing the efficacy of guided discovery instructional design strategies requires that testing go beyond measuring acquisition and application (where there may be no measurable difference) and also assess students on their ability to learn in the future (Woods and Mylopoulos 2015).

'Productive Failure' is an example of a guided discovery instructional design strategy that maximizes future learning by first engaging students in problem solving, and then teaching the central concept and procedures associated with the problem (Bransford and Schwartz 1999; Kapur 2014). Learners typically fail to generate the established solution in the initial problem-solving stage, but during the instruction phase they can (a) think about what they were doing, (b) recognize the limitations of their generated solutions, and (c) consolidate their knowledge into the established solution (Kapur 2016). Researchers hypothesize that generating solutions may activate and differentiate relevant prior knowledge during the problem-solving phase, and may help learners notice the limits of that prior knowledge (Kapur 2014; Kapur and Rummel 2012). For example, early work on productive failure showed that engaging students in complex, ill-structured problems requiring them to apply concepts in Newtonian kinematics, without providing support structures, could be productive (Kapur 2008). In a later study, students who received productive failure instruction, as opposed to direct instruction, outperformed their peers on conceptual understanding and transfer of learning around the concept of variance (Kapur 2014). Based on work to date, Kapur suggests several key design features for the benefits of productive failure to be realized: (a) the initial problem-solving task should be challenging enough to engage the learner in exploration, but not so challenging that the learner gives up; (b) the problem-solving task must admit multiple solutions, strategies, and representations, that is, afford sufficient problem and solution spaces for exploration; (c) problem solving must activate learners' prior knowledge (formal as well as intuitive) relevant to the problem; and (d) an expert should build upon the student-generated solutions by comparing and contrasting them with the correct solution (Kapur 2016).

The success of the work of Kapur and others suggests that productive failure might be a promising strategy for enhancing clinical education in the health professions. However, productive failure has largely been studied in school settings, where the problems and concepts are relatively simple. To date, productive failure has not been explored in the context of health professions education, where students are expected to draw on multiple disciplines (e.g., math, biology, physiology, chemistry) and apply these principles to complex problems.

To begin exploring the use of productive failure in health professions education, the objectives of the current study were (a) to compare the effectiveness of productive failure relative to direct instruction on the acquisition and application of a novel concept, and (b) to compare the effectiveness of productive failure relative to direct instruction on an assessment that tests students' preparation for future learning. We hypothesized that there would be no performance difference on the acquisition and application tests between participants who learned using direct instruction materials and those who learned using productive failure materials. However, we expected that participants in the productive failure condition would outperform those in the direct instruction condition on the PFL assessment.

Methods

Participants

This study took place during the autumn of 2017 at the Leslie Dan Faculty of Pharmacy at the University of Toronto (U of T), Canada. The participants were year-one students enrolled in the Doctor of Pharmacy (PharmD) program. The researchers deliberately chose the year-one PharmD population to ensure that participants would have minimal prior exposure to the new concepts while still being able to understand the basic terminology presented in the materials, since the study was designed to go beyond the learners' existing knowledge, skills, and abilities in order to test learning of a novel concept.

The researchers contacted 235 year-one pharmacy students through the class LISTSERV and in-class announcements. Forty-three (18%) of the contacted students volunteered to participate. After enrollment, the participants were randomly assigned to a learning condition: either direct instruction (22 participants) or productive failure (21 participants). Data for three participants (one from the direct instruction condition and two from the productive failure condition) were excluded from the analysis as the participants did not complete the entire experiment.

An honorarium of thirty Canadian dollars in the form of a gift card was offered to students who participated in the study. The Research Ethics Board at the U of T granted human research ethics approval (Protocol Reference #34479).

Material development

Learning and assessment materials

Both the direct instruction and productive failure learning materials were developed by the authors. When deciding what content area to teach and assess, the authors took guidance from the design principles of Kapur's productive failure studies (Kapur 2016). Estimating creatinine clearance is a challenging problem, but not so challenging that participants cannot engage with it. There are multiple solutions and approaches accepted by experts. Participants have prior knowledge in math, biology, physiology, and chemistry that could be activated to help them solve the problem, but that is insufficient for appropriate application of the concept.

The Cockcroft–Gault equation, \(CrCl = \frac{(140 - \text{age}) \times \text{weight (kg)} \times 1.23}{\text{serum creatinine } (\mu\text{mol/L})}\), has traditionally been used to estimate creatinine clearance (CrCl) when determining medication doses (Cockcroft and Gault 1976). Creatinine clearance is an important problem to study because novices (whether pharmacists, physicians, or nurses) tend to use the Cockcroft–Gault equation quite differently than experts do. In a hospital setting, pharmacists are frequently consulted regarding renal dosing of medication. Novices tend to plug and play: they collect the data for the variables in the equation (age, weight, sex, serum creatinine), enter the values into the equation, calculate a number, and then use that number to choose a dose from a dosing table or medication dosing resource. However, this output can be misleading if novices do not understand the limitations and applicability of the equation. These nuances made the problem an interesting one to study using the productive failure methodology.
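To make this plug-and-play pattern concrete, here is a minimal sketch of the calculation in Python. The patient values are hypothetical and the function name is ours; the constant 1.23 is simply the SI-unit form of the equation as given above.

```python
def crcl_cockcroft_gault(age_years, weight_kg, scr_umol_per_l):
    """Estimate creatinine clearance (mL/min) using the SI-unit form of the
    Cockcroft-Gault equation given above (constant 1.23)."""
    return (140 - age_years) * weight_kg * 1.23 / scr_umol_per_l

# Hypothetical patient: 70 years old, 80 kg, serum creatinine 100 umol/L.
# (140 - 70) * 80 * 1.23 / 100 = 68.9 mL/min
print(round(crcl_cockcroft_gault(70, 80, 100), 1))  # 68.9
```

As the paragraph above notes, producing this number is the easy part; knowing when the estimate applies, and what its limitations are, is where novices and experts diverge.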

The materials were pilot tested with four novices (health professions education students) and two experts (8+ years of clinical experience) and were revised as necessary based on pilot testing. The participants learned independently using paper-based learning and assessment materials. The assessment questions were in a multiple-choice format.

Design

Learning phase

All phases of the study occurred within a single experimental session lasting approximately 2 h. All participants first completed a 40-min learning phase in which they studied the same concepts of estimating creatinine clearance based on kidney function (Fig. 1). The materials in this phase differed according to the learning condition to which the participant had been randomly assigned at enrollment. The participants in the direct instruction condition were told about the problem of estimating creatinine clearance based on serum concentrations of creatinine, and were then given the Cockcroft–Gault equation, \(CrCl = \frac{(140 - \text{age}) \times \text{weight (kg)} \times 1.23}{\text{serum creatinine } (\mu\text{mol/L})}\) (Cockcroft and Gault 1976), as a potential solution to this problem. The participants in the productive failure condition were told about the same problem of estimating creatinine clearance but, instead of being given the Cockcroft–Gault equation, they were given raw data from the original Cockcroft and Gault study and asked to invent a formula that would best approximate creatinine clearance. The participants were supplied with graph paper to assist them in evaluating the association between some of the variables (e.g., age and creatinine concentration in the urine). The learners were instructed to make as many attempts as they could for at least 15 min (or until the learning phase was complete), and all materials were collected at the end of this phase.
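As an illustration of the kind of exploration the raw-data task invites, the sketch below fits a constant to one plausible guessed functional form. The data points are invented for this example (they are not the original Cockcroft and Gault data), and the guessed form is only one of many moves a learner might try.

```python
import numpy as np

# Invented illustrative data: each row is (age in years, weight in kg,
# serum creatinine in umol/L, measured creatinine clearance in mL/min).
data = np.array([
    [35, 72,  80, 115.0],
    [50, 85,  95,  99.0],
    [64, 70, 110,  60.0],
    [78, 60, 130,  36.0],
])
age, weight, scr, crcl = data.T

# One plausible learner guess: CrCl scales with (140 - age) * weight / SCr.
# Fit the remaining constant by least squares.
predictor = (140 - age) * weight / scr
k, *_ = np.linalg.lstsq(predictor[:, None], crcl, rcond=None)
print(f"fitted constant: {k[0]:.2f}")  # close to 1.23 for these invented data
```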

Fig. 1 Study design

Instruction and practice phase

All participants completed the same 10-min instruction and practice phase, which assessed content common to both learning conditions. The participants were given the Cockcroft–Gault formula (Cockcroft and Gault 1976) and asked to answer ten multiple-choice questions that assessed recall of content from the learning phase. The participants were also given the opportunity to practice calculating creatinine clearance using the Cockcroft–Gault formula (three multiple-choice questions). For the participants in the productive failure condition, this was the first time they had seen the Cockcroft–Gault formula.

Assessment phase

All participants completed a 40-min series of sixteen multiple-choice questions designed to assess knowledge acquisition, knowledge application, and PFL. An example of acquisition is the knowledge learners exercise in determining creatinine clearance when given the appropriate variables. An example of application is the direct use of the equation in a case-based example. The PFL assessment items contained new content written in the stem of each multiple-choice question (for example, antibiotic dosing in a patient with acute kidney injury) that students would have to learn in order to successfully answer the question (see Table 1 for sample questions). At the end of the experiment, the participants were asked whether they had heard of the Cockcroft–Gault equation before and, if yes, whether they had used it.

Table 1 Sample assessment questions

Analysis

The data collected were the scores on the recall quiz (administered in the instruction and practice phase) and the scores from the assessment phase: acquisition, application, and PFL. For each participant, the proportion of correct responses on the recall quiz and on the acquisition, application, and PFL assessments was calculated as an outcome measure (1 point for each correct response). In addition, the data from the recall quiz were submitted to an independent-samples t test comparing the direct instruction and productive failure conditions. This analysis was secondary, intended only to ensure that both groups were able to comprehend the content and had a basic understanding of the formula.

The acquisition and application assessments were analysed separately using the Mann–Whitney U test. Based on the literature, we did not expect a significant difference on either the acquisition or the application assessment. The PFL assessment was analysed using an analysis of covariance (ANCOVA) with learning condition (direct instruction vs. productive failure) as the between-subjects variable and practice test performance as the covariate. Differences between the groups were considered statistically significant if p < .05.
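For readers who want to reproduce this pipeline, the following is a minimal sketch in Python under stated assumptions: the scores are hypothetical placeholders (not the study data), the variable names are ours, and the ANCOVA is expressed as an equivalent ordinary least-squares model.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 20  # participants per condition after exclusions

# Hypothetical proportion-correct scores (placeholders, not the study data).
di_recall, pf_recall = rng.uniform(0.7, 1.0, n), rng.uniform(0.6, 1.0, n)
di_acq, pf_acq = rng.uniform(0.8, 1.0, n), rng.uniform(0.8, 1.0, n)
di_pfl, pf_pfl = rng.uniform(0.5, 0.9, n), rng.uniform(0.6, 1.0, n)

# Recall quiz (secondary analysis): independent-samples t test.
print(stats.ttest_ind(di_recall, pf_recall))

# Acquisition (and, analogously, application): Mann-Whitney U test.
print(stats.mannwhitneyu(di_acq, pf_acq))

# PFL: ANCOVA with condition as the between-subjects factor and
# recall-quiz (practice) performance as the covariate.
df = pd.DataFrame({
    "pfl": np.concatenate([di_pfl, pf_pfl]),
    "recall": np.concatenate([di_recall, pf_recall]),
    "condition": ["di"] * n + ["pf"] * n,
})
print(smf.ols("pfl ~ C(condition) + recall", data=df).fit().summary())
```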

Results

The proportion of correct responses on each assessment was recorded for each participant. Means and standard deviations of these proportions are provided in Table 2.

Table 2 Mean proportion of correct responses in the instruction and practice phase and in the assessment phase, including PFL (preparation for future learning)

Instruction and practice phase

On the recall quiz, which took place during the instruction and practice phase, the participants in the direct instruction group outperformed those in the productive failure group. The mean score on the recall quiz was .87 for learners in the direct instruction condition and .81 for learners in the productive failure condition. Although this was a secondary outcome, an independent-samples t test comparing the two learning conditions was conducted and the difference was significant, t(38) = 2.24, p = .031.

Assessment phase

Acquisition: The mean score on the acquisition assessment was 1.0 for learners in the direct instruction condition and .98 for learners in the productive failure condition (range = .67–1). Due to the non-normal distribution, a Mann–Whitney U test was conducted to compare the two learning conditions; there was no significant effect of learning condition, p = .810.

Application: The mean score on the application assessment was .91 for learners in the direct instruction condition and .98 for learners in the productive failure condition (range = .67–1). An independent-samples t test comparing the two learning conditions was not significant, t(38) = 1.9, p = .06; given the non-normal distribution, a Mann–Whitney U test was also conducted and likewise showed no significant effect of learning condition, p = .247.

Preparation for future learning: Participants in the productive failure condition outperformed those in the direct instruction condition on the PFL assessment. Participants in the direct instruction condition obtained a mean score of .67 and those in the productive failure condition a mean score of .75 (range = .3–.9). A one-way ANCOVA was conducted to compare the two learning conditions on the PFL assessment, with performance in the instruction and practice phase as a covariate (p = .137). A significant effect of learning condition was found, F(1, 38) = 6.53, p = .04. The effect size of the difference was in the moderate range (Cohen's d = 0.69).
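For reference, assuming the standard pooled-SD definition of Cohen's d for two independent groups (the condition SDs appear in Table 2), the reported effect size corresponds to:

\[
d = \frac{\bar{X}_{PF} - \bar{X}_{DI}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_{PF} - 1)\,s_{PF}^{2} + (n_{DI} - 1)\,s_{DI}^{2}}{n_{PF} + n_{DI} - 2}}.
\]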

Discussion

This study investigated the effectiveness of productive failure as an instructional approach relative to direct instruction in health professions education. The specific objectives were to compare the effectiveness of productive failure with direct instruction on knowledge acquisition, knowledge application, and PFL assessments. Participants in the productive failure condition significantly outperformed those in the direct instruction condition on the PFL assessment without compromising performance on the knowledge acquisition and knowledge application assessments.

Interestingly, participants in the direct instruction condition outperformed participants in the productive failure condition on the recall quiz during the instruction and practice phase. If the productive failure materials were simply superior, then participants in the productive failure condition would have performed the same as (or better than) those in the direct instruction condition on that recall quiz. The resulting crossover suggests that this was not the case. During the learning phase, none of the participants in the productive failure condition were able to generate a formula that accurately estimated creatinine clearance. Their superior performance on the PFL assessment may therefore be due to prior knowledge activation and differentiation during the learning phase, which may have created the opportunity to develop conceptual understanding in the instruction and practice phase. Critically, these gains in conceptual understanding occurred despite productive failure participants failing to generate an established solution during the learning phase.

Our results support Kapur's hypothesis that placing a well-designed problem-solving phase prior to instruction may help learners to better notice and attend to the critical features of the concept (DeCaro and Rittle-Johnson 2012; Kapur 2014; Schwartz et al. 2011). It has been suggested that when learners struggle to discover solutions, they begin to (a) understand the problems that led to the expert theory or model, and (b) comprehend the limitations of the established solution once it is presented to them (Bjork and Bjork 2014; Schwartz et al. 2009). The struggle and invention that occur in the problem-solving phase may be crucial if the goal is to develop a deeper conceptual understanding of the constraints and limitations of estimating creatinine clearance using the Cockcroft–Gault formula. The instruction and practice phase then consolidated learners' knowledge into the established solution and prepared them for future problem solving in a knowledge-rich environment (Schwartz and Bransford 1998).

The similar patterns of performance on the acquisition and application assessments are consistent with previous studies that used traditional assessments and found no difference in outcomes when comparing guided discovery activities with direct instruction. Structured generation activities followed by instruction (such as productive failure) can appear unproductive if standard assessments, which emphasize replication and application, are used (Schwartz and Martin 2004). Our results emphasize the importance of aligning instruction and assessment methods (Mylopoulos and Woods 2014). In this study, the PFL measure allowed us to emphasize the preparation of learners to use their knowledge for later learning, as opposed to demanding expert-like performance during learning (Woods and Mylopoulos 2015).

There are a few aspects of the study that may limit transferability to clinical learning outside of the laboratory. First, the researchers tightly controlled the learning experience, and the time on task was artificially limited. In real-world use of these strategies, direct instruction would likely take considerably less time; consequently, the participants in this study may have experienced boredom, which could have confounded post-test performance (Leppink 2017). Second, the authors developed the learning and assessment materials and tested them with a small group of novices and experts solely for use in this experimental setting. Thus, the psychometric properties of the materials are not available, and the materials could not be reliably exported into a real-world classroom.

The results of this study reveal several implications for educators. First, teaching complex topics earlier in the curriculum using methods like productive failure may support three types of performance—acquisition, application, and PFL. Second, productive failure—along with other instructional approaches, such as instruction that integrates basic science mechanisms with clinical manifestations or contrasting cases—appears to foster the acquisition of knowledge that supports new learning (Woods and Mylopoulos 2015). Given that health professionals must regularly develop new knowledge to ensure that they adapt and advance their practice, these models of teaching and assessment could be a valuable addition to any curriculum (Mylopoulos et al. 2018).

The results of this study emphasize the value of struggle during learning and support the theory that problem solving prior to instruction may be more effective than direct instruction when preparing novices to learn new knowledge in a related domain. Teaching strategies that maximize performance in the short term may not necessarily be the ones that maximize learning in the longer term (Kapur and Bielaczyc 2012; Schmidt and Bjork 1992). This study supports the idea that engaging students in solving problems that are beyond their abilities can be a productive exercise in failure.