Introduction

The fundamental perspectives of theoretical motor control and learning research have been successful in generating evidence that informs the performance and learning of health professional clinical skills that require precise manual technique (i.e., psychomotor skills). This view offers a robust approach to addressing questions concerned with the objective assessment of technical skills (Li et al. 2013; Rojas et al. 2011), the impact of sensorimotor constraints on surgical performance and learning (Grierson et al. 2013), and the features of a simulation that facilitate skill acquisition and transfer to real clinical situations (Grierson 2014; Norman et al. 2012). Importantly, this research often points to the possibility of enhancing simulation-based skill learning through observational practice. While observation is certainly key to traditional apprenticeship approaches to medical education, technological advancements have created opportunities for students to review and engage actively with video recordings of skills being performed. This is a new and potentially powerful method of skill development in medical education, and research efforts have therefore been initiated that are specifically concerned with determining the optimal organization of observational demonstrations for precision skill development (Domuracki et al. 2015; Grierson et al. 2012).

At the foundation of our current understanding of psychomotor skill performance is the idea of central representations of action—cortical and sub-cortical structures that contain the sensory, motor, and cognitive processing specifications for performing skilled movements (Elliott et al. 2010, 2011). For the most part these representations are developed through physical practice in the relevant context. However, a substantial amount of neurophysiological (Pellegrino et al. 1992; Kohler et al. 2002) and behavioural (Ashford et al. 2006; Hayes et al. 2010, 2013; Larssen et al. 2012; Mattar and Gribble 2005; Ste-Marie et al. 2012) research has yielded evidence that the development of action representations is also facilitated through observation. The idea is that observation involves an implicit, or covert, sub-threshold engagement of the motor system, which enables the observer to encode the key spatial and temporal features of a skill into her/his own representations of action (Jeannerod 2001). Given that the positive impact of observational practice on skill learning is now well-documented, this method must be seen as a viable way to extend technical skill learning that occurs in simulated and clinical education spaces.

While many similarities between physical practice and observational practice can be drawn (for example, both benefit from the provision of less frequent feedback; Badets and Blandin 2004), there are some very real and apparent differences between the practice approaches. One very clear difference is that observation requires a model demonstration. It is perhaps not surprising that the characteristics of this model can have a significant impact on the degree and nature of the skill learning that occurs (Weir and Leavitt 1990). For instance, research has shown that both experts (Heyes and Foster 2002) and novices (Buchanan and Dean 2010) can be effective models that support observation-based learning. Interestingly, however, recent work demonstrates that a combination of the two types of models is optimal for learning (Andrieux and Proteau 2013; Rohbanfard and Proteau 2011). In the work of Proteau and colleagues, the influence that the level of proficiency displayed by the observed model has on the observer’s learning was investigated in a segmented relative-timing task, and it was discovered that observing both expert and novice performers led to the greatest learning. Their rationale was that efficient action representations include information about the correct way to perform movements but also strategic elements that are designed to protect the actions from the temporal and accuracy consequences associated with the most common errors. At the heart of this assumption is the understanding that our internal neuromuscular systems are inherently noisy such that our actions are characterized by unavoidable variability (Schmidt et al. 1979; Meyer et al. 1988; Elliott et al. 2010). As such, motor expertise must involve the development of strategies that protect movements against this variability.
With respect to the proficiency of the observed model, the idea is that exposure to both novices and experts provides the observer with both aspects—information about the appropriate movements (i.e., the expert) as well as information about the consequences associated with inappropriate movements (i.e., the novice). In essence, what to do and what not to do.

While we have also shown that mixed-model observation promotes clinical skill learning (Domuracki et al. 2015), the idea brings us to a new and particularly interesting question concerning the optimal organization of the two model types. The contextual interference effect refers to the differences in performance and learning that arise from practicing one task in the context of other tasks (Magill and Hall 1990). Historically, the effect is demonstrated through experiments that pit blocked physical practice conditions against random, or inter-mixed, practice conditions. For example, if you had 2 tasks—A and B—and one group that practiced task A in a block (i.e., AAA AAA) and then B in a block (i.e., BBB BBB), that group would perform A or B better after that block than a group that practiced A and B in a random fashion (i.e., ABA BBA). However, at retention testing, which occurs some time later, the group that practiced randomly will outperform the blocked group (Lee and Magill 1983; Hall and Magill 1995; Schmidt and Bjork 1992; Taylor and Rohrer 2010). There are two prominent theories that aim to explain this effect. The first is the reconstruction hypothesis, which supposes that during a high interference practice schedule, such as a random schedule, the learner must actively reconstruct the representation for each physical attempt. In this view, the interference invokes an intention-to-action translation process, which means learners must regenerate a new movement plan actively on each trial during the acquisition phase, whereas single context practice does not require an intended action reconstruction on each attempt (Lee and Magill 1983, 1985). The second is the elaboration hypothesis, which speculates that the multiple psychomotor processing strategies that underpin the multiple tasks in a high interference condition are retained in working memory in a way that allows for comparison more readily during practice. 
Through these comparisons, high interference practice conditions lead to more distinctive and elaborate action representations (Shea and Morgan 1979; Wulf and Shea 2002).

In the current study, we sought to further extend our understanding of the optimal use of observational practice in the education of precision clinical skills by examining whether the type of learning associated with mixed-model observation is also subject to the contextual interference effect. To do so, we engaged three groups of learners in a learning study of a simple simulated endoscopic skill. Each group engaged in identical physical and mixed-model observational practice of this skill, with the only difference being that one group’s observation was presented in a blocked fashion (low interference) while the other two groups’ observation was presented in interleaved fashions (medium interference; high interference). In this regard, our hypotheses were threefold. First, we expected that all three groups would decrease in error as they practiced. Second, we expected that the blocked group would perform better than the interleaved groups during immediate post-testing, but that this effect would be reversed after a 24-h retention period. Lastly, we anticipated that the interleaved groups would continue to outperform the blocked group in the transfer test. In conducting this study, we aimed to explore the idea that organizing mixed-model observational practice in a contextually-interfered manner can improve learning.

Methods

Participants

Thirty-nine individuals (25 males, 14 females, mean age = 20.88 ± 1.36 years) were recruited from the McMaster University health sciences education community. The participants were all self-declared right-handed individuals with normal or corrected-to-normal vision. They had no previous training or education in any surgical procedures. All participants provided informed consent in accordance with the guidelines set out by the Hamilton Integrated Research Ethics Board and the Declaration of Helsinki (2013).

Protocol

Participants engaged in combined physical and mixed-model observational learning of a pots-and-beans task in an endoscopic box trainer. The pots-and-beans task is a component of the Fundamentals of Laparoscopic Surgery program, which is a joint educational offering of the American College of Surgeons and the Society of American Gastrointestinal and Endoscopic Surgeons. This task involves picking up a small bean with the right-hand grasper, passing it to the left-hand grasper, and then releasing the bean into a circular pot (Grierson et al. 2013). All of this occurred within the confines of an endoscopic box trainer (Johnson & Johnson Private Ltd), which simulates a minimal access surgical environment.

The participants were allocated randomly to one of the three experimental groups. The groups were defined by the level of contextual interference exhibited during the acquisition phase (Fig. 1). The blocked group (n = 13) observed 4 sets of 10 demonstrations. The first 2 sets were of an expert model and the following 2 sets were of a novice model; or vice versa such that order of presentation was counterbalanced within the group. The semi-interleaved group (n = 13) also observed 4 sets of 10 demonstrations; however, in each set this group observed five blocked attempts of one model, followed by five blocked attempts of the other model. The interleaved group (n = 13) also observed 4 sets of 10 demonstrations but, for this group, the model switched between expert and novice on each observation. The order of presentation was also counterbalanced within the semi-interleaved and interleaved groups (Fig. 1). The order in which each expert demonstration and novice demonstration was presented was the same across all three groups.

Fig. 1 Observational practice schedule for each experimental group
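The three observation schedules described above can be sketched in a few lines of Python. This is only an illustrative reconstruction from the text, not the software used in the study. The function names, the `first` parameter (which stands in for the counterbalancing of model order), and the assumption that the semi-interleaved and interleaved groups begin every set with the same model are all hypothetical.

```python
from typing import List

SETS = 4            # observation sets per participant (from the protocol)
DEMOS_PER_SET = 10  # demonstrations per set (from the protocol)


def observation_schedule(group: str, first: str = "expert") -> List[List[str]]:
    """Return the model observed on each trial, as one list per set.

    `first` is a hypothetical parameter standing in for counterbalancing.
    """
    if first not in ("expert", "novice"):
        raise ValueError(f"unknown model: {first}")
    second = "novice" if first == "expert" else "expert"

    if group == "blocked":
        # Two full sets of one model, then two full sets of the other.
        return [[m] * DEMOS_PER_SET for m in (first, first, second, second)]
    if group == "semi-interleaved":
        # Within every set: five demonstrations of one model, then five of the other.
        # Assumption: every set starts with the same model.
        half = DEMOS_PER_SET // 2
        return [[first] * half + [second] * half for _ in range(SETS)]
    if group == "interleaved":
        # The model switches on every single observation.
        one_set = [first if i % 2 == 0 else second for i in range(DEMOS_PER_SET)]
        return [list(one_set) for _ in range(SETS)]
    raise ValueError(f"unknown group: {group}")


def count_switches(schedule: List[List[str]]) -> int:
    """Count how often the observed model changes across the whole session."""
    flat = [model for one_set in schedule for model in one_set]
    return sum(1 for a, b in zip(flat, flat[1:]) if a != b)


if __name__ == "__main__":
    for group in ("blocked", "semi-interleaved", "interleaved"):
        schedule = observation_schedule(group)
        print(group, count_switches(schedule), "model switches")
```

Under these assumptions every group observes 20 expert and 20 novice demonstrations, and the groups differ only in how often the observed model changes, which is one simple way to express the level of interference.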

The protocol occurred across 4 phases over 2 days: acquisition and post-test on Day 1, and retention and transfer testing on Day 2. The acquisition phase involved sets of observational practice (4 blocks; 10 trials/block) interspersed periodically with sets of physical practice (3 blocks; 6 trials/block). One trial consisted of attempting to move one bean (or observing an attempt), regardless of whether the attempt was successful. The post-test consisted of two blocks of physical attempts that followed the acquisition phase immediately. These blocks differed from the physical practice blocks in that they required participants to perform five successful bean transfers before the block was terminated. The retention test was identical to the post-test. The transfer test also mimicked the post-test in all respects, except that the learners were required to move the beans in the reverse direction, from left hand to right hand. For each of the post, retention, and transfer phases, participants rested for 3 min to reduce the impacts of fatigue (Table 1).

Table 1 Schematic of the protocol implemented within the study

Model demonstrations

We filmed two confederates performing the task in order to generate appropriate videos for the expert and novice model demonstrations. Our expert was an orthopaedic surgeon from McMaster University and the novice was selected from the cohort of recruited participants. The videos were recorded directly from the box trainer camera, such that the resulting footage displayed to the participants exactly what was seen on the monitor by the confederates during performance. Twenty error-free attempts were selected from the expert’s demonstrations. We selected 20 attempts of the novice that included both unsuccessful (n = 11) and successful (n = 9) bean transfers. These were ordered such that successful attempts were more prevalent in the later acquisition blocks in order to simulate a learning model. While successful attempts were included in the novice demonstrations, these were all deemed as less efficient and more awkward than those of the expert via consensus agreement from 2 minimal access surgery educators. Furthermore, the novice’s average time-to-complete a trial (28.9 ± 2.1 s) was considerably slower than that of the expert model (16.7 ± 0.8 s).

Dependent measures and analyses

There were three dependent measures: total time, total errors, and errors-per-second. Total time was the amount of time required to successfully transfer a total of five beans. Total errors was the sum of the pick-up errors, transfer errors, and the number of errors in dropping the bean into the cup (Grierson et al. 2013). Errors-per-second were calculated to provide an indication of any speed-accuracy trade-off (Fitts 1954). The mean across blocks for each dependent measure was used in all analyses involving the post-, retention, and transfer-test measures.
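As a minimal sketch of how these measures relate, the following Python example derives total errors and errors-per-second for a single test block and averages a measure across blocks. It assumes that errors-per-second is simply total errors divided by total time, which the text does not state explicitly. The class name, field names, and numbers are hypothetical and are not data from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BlockResult:
    """Scores for one test block (hypothetical structure, illustrative only)."""
    pickup_errors: int
    transfer_errors: int
    drop_errors: int
    total_time_s: float  # time needed to complete five successful transfers

    @property
    def total_errors(self) -> int:
        # Sum of pick-up, transfer, and drop errors, as described in the text.
        return self.pickup_errors + self.transfer_errors + self.drop_errors

    @property
    def errors_per_second(self) -> float:
        # Assumption: errors-per-second = total errors / total time.
        if self.total_time_s <= 0:
            raise ValueError("total_time_s must be positive")
        return self.total_errors / self.total_time_s


def mean_across_blocks(blocks: Sequence[BlockResult], measure: str) -> float:
    """Average one measure over the blocks of a test phase."""
    if not blocks:
        raise ValueError("at least one block is required")
    return sum(getattr(b, measure) for b in blocks) / len(blocks)


if __name__ == "__main__":
    # Made-up values for two blocks of one participant's test phase.
    blocks = [BlockResult(4, 3, 3, 250.0), BlockResult(2, 2, 2, 200.0)]
    for name in ("total_time_s", "total_errors", "errors_per_second"):
        print(name, mean_across_blocks(blocks, name))
```

Normalizing errors by time in this way makes it possible to tell whether a faster group is also proportionally more error-prone, which is the speed-accuracy question the third measure is meant to address.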

In order to establish that all three groups were able to acquire and learn the skill during the acquisition phase, total errors were analysed in a 3 Group (blocked, semi-interleaved, interleaved) by 3 Physical Practice Set (PP1, PP2, PP3) repeated measures analysis of variance (ANOVA). To examine the possible presence of a contextual interference effect, independent 3 Group (blocked, semi-interleaved, interleaved) by 2 Test (post-test, retention test) repeated measures ANOVAs were conducted for each of the dependent measures. Additionally, independent 3 Group (blocked, semi-interleaved, interleaved) by 2 Test (post-test, transfer test) repeated measures ANOVAs were conducted for each of the dependent measures in order to explore the impact of contextual interference on the transfer of learning acquired via mixed-model observation. Significant effects (p < 0.05) involving more than two means were decomposed using Tukey’s HSD post hoc methodology.
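For readers who wish to check the reported F statistics against the design, the degrees of freedom of a Group (between-participants) by repeated factor (within-participants) ANOVA can be sketched as follows. This is a small illustration that assumes a complete data set and no sphericity correction. The function name is hypothetical and the code does not reproduce the analysis itself.

```python
from typing import Dict, Tuple


def mixed_anova_df(n_participants: int, n_groups: int, n_levels: int) -> Dict[str, Tuple[int, int]]:
    """Return (effect df, error df) for each term of a two-factor mixed ANOVA.

    Assumes complete data and no sphericity correction.
    """
    if n_groups < 2 or n_levels < 2 or n_participants <= n_groups:
        raise ValueError("design is too small for a mixed ANOVA")
    between_error = n_participants - n_groups       # participants within groups
    within_error = (n_levels - 1) * between_error   # repeated factor x participants within groups
    return {
        "group": (n_groups - 1, between_error),
        "within": (n_levels - 1, within_error),
        "interaction": ((n_groups - 1) * (n_levels - 1), within_error),
    }


if __name__ == "__main__":
    # 39 participants, 3 groups, 3 physical practice sets (acquisition analysis).
    print(mixed_anova_df(39, 3, 3))
    # 39 participants, 3 groups, 2 tests (post-to-retention and post-to-transfer analyses).
    print(mixed_anova_df(39, 3, 2))
```

With 39 participants in 3 groups, the three-level Physical Practice Set factor gives (2, 72), and the two-level Test factor gives (1, 36) for the Test main effect and (2, 36) for the Group by Test interaction, which agrees with the degrees of freedom reported in the Results.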

Correlation coefficients between total errors and total time derived for each of the post-test, retention test, and transfer test are presented to facilitate potential future meta-analyses. These were determined via Pearson’s methodology. When interpreting these relationships, it is important to remember that total errors are summed over the entire block of trials, such that a significant positive relationship with total time is expected (i.e., blocks with more errors take more time). This is in contrast to the significant negative relationship that characterizes the speed-accuracy trade-off at the level of the individual trial (i.e., quicker movements are more likely to result in error).
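The block-level relationship described here can be illustrated with a plain implementation of Pearson's product-moment coefficient. The sketch below uses the textbook formula and made-up numbers. It is not the software used in the study, and the values are not study data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of the deviations from each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up block totals: blocks with more errors also take longer.
    total_errors = [2, 4, 6, 8]
    total_time_s = [100.0, 150.0, 200.0, 250.0]
    print(pearson_r(total_errors, total_time_s))
```

In this toy case, time rises in exact proportion to errors, so the coefficient sits at its positive limit. Real block totals would give a smaller positive value, and trial-level data governed by a speed-accuracy trade-off would be expected to give a negative one.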

Results

Acquisition phase analyses

The 3 Group by 3 Physical Practice Set ANOVA of total errors performed during the physical practice blocks indicated a main effect of Physical Practice Set, F(2, 72) = 6.841; p = 0.002. Post-hoc analysis showed that all the groups committed significantly fewer errors in PP2 (4.46 ± 0.22) and PP3 (4.31 ± 0.22) as compared to PP1 (5.18 ± 0.17). There were no significant between-group differences or higher order interactions. This finding indicates that all participants improved as they engaged in physical and mixed-model observational practice.

Correlations between errors and total time

The correlation analyses yielded significant positive relationships between total errors and total time for each of the post-test (r = 0.79, p < 0.00001), retention test (r = 0.85, p < 0.00001), and transfer test (r = 0.73, p < 0.00001) trial blocks.

Post-to-retention test analyses

The 3 Group by 2 Test ANOVA of total errors indicated a main effect of Test, F(1, 36) = 4.293; p = 0.046, which showed that all groups committed significantly fewer errors at the retention test (PT = 11.62 ± 1.0 errors; RT = 9.08 ± 1.2 errors). The same analysis conducted on the total time dependent measure also indicated a main effect of Test, F(1, 36) = 20.71; p < 0.001 (PT = 330.2 ± 19.5 s; RT = 230.2 ± 16.3 s). The ANOVA of the errors-per-second dependent measure revealed no significant differences (grand mean = 0.03 ± 0.002 errors/s).

Post-to-transfer test analysis

The 3 Group by 2 Test ANOVA of total errors indicated a Group by Test interaction, F(2, 36) = 4.976; p = 0.012. Post-hoc analysis of this effect indicated that the interleaved group and semi-interleaved group both erred significantly less than the blocked group during transfer testing. The interleaved and semi-interleaved groups did not differ from one another (Fig. 2).

Fig. 2 Mean (SE) total error for the 3 Groups at post-test (PT) and transfer test (TT)

The 3 Group by 2 Test ANOVA of total time also indicated a Group by Test interaction, F(2, 36) = 4.562; p = 0.017. Post-hoc analysis of this effect revealed that the blocked group performed significantly faster than the semi-interleaved group at post-test, but was not significantly faster than either of the higher interference groups at the transfer test. The semi-interleaved group and interleaved group did not differ significantly during either test (Fig. 3).

Fig. 3 Mean (SE) total time (s) for the 3 Groups at post-test (PT) and transfer test (TT)

Lastly, the 3 Group by 2 Test ANOVA of errors-per-second also indicated a Group by Test interaction, F(2, 36) = 5.64; p = 0.007. Post-hoc analysis of this interaction revealed no significant differences between the groups at post-test. However, at the transfer test, the blocked group committed significantly more errors-per-second than both the semi-interleaved and interleaved groups, which did not differ from each other (Fig. 4).

Fig. 4 Mean errors-per-second (SE) for the 3 Groups at post-test (PT) and transfer test (TT)

Discussion

In this study, participants’ observation of expert and novice model demonstrations was manipulated in order to explore the degree to which the learning that results from observation of clinical precision skills is subject to the contextual interference effect. Specifically, our first hypothesis was that physical practice combined with observational practice would promote learning such that all participants would improve over the acquisition period regardless of group assignment. Examination of the total number of errors performed by each group throughout acquisition reveals this to be the case. Our second hypothesis was that the low interference group (the blocked group) would perform better than the two higher interference groups during immediate post-testing, but that this effect would be reversed after a 24-h retention period. In this regard, the comparisons of post-test and retention test measures did not reveal this reversal of performance proficiency over time. However, the analyses of total time and total errors revealed a general improvement in performance from post-test to retention, which suggests that the learning intervention led to sustained skill refinement for all participants over the retention period. Our last hypothesis was that the interleaved groups would also outperform the blocked group in the transfer test, despite possibly performing less proficiently than the blocked group at post-test. In this regard, the analyses of both total time and errors-per-second revealed that the higher interference practice led to learning that was more generalizable to the transfer task.

Taken together, these data provide some indication that mixed-model observational practice facilitates a greater degree of precision skill learning when presented under a high interference schedule. According to the reconstruction hypothesis (Lee and Magill 1983, 1985), the additional interference requires learners to “reconstruct” portions of each action representation between physical attempts of the skill. Purportedly, this additional reconstruction makes practice more difficult, but engages intention-to-action processes in a way that is critical for future attempts at the skill. Interestingly, the current study’s results, in the context of observational practice, challenge this perspective. In particular, during observational practice, the learner does not engage in any physical practice, which means there is never an intention-to-action process that is disrupted by the observational interference. Rather, in manifesting a contextual interference effect in a mixed-model observational learning study, our findings may be construed as offering some support to the elaboration hypothesis, which holds that learners benefit from high interference practice schedules because such schedules allow them to compare performance strategies more readily (Wulf and Shea 2002). However, under either interpretation, the idea is similar: higher interference during practice necessitates additional processing that depresses immediate performance but elevates future performance.

Yet, there is much about our findings that limits our ability to disentangle fully the argument between the two perspectives. For instance, no differences were revealed between groups at the time of retention testing. Furthermore, where there were differences between the blocked and interleaved groups at transfer testing, there were no intermediary differences between the semi-interleaved and interleaved groups. While it is not unusual for contextual interference effects to be elicited to a greater degree at transfer test (Bortolli et al. 1992) and/or for tasks of higher complexity (Shea and Morgan 1979), as was the case in the present study, it would be an overstatement to say that these findings are wholly consistent with the effect. Whether this is an effect of the observational nature of this learning task (cf. Wright et al. 1997), the amount of time between post-test and retention test, or the degree of interference introduced by the semi-interleaving manipulation warrants further investigation.

Another possible reason for the inconsistency of our results may be tied to our non-traditional application of the contextual interference principle. That is, the effect is classically understood as resulting from practicing various skills within the same context. In the current study, however, both the observed experts and the novices were attempting to perform the same skill: an endoscopic pots-and-beans task. In this regard, the observation of both models was considered relevant to the construction of the same action representation such that the amount of interference introduced through the interleaved observation protocol may be mitigated. However, the models’ differential experience with the task means that they would employ different strategies in its performance (Guadagnoli et al. 2012; Fitts and Posner 1967). In order to achieve a high level of accuracy in a short period of time, the expert relies on established representations, applies planned corrections at essential moments in the movement trajectory, and in doing so minimizes the overall movement variability. In contrast, as a novice searches for solutions to the challenge, his/her attempts exhibit a high degree of variability. This provides information about how certain actions are tied to success or failure and is critical to the development of representations that are optimized to avoid the consequences associated with error (Elliott et al. 2011). In this regard, it is our position that what the experts and novices are doing is distinctly different, despite the equivalence of the goal. We acknowledge, however, that there is room to better understand this position. Indeed, investigations into the dynamic nature of movement variability profiles (for both models and learners) and the strategic transitions that occur over the period of intermediate expertise are potentially very interesting.

Nevertheless, the present results suggest that there is benefit to introducing interference into mixed-model observational practice of clinical technical skills (Domuracki et al. 2015; Grierson et al. 2012). As such, the organization and schedule of demonstrations should be considered when teaching clinical skills via video-based methods, over online networks, or in traditional clinical education environments. The interleaving methods presented in this experiment are certainly too regimented to be feasibly applied in most skills training environments, where time and resources are often very limited; however, the results do suggest that instructors should consider more carefully how they integrate expert demonstrations with dyadic practice activities during training sessions. Specifically, it is purported that novice learners will benefit the most from practice and observation that switches between peer-to-peer and instructor-led activities on a regular basis. Furthermore, these findings highlight the value that may result from having learners actively engaged as demonstrators in group learning situations. In doing so, the trainee not only benefits from observing the instructor’s expertise but is also afforded the opportunity to develop relationships between approaches to skill performance and the outcomes of imperfect attempts.