Since the rapid adoption of minimally invasive procedures at the end of the last century, the training of surgical residents in laparoscopic procedures has shifted from the operating room to dedicated skills labs [1]. A pressing research question is how to design such training as efficiently as possible while ensuring excellent skill acquisition, long-term retention, and transfer to the workplace. Recent studies in cognitive and educational psychology indicate that learning efficiency can be improved substantially through an appropriate selection of feedback, proficiency targets [2, 3], and video tutorials.

At least as important as the selection of training material is the scheduling, or dosage, of the training itself. Retention of training effects and transfer from trained to untrained domains depend on factors such as deliberate practice, part-task training, task variability, and overlearning after reaching proficiency [4–6]. Most important for the current study, it is well documented [7] that distributing practice over time (spacing) leads to superior learning, for knowledge acquisition as well as for motor skill acquisition.

In medical centers and hospitals, staff training is conventionally scheduled as a full-day course because this is most convenient for organizational purposes. However, planners and curriculum designers need to consider whether this logistical convenience is worth a potential loss in the quality of learning. If the rate of skill acquisition suffers and long-term retention is compromised on a tightly crammed schedule, it may be wise to consider alternative ways of scheduling the training of medical staff.

The benefits of spaced practice are very robust in memory tasks but are also observed in motor learning [5, 8]. A meta-analysis of verbal memory tasks [9] demonstrated that the lag (time interval) between training sessions should increase with the retention interval, with an optimal lag of about 15–20 % of the time until the final test [10]. Although the spacing effect has also been demonstrated reliably and consistently in several motor skill domains [11, 12], estimates of its magnitude vary with the training context [13] and task complexity. The current study therefore examines the value of spacing in laparoscopic training for both basic and advanced tasks.
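As a rough illustration of the 15–20 % rule of thumb cited above, the sketch below converts a retention interval into a suggested range for the lag between sessions. Applying a heuristic derived from verbal memory tasks to laparoscopic motor skills is an assumption, and the function name `optimal_lag_range` is hypothetical.

```python
def optimal_lag_range(retention_days, low=0.15, high=0.20):
    """Suggested (min, max) lag between sessions, in days.

    Applies the 15-20 % rule of thumb from the verbal-memory literature;
    its transfer to motor skills is an assumption.
    """
    if retention_days <= 0:
        raise ValueError("retention interval must be positive")
    return retention_days * low, retention_days * high


# The two retention intervals used in this study: 2 weeks and about 1 year.
for label, days in [("2-week retention", 14), ("1-year retention", 365)]:
    lo, hi = optimal_lag_range(days)
    print(f"{label}: lag of roughly {lo:.1f} to {hi:.1f} days")
```

The point of the calculation is that the suggested lag scales with how long the skill must be retained, so a schedule tuned for a 2-week test is not necessarily tuned for a 1-year test.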

The spacing effect has recently been investigated in surgical training courses. Moulton et al. [14] demonstrated significantly better retention after a microvascular anastomosis course for a group that received four training sessions in consecutive weeks (spaced) than for a group that received all sessions on the same day (massed). In a different study, the spacing effect was tested while teaching laparoscopic cutting on the Minimally Invasive Surgical Training Virtual Reality (MIST-VR) simulator [15]. Performance was better when three training sessions were scheduled on consecutive days than when all were given on a single day.

Most spacing studies use short intervals [13] (minutes, hours, or days rather than weeks, months, or years) for logistical convenience [9], the same reason trainers usually opt for massed training: it is simply more practical. However, such short spacing windows provide little empirical basis for real educational settings, where retention over weeks to months matters most. The current study therefore includes measures of skill retention and differentiates between levels of task complexity.

In the current study, we aimed to replicate the spacing effect in a physical box-trainer model, using an array of laparoscopic training tasks of varying difficulty, a weekly interval between sessions for the spaced training group, and short- and long-term retention tests after 2 weeks and 1 year, respectively. We hypothesize that the spaced group will show superior performance both at the end of training and at the retention sessions.

Methods

Participants

Forty-one medical students (25 female) without prior experience in laparoscopy training were enrolled in the study. Ages ranged from 17 to 28 years (mean = 20), and all participants were right-handed. Participants received a certificate upon completion of the training as compensation for taking part in the study.

Apparatus

Participants trained on a laparoscopic box-trainer with four basic tasks and one advanced task, all with previously established construct validity [16]. These tasks train perceptual and motor skills such as depth perception, adaptation to the fulcrum effect, and instrument handling, all of which are essential to proficiency in laparoscopic surgery. In the first task, participants stretch a rubber band around a set of 12 spikes; this task trains the controlled application of force. In the second task, participants thread a pipe cleaner through a set of four rings; this task trains bimanual dexterity. The third task involves placing small beads on a pegboard and requires highly precise motor actions. In the fourth task, participants cut a circle out of a rubber glove, which trains exposure and dissection skills. In the advanced task, participants practiced intra-corporeal suturing. They were taught to tie three knots: the first with two throws, starting with the needle in the right instrument; the second with one throw, starting from the left instrument; and the third with one throw, starting from the right instrument (see Electronic Supplementary Material). During training, an open suturing model was used to teach participants the basics of suturing before they practiced in the laparoscopic box-trainer.

Performance on the box-trainers was recorded on a connected PC by means of a video splitter and a video grabber (Terratec Grabster AV 400 MX) that converted the video output to USB. The USB signal was saved as separate .mpg files using VLC Media Player for Windows.

Participants filled out self-report questionnaires covering demographics (gender, age, etc.), prior sport, music, and gaming experience (0 = no experience, 1 = I used to play, 2 = yearly, 3 = monthly, 4 = weekly, 5 = daily), goal orientation [17], and growth mindset [18].

Training programs

Participants were randomly assigned to a spaced (n = 21) or a massed (n = 20) training schedule. Before laparoscopy training, all participants spent 1 h on a set of psychological tasks testing cognitive flexibility and spatial skills. These tasks were hypothesized to predict the acquisition of laparoscopic skills but are beyond the scope of the present article. Both groups received a total of 225 min of laparoscopy training, divided into three blocks of 75 min, each consisting of 15 min of instruction and 60 min of hands-on practice. In the first block, participants trained on the four basic laparoscopic tasks. In the second and third blocks, they trained on all five laparoscopic tasks, as well as on an open suturing model to learn the basics of suturing before intra-corporeal suturing. During each block, participants completed each task twice in a fixed order (rubber band, pipe cleaner, beads, circle, suturing) and were then free to spend any remaining time on the training task of their choice.

For the massed practice group, the three training blocks were scheduled consecutively on a single day. For the spaced practice group, the blocks were separated by 1 week. Two weeks after training, a short-term retention session assessed participants' skill without any prior practice during that session. A long-term retention session was scheduled 12–14 months after training. Participants did not practice laparoscopic skills outside the allocated training time.
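The two schedules can be summarized as day offsets relative to the first training block. The sketch below is a hypothetical helper, not software used in the study; it assumes that the 2-week retention interval is counted from the final training block, and it omits the long-term session because its timing varied between 12 and 14 months. The start date is arbitrary.

```python
from datetime import date, timedelta

BLOCK_MINUTES = 15 + 60   # instruction + hands-on practice per block
N_BLOCKS = 3
RETENTION_GAP_DAYS = 14   # short-term retention, counted from the last block


def session_offsets(schedule):
    """Day offsets of the training blocks and the short-term retention test."""
    if schedule == "massed":
        blocks = [0, 0, 0]      # all three blocks on a single day
    elif schedule == "spaced":
        blocks = [0, 7, 14]     # blocks separated by 1 week
    else:
        raise ValueError(f"unknown schedule: {schedule!r}")
    return {"blocks": blocks, "retention": blocks[-1] + RETENTION_GAP_DAYS}


def session_dates(schedule, start):
    """Map the offsets onto calendar dates, given the date of the first block."""
    offsets = session_offsets(schedule)
    return {
        "blocks": [start + timedelta(days=d) for d in offsets["blocks"]],
        "retention": start + timedelta(days=offsets["retention"]),
    }


# Both groups receive the same total training time (3 x 75 = 225 min).
TOTAL_MINUTES = N_BLOCKS * BLOCK_MINUTES

start = date(2024, 1, 8)  # hypothetical start date
for group in ("massed", "spaced"):
    print(group, session_dates(group, start))
```

Keeping total training time fixed while varying only the offsets is what isolates the schedule as the manipulated variable.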

Performance was video-recorded at the end of the first and third training blocks and at the start of both retention sessions, giving four moments of measurement for the first four tasks and three for intra-corporeal suturing, which was not trained in the first block.

Participants received standardized instructions from the trainer and self-directed feedback [19] in order to minimize confounding effects on the trainees' learning curves.

Outcome measures

The first author assessed each participant's video files for task completion time and accuracy. For each task, an accuracy scoring tool was created based on the metric principles of Gallagher and O'Sullivan [20]: frequently occurring steps and errors were scored and summed to form an accuracy measure for that task. Lower completion times and lower accuracy scores (fewer steps and errors to complete a task) reflect better performance.
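The scoring rule itself (tally the steps and errors on a checklist, then sum) can be sketched for one task as follows. The checklist item names are hypothetical placeholders, since the actual items of the tool are not listed here, and giving every item equal weight is an assumption consistent with simple summation.

```python
# Hypothetical checklist items for one task; the real items are not listed here.
HYPOTHETICAL_ITEMS = ("extra_grasp", "dropped_object", "missed_target", "tissue_damage")


def accuracy_score(counts, items=HYPOTHETICAL_ITEMS):
    """Sum the tallied steps and errors; lower totals mean better performance."""
    unknown = set(counts) - set(items)
    if unknown:
        raise KeyError(f"items not on the checklist: {sorted(unknown)}")
    if any(v < 0 for v in counts.values()):
        raise ValueError("counts must be non-negative")
    return sum(counts.get(item, 0) for item in items)


# Two hypothetical trainees scored from video on the same task.
trainee_a = {"extra_grasp": 3, "dropped_object": 1}
trainee_b = {"extra_grasp": 6, "dropped_object": 2, "missed_target": 1}
print(accuracy_score(trainee_a), accuracy_score(trainee_b))
```

Because the score is a count, it shares the "lower is better" direction of completion time, which keeps the two outcome measures directly comparable.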

Statistical analysis

Data were checked for normality, and statistical tests were chosen accordingly. We tested whether the groups were comparable at baseline in terms of age, gender, hand preference, musical, gaming, and sports activity, and personality factors.

For each of the five tasks, completion times and accuracy scores were compared between groups at every moment of measurement (end of training block I, end of training block III, short- and long-term retention) using Mann–Whitney tests. Wilcoxon signed-rank tests were used to verify within-participant improvement between sessions. Finally, non-parametric (Spearman) correlations were used to explore potential relationships between questionnaire variables and performance on the laparoscopic tasks.
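To show what the between-group comparison computes, the sketch below implements a two-sided Mann–Whitney U test with average ranks for ties and a tie-corrected normal approximation, together with the rank-biserial effect size. It is a minimal, dependency-free illustration, not the statistical software used in the study. In practice a validated package (for example, SciPy's `scipy.stats.mannwhitneyu`) would be preferable, and its exact small-sample p-values can differ from this approximation. The data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test for independent groups x and y.

    Returns (U, z, p, r_rb): U is the smaller of U_x and U_y, z and p come
    from the tie-corrected normal approximation, and r_rb = 1 - 2U/(n_x n_y)
    is the magnitude of the rank-biserial correlation (its sign depends on
    which group is taken as reference and is not encoded here).
    """
    nx, ny = len(x), len(y)
    n = nx + ny
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:nx]) - nx * (nx + 1) / 2
    u = min(u_x, nx * ny - u_x)
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    tie_term = sum(t ** 3 - t for t in (pooled.count(v) for v in set(pooled)))
    variance = nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all observations are tied")
    z = (u - nx * ny / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p from the normal curve
    return u, z, p, 1 - 2 * u / (nx * ny)


# Hypothetical completion times (s); 601 marks a task not finished in 10 min.
massed = [410, 601, 380, 455]
spaced = [250, 300, 280, 330]
u, z, p, r_rb = mann_whitney(massed, spaced)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}, r_rb = {r_rb:.2f}")
```

Because the test works on ranks only, an unfinished attempt coded as 601 simply takes the worst rank, whatever the true time would have been.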

Results

Three of the forty-one participants did not fully complete the training and were excluded from analysis. Thirty-eight participants (massed n = 18, spaced n = 20) took part in the short-term retention session, and 12 participants (massed n = 5, spaced n = 7) completed the long-term retention session.

If a participant was unable to complete a task within a reasonable amount of time (maximum of 10 min), a score of 601 was assigned. This score automatically receives the highest rank in the non-parametric tests, and assigning it avoided selective drop-out from the sample based on poor performance. This was the case for 12 of the 592 moments of measurement. In addition, 28 of the 592 video files were lost due to problems with the video recording equipment.
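The total of 592 moments of measurement can be reconstructed from the design: 38 participants measured on the four basic tasks after block I and on all five tasks at the end of training and at short-term retention, plus 12 participants on all five tasks at long-term retention. The sketch below makes this bookkeeping explicit and shows the capping rule for unfinished attempts. The constant and function names are illustrative, not part of the study's materials.

```python
TIME_LIMIT_S = 600              # 10-min maximum per attempt
CAP_SCORE = TIME_LIMIT_S + 1    # 601: ranks below every completed attempt

N_BASIC, N_ADVANCED = 4, 1      # four basic tasks, one suturing task
N_MAIN, N_LONG_TERM = 38, 12    # participants per session


def measurement_moments():
    """Count task recordings across the four measurement sessions."""
    block_i = N_MAIN * N_BASIC                      # suturing not yet trained
    end_of_training = N_MAIN * (N_BASIC + N_ADVANCED)
    short_term = N_MAIN * (N_BASIC + N_ADVANCED)
    long_term = N_LONG_TERM * (N_BASIC + N_ADVANCED)
    return block_i + end_of_training + short_term + long_term


def completion_score(seconds):
    """Completion time in seconds, or CAP_SCORE if the task was not finished
    within the time limit (None means the attempt was abandoned)."""
    if seconds is None or seconds > TIME_LIMIT_S:
        return CAP_SCORE
    return seconds


print(measurement_moments())  # 592 = 152 + 190 + 190 + 60
print([completion_score(s) for s in (245, None, 598)])
```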

Baseline check

Chi-square tests showed no significant differences between the two groups in gender or hand preference. Mann–Whitney tests indicated no difference in gaming or sports activity, but the spaced group reported significantly more practice with musical instruments (Mdn = 2 out of 5) than the massed group (Mdn = 0 out of 5), U = 64, z = −3.549, p < 0.001, rank-biserial r_rb = −0.64. In addition, the spaced group was significantly younger (Mdn = 18.5 vs 21 years), U = 43, z = −4.066, p < 0.001, r_rb = −0.76.
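The reported effect sizes are consistent with the rank-biserial correlation computed from U as r_rb = 1 − 2U/(n1·n2), assuming the baseline comparison used the analysed sample of 18 massed and 20 spaced participants (an inference, since the baseline sample size is not stated explicitly). The minus sign in the text reflects which group served as reference; the sketch below returns the magnitude only.

```python
def rank_biserial_from_u(u, n1, n2):
    """Magnitude of the rank-biserial correlation from a Mann-Whitney U.

    Assumes `u` is the smaller of the two U statistics (0 <= u <= n1*n2/2).
    """
    if not 0 <= u <= n1 * n2 / 2:
        raise ValueError("expected the smaller U statistic")
    return 1 - 2 * u / (n1 * n2)


N_MASSED, N_SPACED = 18, 20   # assumed group sizes for the baseline check
print(round(rank_biserial_from_u(64, N_MASSED, N_SPACED), 2))  # musical activity: 0.64
print(round(rank_biserial_from_u(43, N_MASSED, N_SPACED), 2))  # age: 0.76
```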

Main analysis: basic laparoscopic tasks

The results for the first four tasks are shown in Figs. 1 and 2. They indicate improvements in performance for all participants and highlight the differences in learning curves between the spaced and massed training groups.

Fig. 1

Median completion times (A) and median accuracy scores (B) for the first two basic tasks after the first block of training (N = 38), at the end of training (N = 38) and at short-term (N = 38) and long-term retention (N = 12) for both training groups (NS non-significant; *p < 0.05; **p < 0.01; ***p < 0.001)

Fig. 2

Median completion times (A) and median accuracy scores (B) for the third and fourth basic tasks after the first block of training (N = 38), at the end of training (N = 38) and at short-term (N = 38) and long-term retention (N = 12) for both training groups (NS non-significant; *p < 0.05; **p < 0.01; ***p < 0.001)

At baseline (end of training block I), participants in the two groups did not differ significantly in performance on the four basic tasks, with the exception of completion times and accuracy scores on the rubber band task, which favored the spaced group.

At the end of training, performance on each of the first four tasks differed significantly in favor of the spaced group, the only exception being accuracy scores on the circle cutting task. Effect sizes (rank-biserial correlation, r_rb) for completion times were 0.67, 0.73, 0.65, and 0.36 on the rubber band, pipe cleaner, beads, and circle cutting tasks, respectively. The corresponding effect sizes for accuracy scores at the end of training were 0.63, 0.57, 0.48, and 0.13.

At the 2-week post-training retention session, some of the group differences on the first four tasks were still present, whereas others had disappeared (see Figs. 1, 2). Effect sizes at this session were 0.31, 0.42, 0.45, and 0.29 for completion times and 0.16, 0.36, 0.31, and 0.11 for accuracy scores on the four tasks, respectively.

At 1-year retention, the group difference in completion time on the pipe cleaner task and the differences in accuracy on the rubber band and pipe cleaner tasks persisted. Effect sizes on the four tasks were 0.07, 0.73, 0.17, and 0.2 for completion times and 0.73, 1, 0.4, and 0.5 for accuracy scores, respectively.

Wilcoxon signed-rank tests of within-participant change showed a general pattern of improvement from the first to the third training block on the first four tasks (see Figs. 1, 2).
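The within-participant comparison can be sketched as follows: differences between two sessions are ranked by absolute size, and the ranks are summed separately for improvements and deteriorations, with the matched-pairs rank-biserial correlation as an effect size. This is a minimal illustration only (no p-value is computed; a package such as SciPy's `scipy.stats.wilcoxon` would handle that), and the data in the usage example are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def wilcoxon_signed_rank(earlier, later):
    """Return (W_plus, W_minus, r_rb) for paired scores from two sessions.

    Differences are earlier - later, so for completion times and accuracy
    scores (lower is better) a positive difference is an improvement.
    Zero differences are dropped; tied absolute differences get average ranks.
    r_rb is the matched-pairs rank-biserial correlation (W+ - W-) / (W+ + W-).
    """
    if len(earlier) != len(later):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(earlier, later) if a != b]
    if not diffs:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, (w_plus - w_minus) / (w_plus + w_minus)


# Hypothetical completion times (s) of five trainees after blocks I and III.
block_i = [300, 250, 400, 280, 350]
block_iii = [200, 260, 300, 200, 300]
print(wilcoxon_signed_rank(block_i, block_iii))
```

In this example four of the five trainees improved, giving W+ = 14, W− = 1, and r_rb ≈ 0.87.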

Main analysis: intra-corporeal suturing

For intra-corporeal suturing, Wilcoxon signed-rank tests on completion times and accuracy scores showed no significant progress from the end of training to either the short-term or the long-term retention session (see Tables 1, 2).

Table 1 Mann–Whitney and Wilcoxon signed-rank tests to compare median completion times (in seconds) for the advanced task (intra-corporeal suturing)
Table 2 Mann–Whitney and Wilcoxon signed-rank tests to compare median accuracy scores for the advanced task (intra-corporeal suturing)

Mann–Whitney tests revealed substantial effects of group on completion times and accuracy scores for intra-corporeal suturing, both at the end of training and at the two retention sessions. Effect sizes (r_rb) for completion times at the end of training, short-term retention, and long-term retention were 0.58, 0.53, and 0.77, respectively; for accuracy scores they were 0.55, 0.51, and 1. Fig. 3 illustrates the intra-corporeal suturing results for both groups at the end of training and at the retention sessions.

Fig. 3

Median completion times (A) and median accuracy scores (B) for the advanced task at the end of training (N = 38) and at short-term (N = 38) and long-term retention (N = 12) for both training groups (NS non-significant; *p < 0.05; **p < 0.01; ***p < 0.001)

Extended analysis and confound check

Performance on the different laparoscopic tasks correlated moderately across tasks: Spearman's rho ranged from 0.055 (ns) to 0.733 (p < 0.01) for completion times and from −0.208 (ns) to 0.764 (p < 0.01) for accuracy scores. Correlations between completion times and the corresponding accuracy scores were high, ranging from rho = 0.582 (p < 0.01) to rho = 0.929 (p < 0.01). Accuracy at any given moment of measurement was thus closely related to completion time on that particular attempt. Because both measures are scored so that lower is better, these positive correlations mean that faster participants also made fewer steps and errors; that is, participants did not trade accuracy for speed.
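The speed–accuracy argument rests on the sign of Spearman's rho between completion time and accuracy score. The sketch below computes rho as the Pearson correlation of average ranks, which handles ties; the data are hypothetical. Because both measures are coded so that lower is better, a positive rho means slower attempts also contained more steps and errors, whereas a speed–accuracy trade-off would appear as a negative rho.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    return [sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
            for v in values]


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (n >= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("a constant variable has no rank correlation")
    return cov / (sx * sy)


# Hypothetical attempts: completion time (s) and accuracy score (steps + errors).
times = [120, 200, 150, 300, 90]
errors = [3, 6, 4, 9, 2]
rho = spearman_rho(times, errors)
print(f"rho = {rho:.2f}:", "no trade-off" if rho > 0 else "possible trade-off")
```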

Further analysis showed no significant relations between performance on any of the laparoscopic tasks and gender, gaming activity, goal orientation, or growth mindset. Age correlated significantly with performance on some tasks at some moments of measurement, with coefficients ranging in magnitude from 0.338 to 0.636. Similarly, musical activity correlated significantly with performance on some tasks, with coefficients ranging from −0.351 to −0.519.

To test whether age and musical activity confounded the effects of spacing, we performed a post hoc case-controlled analysis for both variables. After matching the groups for age, by progressively excluding the youngest participants from the spaced group and the oldest participants from the massed group until the groups were comparable in age, the results of our main statistical tests did not change substantially. Matching the groups for musical activity also did not substantially alter the results.
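The age-matching step can be sketched as an iterative trimming loop. The exact stopping rule and order of exclusions are not stated, so the alternating order, the median-difference criterion (`tolerance`), the minimum group size, and the example ages below are all hypothetical choices. The loop assumes, as observed here, that the spaced group is the younger one.

```python
from statistics import median


def match_by_trimming(spaced_ages, massed_ages, tolerance=1.0, min_size=3):
    """Alternately drop the youngest spaced and the oldest massed participant
    until the group medians differ by at most `tolerance` years.

    Hypothetical stopping rule; assumes the spaced group is the younger one.
    """
    spaced = sorted(spaced_ages)   # youngest first
    massed = sorted(massed_ages)   # oldest last
    drop_spaced = True
    while abs(median(massed) - median(spaced)) > tolerance:
        if len(spaced) <= min_size or len(massed) <= min_size:
            raise RuntimeError("groups became too small before matching")
        if drop_spaced:
            spaced.pop(0)          # exclude the youngest spaced participant
        else:
            massed.pop()           # exclude the oldest massed participant
        drop_spaced = not drop_spaced
    return spaced, massed


# Hypothetical ages; the main tests would then be rerun on the matched groups.
spaced, massed = match_by_trimming([17, 18, 19, 20, 21, 22], [19, 20, 21, 22, 23, 24])
print(spaced, massed)
```

Trimming from the tails that drive the age difference keeps as many participants as possible while removing the imbalance.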

Cost-benefit analysis

To assess the success of our training, we compared both groups to a previously established performance benchmark for each task [16], which is also used to determine whether a trainee has reached proficiency and is qualified to perform minor laparoscopic surgery in the OR. By the end of training, 7 of 18 participants (39 %) in the massed group had reached the proficiency benchmark on the rubber band task, compared with 18 of 20 participants (90 %) in the spaced group. For the pipe cleaner task, the comparison was 2/18 (11 %) versus 14/20 (70 %); for the beads task, 2/18 (11 %) versus 10/20 (50 %); for the circle cutting task, 4/18 (22 %) versus 12/20 (60 %); and for intra-corporeal suturing, 4/18 (22 %) versus 11/20 (55 %).
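The percentages above follow directly from the reported counts. The short sketch below recomputes them (massed n = 18, spaced n = 20), rounding to whole percentages as in the text; it adds no new data, and the task labels are simply those used in this article.

```python
N_MASSED, N_SPACED = 18, 20

# Participants reaching the proficiency benchmark by the end of training,
# as reported above: (massed, spaced).
REACHED = {
    "rubber band": (7, 18),
    "pipe cleaner": (2, 14),
    "beads": (2, 10),
    "circle cutting": (4, 12),
    "intra-corporeal suturing": (4, 11),
}


def proficiency_table(reached):
    """Return (task, massed %, spaced %) rows rounded to whole percentages."""
    return [
        (task, round(100 * m / N_MASSED), round(100 * s / N_SPACED))
        for task, (m, s) in reached.items()
    ]


for task, massed_pct, spaced_pct in proficiency_table(REACHED):
    print(f"{task:<26}{massed_pct:>4} %{spaced_pct:>6} %")
```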

Discussion

The current study replicated and extended previous findings that laparoscopic skills can be acquired in less training time with a spaced schedule than with the more typical massed schedule [15]. After the same time investment, a larger proportion of students met the proficiency criteria in the spaced condition than in the massed condition, and their performance was better, as reflected in lower completion times and lower accuracy scores. Moreover, spaced training produced superior short- and long-term retention of the advanced suturing task, up to a year after training. The spaced schedule thus helps to maintain skills reliably over the long term, which has clear implications for patient safety and training efficiency.

Overall, the spacing benefits were most pronounced for advanced skills, although benefits were also demonstrated for most indices of basic skills. The relatively strong spacing effect for advanced skills runs counter to other motor skill research [13]: an earlier meta-analysis showed that the spacing effect usually diminishes with increasing task complexity. This highlights the importance of testing learning strategies empirically in specific training contexts. More generally, the results indicate that trainees require less training time on a spaced schedule, which means that fewer resources need to be spent on training surgical residents.

The results showed minimal differences between the two groups in demographics and initial performance; the differences in age and musical activity were addressed in the confound check. Therefore, the groups can be considered comparable, and the later differences in performance and learning rate can be attributed to our manipulation of the training schedule.

Overall, the group differences on the first four tasks were less pronounced than those for intra-corporeal suturing (see Fig. 3). Participants may have needed less time to master the more basic tasks, which would lead to a smaller difference in final performance between the two groups after 225 min of training. Intra-corporeal suturing is a more cognitively demanding task that typically requires much more practice to reach proficiency. Task difficulty may therefore moderate the size of the spacing effect in laparoscopy training.

Interestingly, in some cases participants in the massed group improved from the end of training to the retention session, for example in accuracy scores on the beads (pegboard) task and in completion times on the rubber band and pipe cleaner tasks, even though there was no additional practice in between. Because the retention test took place 2 weeks after training, all participants effectively received a 2-week spacing interval, which may plausibly explain this improvement.

A key question is which processes explain the benefits of spaced over massed training. The spacing effect can be explained in several ways.

First, trainees become mentally fatigued after prolonged training [21], and fatigue has been found to impair learning of psychomotor and cognitive skills in laparoscopic tasks [22]. Spacing training across multiple sessions can therefore be beneficial by preventing fatigue.

A second explanation lies in differential investment of effort. Each time a trainee starts a new training session, there is a gap to close before performance is back up to par (the proficiency target). This gap is typically smaller in massed training, where the knowledge and skills just practiced remain active in working memory throughout the session and require little effort from the trainee. Trainees on a spaced schedule lack this advantage, because the training information must be reactivated at the start of each session. This forces them to invest more effort to attain the proficiency goal, which facilitates skill acquisition.

Furthermore, massed schedules lead trainees to overestimate how well they have mastered the trained skills [23], because it is easier to reproduce a level of performance after a short interval (the same day) than after a longer one (a week later). Massed training thus benefits performance during training in the short term, at the risk of an inaccurate appraisal of the trainees' actual skill level [23]. Such inflated self-efficacy threatens the appropriate assessment of proficiency by both trainee and trainer. When proficiency is determined directly after massed training, it gives an inaccurate picture of the trainee's skill level in the transfer setting (i.e., the trainee's first laparoscopic procedure, which may take place weeks or months later). It is therefore important to assess proficiency both after training and after a retention interval to ensure accurate skill assessment.

In training, much of the learning takes place between practice sessions rather than during them. Memory for motor skills [13] improves through consolidation, the gradual strengthening of memory during the time window that follows practice, to a large extent during sleep [24]. When training of one motor skill is directly followed by training of a second motor skill, learning of the first skill is substantially impaired, a phenomenon known as retrograde interference [25]. The explanation is that the new synaptic patterns acquired during training in the motor memory regions of the brain have had no opportunity to consolidate and are overwritten by the new motor pattern during training of the second skill. This impairment vanishes when 4 h or more elapse between training of the first and second skill. The positive effects of consolidation accumulate not only in this time window but also during overnight sleep, and retrograde interference can be partially mitigated by a nap between training of different motor patterns [26]. This finding highlights the important role sleep plays in the amount of consolidation that takes place in memory.

In other motor skill domains, such as dancing, a main advantage of spaced practice is that it reduces overuse injuries and improves recovery after training [27]. This may apply to laparoscopic training as well: many of our participants complained of minor pain in their hands and wrists after practicing the tasks for a prolonged time. This is mostly because they are novices with non-optimal posture, but spacing the training can readily alleviate the problem.

All of these processes (mental fatigue, investment of effort, accuracy of self-efficacy appraisal, and memory consolidation) may contribute to the learning advantages of spacing, but it is unclear how much each contributes. A more elaborate design would be needed to separate, for example, the influence of consolidation during sleep from mere recovery from fatigue. Future studies could vary the time between training sessions to further optimize training and to determine whether the advantages of spacing stem mostly from the depletion of cognitive resources after prolonged training or from the consolidation of learning. In addition, we kept the short-term retention interval short for logistical convenience (to limit participant drop-out), and a larger sample for long-term retention would be desirable in future studies. Notably, despite the large drop-out at long-term retention, the effect for intra-corporeal suturing remained present.

On a methodological note, accuracy scores followed a pattern very similar to that of completion times, which is to be expected: improvements in proficiency appear in both measures alike. Given this high correlation, accuracy scores may seem redundant as a tool for differentiating experimental groups. However, accuracy assessment can complement completion times by providing specific feedback for improvement while coaching trainees. Some trainees take a more reserved, conservative approach to a task, whereas others are less patient and make more steps and errors per unit of time; the former approach is arguably better suited to a surgical trainee. Accuracy scores are therefore useful for individual assessment of proficiency during selection and examination.

A limitation of this study is that we originally intended to measure performance on the basic and advanced tasks at the end of training block II. However, most participants in the massed group were unable to complete the intra-corporeal suturing task by then, so recording all tasks would have exceeded the time allotted for measurement. To keep the length of the training blocks equal in both conditions and to stay on schedule, the block II measurements were discarded.

In addition, we used physical box-trainers in a skills lab setting, which limits the extent to which our findings can be generalized to laparoscopy training in the OR. Participants acquired basic laparoscopic motor skills in our training, which do not cover other important skills required for performance in the OR, such as navigation, decision making, team dynamics, and knowledge of anatomy, the patient, and the procedure.

The spacing effect has not only patient-safety implications but also financial advantages. For a training institute, a spaced schedule requires fewer resources (lab reservations, laparoscopic simulators, mentoring staff) to train surgical residents. In addition, spacing different types of learning activities can enhance trainees' engagement in their training programs [28]. Because the advantages of spacing proved to be substantial, we recommend that trainers implement spaced practice in their surgical training curricula. Scheduling training may be somewhat less convenient logistically, but the gains in quality of learning are likely to outweigh the extra effort.