The number of minimally invasive surgeries (MIS) has increased over the last three decades, as patients, surgeons, and providers prefer shorter recovery times, lower levels of post-operative pain, and smaller incisions. However, this has also made surgical training more difficult. Learning curves in MIS are prolonged due to indirect 2D view, fulcrum and pivoting effects, limited degrees of motion, limited haptic feedback, and indirect access to the operative field [1, 2].

This increased difficulty greatly limits the possibility of practice in a clinical setting. Trainees are expected to come to a clinical setting already possessing a range of laparoscopic skills in order to ensure optimal patient safety. Therefore, training in a pre-clinical setting has become mandatory, with the literature indicating that this type of training results in improved performance during actual operations [3]. In addition, an increasing number of surgical societies are creating new guidelines of accreditation for surgeons, hoping to ensure that surgeons are adequately trained for the challenges of MIS. This has led to the introduction of specially designed intensive training courses for MIS [4,5,6].

A multitude of modalities have been developed for trainees to practise their MIS skills outside of the operating room; these include virtual reality (VR) trainers, box trainers, and pulsatile organ perfusion (POP) trainers. Each of these has been shown to improve surgical skill and understanding when used alone [3, 7,8,9,10,11,12]. In practice, most surgical trainees are likely exposed to more than one training modality, and the benefit of combining different modalities has been explored to some extent. However, few structured curricula incorporate several kinds of training modalities, even though studies in fields other than healthcare have shown that variety in training programs benefits participants [13]. Therefore, it would be valuable to explore the optimal combination of training modalities. A further step would be to evaluate whether the optimal training curriculum needs to be adapted to the current stage of a trainee’s surgical education.

This study aimed to evaluate the benefit of a structured multi-modality training program for surgical residents at different stages of training.

Methods

Participants

Junior and senior residents in general surgery were invited to participate in this study. Senior residents were defined as those in postgraduate year (PGY) 3–6 and/or those who had performed more than 10 laparoscopic cholecystectomies (LC) and/or had already completed a laparoscopy course of at least 2 full days. Participation was voluntary, and participants were allowed to leave the study at any time. All participants received information about the study and provided informed consent. The local ethics committee at Heidelberg University approved the study (S-334/2011).

Study design

This was a registered (DRKS00011040), prospective, single-centre, rater-blinded, two-arm, randomized controlled trial (RCT). The study was designed, evaluated, and reported in line with the CONSORT criteria and was performed in the Minimally Invasive Surgery Training Centre at Heidelberg University’s Department of General, Visceral, and Transplantation Surgery [14, 15]. It comprised one active intervention group and one control group. All participants completed a pre-test, which consisted of an LC on a porcine liver using the POP trainer. Participants were randomized to the Training group (multi-modality training) or to the Control group (no training), stratified by experience level. Numbered, sealed, opaque envelopes were used for randomization. The envelopes were computer-generated by an employee of the Department of Medical Biometry and Informatics. After the intervention, all participants completed the post-test by performing an LC on the VR trainer and another LC using the POP trainer (Fig. 1). The expert raters for the pre- and post-test were blinded to the participants’ group assignment.
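The text specifies only that allocation was stratified by experience level and concealed in envelopes; it does not describe the allocation algorithm. As a minimal sketch of how a stratified allocation list could be produced, the following assumes permuted blocks of two within each stratum. The function name, the block size, and the participant identifiers are hypothetical and are not taken from the study.

```python
import random

def stratified_allocation(participants, seed):
    """Allocate (participant_id, experience_level) pairs to two arms.

    Assumption: permuted blocks of two within each experience stratum,
    so the arms differ by at most one participant per stratum.
    """
    rng = random.Random(seed)  # seeded so the list is reproducible
    strata = {}
    for pid, level in participants:
        strata.setdefault(level, []).append(pid)

    allocation = {}
    for ids in strata.values():
        for start in range(0, len(ids), 2):
            arms = ["Training", "Control"]
            rng.shuffle(arms)  # random order within each block of two
            for pid, arm in zip(ids[start:start + 2], arms):
                allocation[pid] = arm
    return allocation

# Hypothetical participants, not study data
demo = [(1, "junior"), (2, "junior"), (3, "senior"), (4, "junior"), (5, "senior")]
print(stratified_allocation(demo, seed=2011))
```

Each entry of such a list would then be placed in its own numbered envelope, so that the person enrolling a participant cannot foresee the assignment.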

Fig. 1 Structured training curriculum with training tasks

Multi-modality training

The multi-modality group trained for a total of 12 h on all of the training equipment available in the training centre: box trainers (KARL STORZ GmbH & Co. KG, Tuttlingen, Germany) including POP trainers (Optimist, Innsbruck, Austria), the VR trainer (Lap Mentor 2™, Simbionix©, Cleveland, USA), and 3D training (KARL STORZ GmbH & Co. KG, Tuttlingen, Germany). The training curriculum was split into two parts, each lasting 6 h and divided into four 1.5-h training sessions (Fig. 1).

The first part of the training curriculum focused on learning basic laparoscopic skills using a box trainer. Participants spent 6 h training on 8 different skills exercises. Exercises 1–6 focused on basic skills and on adjusting to the equipment and the visuospatial challenges of laparoscopic surgery. Exercises 7–8 introduced participants to laparoscopic suturing and knot tying, with a focus on the surgeon’s knot and the slipping knot. Participants trained alone, performing each exercise twice in a row before continuing to the next in order to reinforce learning. Participants were given a 10-min break per hour. Once all 8 exercises were completed, participants started over again from the beginning until they had reached their 6 h of training time. To ensure focus and adherence to the training curriculum, trainees were required to document all training times and performances.

The second part of the training curriculum, building on the skills learned during the first part, focused on learning the procedure of a laparoscopic cholecystectomy on a VR trainer. The VR training involved 3 modules: basic skills practice, a basic procedural skills module (gallbladder removal), and a full LC procedure. The basic skills module was performed twice in a row before continuing, while the second and third modules were performed only once. Participants started the training over again once finished, until they had reached 6 h of training time.

Skills testing

Skill level was evaluated through a pre-test before the intervention began and a post-test after the intervention. The pre-test consisted of an LC performed in a POP trainer on a porcine liver with a preserved gallbladder obtained from the local food industry. The post-test also consisted of an LC in a POP trainer and additionally included an LC on the VR trainer. Every participant therefore performed a total of 2 porcine LCs and 1 VR LC as part of the study, regardless of their group assignment.

The Global Operative Assessment of Laparoscopic Skills (GOALS) was developed to meet the need for objective quantification of surgical skills and has become a standard for laparoscopic procedures [16,17,18,19]. Expert raters, blinded to participants’ skill levels and group assignment, evaluated each LC performance using the GOALS score. GOALS consists of a five-item global rating scale and has been demonstrated to have validity in assessing skill during LC [16,17,18]. In this study, the five domains of depth perception, bimanual dexterity, efficiency, tissue handling, and autonomy were supplemented by a sixth item, difficulty [16]. Depth perception assesses the trainee’s ability to work with a 2D visual system. Bimanual dexterity is a measure of how well the operator uses both hands, while efficiency measures the trainee’s progress throughout the operation. Tissue handling focuses on the use of instruments to move and handle tissues. Autonomy assesses the amount of guidance required by the trainee to complete the surgery. Difficulty was added to the score by Chang et al. in order to adjust for anatomic variations of the specimen [19]. Each of the six items was rated from 1 (lowest performance) to 5 (highest performance), giving a maximal overall score of 30 points.
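The scoring arithmetic described above can be sketched as follows: six items, each rated from 1 to 5, summed to a total between 6 and 30. This is an illustrative sketch only; the class name and field names are hypothetical, and the ratings in the example are invented rather than taken from the study.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class GoalsRating:
    """One rater's assessment of a single LC; every item is rated 1-5."""
    depth_perception: int
    bimanual_dexterity: int
    efficiency: int
    tissue_handling: int
    autonomy: int
    difficulty: int  # sixth item, adjusts for anatomic variation of the specimen

    def __post_init__(self):
        # Reject ratings outside the 1-5 range used for every item
        for value in astuple(self):
            if not 1 <= value <= 5:
                raise ValueError("each GOALS item must be rated from 1 to 5")

    def total(self) -> int:
        """Overall score: the plain sum of the six items (range 6-30)."""
        return sum(astuple(self))

# Hypothetical ratings, not data from the study
example = GoalsRating(3, 2, 3, 3, 2, 3)
print(example.total())  # 16
```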

Outcome measures

Primary Outcome: The GOALS total score was chosen as the primary outcome measure. The change in score from the pre-test to the post-test between the Training group and Control group was compared.

Secondary Outcomes: The operation time was identified as a secondary outcome. Subgroup analysis was performed to compare different experience levels. In addition, results from VR training, including operation time, efficiency of cautery, percentage of safe cautery, number of lost clips, number of complications, number of movements, path length, and average speed, were recorded as secondary outcomes. Video game experience and instrument handling experience were investigated for possible associations with LC performance.

Sample size calculation

The sample size determination was based on the assumption that the multi-modality curriculum could reduce the difference of the GOALS score between different stages of experience by 2.5 points with a standard deviation of 3 points [20]. This led to a minimum sample size of 25 participants in each group. To account for a potential dropout of 20%, a minimum of 60 participants were to be included in the study. Further information can be accessed in the study protocol [21].
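For orientation, the calculation can be approximated with the standard normal-approximation formula for comparing two means. The significance level (two-sided 0.05) and power (80%) used below are assumptions, since they are not stated in this section; the exact assumptions are given in the study protocol [21]. Under these assumptions the approximation yields 23 participants per group, slightly below the 25 reported here. The dropout adjustment is assumed to be a simple multiplicative inflation of the total.

```python
import math
from statistics import NormalDist

def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group, two-sided two-sample comparison.

    alpha and power are assumed defaults, not values reported in the paper.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

def inflate_for_dropout(n_total: int, dropout_percent: int) -> int:
    """Increase the total sample size by the expected dropout percentage."""
    return math.ceil(n_total * (100 + dropout_percent) / 100)

print(n_per_group(delta=2.5, sd=3.0))   # 23 under the assumed alpha and power
print(inflate_for_dropout(2 * 25, 20))  # 60, matching the planned minimum
```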

Statistical analysis

Statistical analysis was conducted by an independent employee of the Department of Medical Biometry and Informatics at Heidelberg University who was otherwise not involved in the study. Descriptive statistics were used to describe the main characteristics of the included participants. For the trainee outcomes of GOALS score, operation time, and parameters of virtual training, the sum score, mean, standard deviation, median, and minimum and maximum values were calculated for the pre-test and post-test. The same measures were also determined for the difference between pre- and post-training GOALS scores and operation times. For parametric data (e.g. operation time, GOALS score), a two-sided Student’s t-test was used to compare groups. Furthermore, paired t-tests were applied to analyse whether pre- and post-test GOALS scores differed within a group. For non-parametric data (e.g. number of lost clips and complications in the virtual reality training), a two-sided Wilcoxon test was used. Descriptive p values of the corresponding statistical tests were reported with the associated 95% confidence intervals. P values smaller than 0.05 were regarded as statistically significant. Statistical analysis was carried out using SAS 9.4. Graphics were created using R 3.2.4.
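The two t-tests named above can be illustrated with a short sketch. The study itself used SAS 9.4; the Python below is only an illustration of the test statistics, with hypothetical function names and invented GOALS totals. It returns the t statistic and degrees of freedom; converting these into p values requires the t distribution, which is not part of the Python standard library, so that step is left to a statistics package.

```python
import math
from statistics import mean, stdev, variance

def paired_t(pre, post):
    """t statistic and degrees of freedom for the mean within-participant change."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def student_t(x, y):
    """Pooled-variance (Student) two-sample t statistic and degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2

# Hypothetical GOALS totals, not data from the study
train_pre, train_post = [12, 14, 13, 15, 11], [15, 17, 15, 19, 14]
ctrl_pre, ctrl_post = [14, 15, 13, 16, 14], [14, 16, 13, 17, 14]

# Within-group change from pre- to post-test (paired test)
print(paired_t(train_pre, train_post))

# Between-group comparison of the change scores (two-sample test)
train_change = [b - a for a, b in zip(train_pre, train_post)]
ctrl_change = [b - a for a, b in zip(ctrl_pre, ctrl_post)]
print(student_t(train_change, ctrl_change))
```

Comparing change scores between groups mirrors the primary outcome definition, while the paired test corresponds to the within-group pre/post comparisons.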

Results

Demographics

In total, 64 participants were included in the randomized study. The Training group had 33 participants and the Control group had 31 participants, with 8 drop-outs in the Training group (Fig. 2). Both groups were homogeneous at baseline (Table 1).

Fig. 2 The CONSORT flow diagram through the phases of the trial

Table 1 Baseline characteristics of Training and Control group

Training versus control group

The Training and Control groups achieved similar GOALS scores on the pre-test (13.7 ± 3.4 vs. 14.7 ± 2.6; p = 0.198) and completed the test in comparable amounts of time (57.0 ± 18.1 min vs. 63.4 ± 17.5 min; p = 0.191). After completing the required training, participants from the Training group showed a trend towards higher GOALS scores than the Control group in the post-test (16.7 ± 4.1 vs. 15.0 ± 2.9; p = 0.083). The average operation time in the post-test was significantly shorter in the Training group than in the Control group (40.0 ± 17.0 min vs. 55.0 ± 22.2 min; p = 0.012). The Training group significantly improved their GOALS score by 2.84 points from pre- to post-test (± 2.85; p < 0.001), while the Control group did not improve significantly (0.55 points ± 2.34; p = 0.214). The Training group reduced their operation time by 15.7 ± 17.2 min on average from pre- to post-test (p < 0.001), while the Control group reduced theirs by 7.8 ± 14.3 min (p = 0.014). Pre- and post-test results are summarized in Figs. 3 and 4.

Fig. 3 Global Operative Assessment of Laparoscopic Skills (GOALS) scores of the Training and Control groups

Fig. 4 Operation time between Training and Control groups

Junior versus senior residents and subgroups

Regardless of group assignment, senior residents achieved significantly higher GOALS scores in the pre-test than junior residents (junior 13.7 ± 2.7 vs. senior 18.3 ± 2.9; p = 0.010). Additionally, senior residents completed the operation in significantly less time than junior residents (junior 62.4 ± 17.4 min vs. senior 41.7 ± 10.4 min; p = 0.002). On the porcine LC post-test, there was no longer a significant difference between junior and senior residents in average GOALS scores (junior 15.5 ± 3.4 vs. senior 18.8 ± 3.8; p = 0.120), but senior residents were still significantly faster than junior residents (junior 50.1 ± 20.6 min vs. senior 25.0 ± 1.9 min; p < 0.001). Junior residents showed a significant improvement in GOALS score (1.84 points ± 2.71; p < 0.001), while senior residents showed no improvement (−0.60 points ± 3.13; p = 0.690). Both junior and senior residents demonstrated a significant improvement in operation time from pre- to post-test (junior residents: 11.2 min ± 16.6, p < 0.001; senior residents: 16.0 min ± 11.5, p = 0.036).

Among junior residents, there were no significant differences between the Training and Control groups in the pre-test with regard to GOALS score (13.0 ± 2.9 vs. 14.4 ± 2.4; p = 0.060) or operation time (58.8 ± 18.8 min vs. 66.1 ± 15.3 min; p = 0.146). After training, there was still no significant difference between junior residents of the two groups with regard to the GOALS score (16.3 ± 4.0 vs. 14.9 ± 2.8; p = 0.170), but the junior residents in the Training group were significantly faster than those in the Control group (42.1 ± 17.2 min vs. 57.8 ± 20.9 min; p = 0.011). The junior residents in the Training group significantly improved their GOALS performance by 3.23 points (± 2.62; p < 0.001) from pre- to post-test, while those in the Control group did not (−0.70 points ± 2.25; p = 0.116). All junior residents reduced their operation time significantly from pre- to post-test, but the Training group had a greater improvement than the Control group (Training: 15.0 min ± 18.0, p = 0.002; Control: 7.7 min ± 14.8, p = 0.023).

Virtual reality training

Both the Training and Control groups performed an LC on the VR trainer as part of the post-test. No significant differences between the performance of the Training and Control groups were found (Table 2). However, there was a trend towards fewer complications in the Training group than in the Control group (Training 2.2 ± 2.7 vs. Control 6.2 ± 7.0; p = 0.056). There was no significant correlation between VR LC performance parameters and GOALS performance on the porcine cadaveric LC in the post-test (data not shown). However, there was a significant correlation between the operation time of the porcine LC on the POP trainer and that of the VR LC (r = 0.467; p = 0.006).

Table 2 Measured parameters from the VR post-test

GOALS performance association with gender, instrument experience, and video gaming experience

There was no significant difference in GOALS performance by gender (p = 0.340), by whether an instrument had been played for at least 5 years (p = 0.438), or by whether video games had been played 3–4 times a week or for more than 5 years (p = 0.563). Linear regression analysis likewise showed no association between these parameters and GOALS performance.

Discussion

The present RCT evaluated the benefit of a multi-modality training program for MIS for surgical residents of different experience levels. Trainees in the structured training program significantly improved their LC GOALS performance, whereas the Control group did not. All groups (Training, Control, junior residents, and senior residents) improved their LC operation time. At baseline, senior residents had significantly better GOALS scores than junior residents, but a significant difference no longer existed between the two groups on post-test scores. After multi-modality training, junior residents demonstrated similar operative performance to senior residents as shown by GOALS scores, but senior residents were still significantly faster.

The Training group showed significant improvement in the GOALS score, while the Control group did not. This would be expected, as after completion of the training, the subjects of the Training group possessed a broad spectrum of skills necessary to perform laparoscopy safely. In the next step, the acquired principles of laparoscopy can be applied to different procedures in the real operating room, and more complex tasks can be trained (external validity). A training approach that progresses from basic and unspecific tasks to more complex and specific ones seems useful, as shown by Stefanidis et al. In that study, training of basic skills helped to reduce the time and costs of later learning more complex manoeuvres, such as suturing [22]. Despite being a more advanced task, different knotting techniques were included in the presented curriculum as well. Being able to suture can be important at all experience levels in order to manage complications such as bleeding. Moreover, Aggarwal et al. showed that even senior residents benefit from the training of complex tasks such as knot tying [23].

Junior residents significantly reduced operation time for LC and improved performance from pre-test to post-test. The junior residents in the Training group also significantly improved their GOALS score, whereas the junior residents in the Control group did not. Junior residents in particular are at the stage of learning basic laparoscopy skills, which are adequately trained by the variety of training modalities used in this multi-modality program [24,25,26]. It is reasonable that the junior residents in the Control group also improved in terms of speed, as even the pre- and post-test can be seen as an opportunity to learn the operation, to become familiar with the training modality, and to improve skill level. This is in line with the findings of Coleman et al., who evaluated the effects of a skills curriculum on laparoscopic proficiency in gynaecology [27]. They also showed that the non-trained group significantly reduced their task completion time on different training models, such as a suture foam, from pre- to post-test. Senior residents, on the other hand, have usually already mastered these basic skills and have less to gain from general training modalities. Similarly, the senior residents had already performed real LCs, and thus the measurable performance improvement in the GOALS score to be expected from the pre- to the post-test was smaller. Other studies have also found that junior residents benefit more from various training modalities than experts [28, 29]. This is in line with the plateau effects in learning curves and with Pareto’s principle, which was originally derived from an observation about the distribution of wealth in a population [30]. Applied to learning laparoscopy, it should take roughly 20% of the training time to learn the first 80% of skills, while it takes 80% of the time to learn the last 20% of skills [31]. The fast initial learning takes relatively little time compared to the skills acquired, while reaching the highest individual performance takes disproportionately more time. That could be one explanation for the finding that junior residents reached comparable results to senior residents in only 12 h of training.

There is therefore a demonstrated benefit of a multi-modality program such as the one in the present study, which is effective in increasing laparoscopic skills and understanding of basic principles, particularly for novices and junior residents [32]. It was also shown that the needs for surgical training change during the different stages of surgical education and that training needs to be continuously adapted during residency programs. Trainees should be exposed to different modalities to set new stimuli and improve learning efficiency. Further studies could evaluate whether there is an advantage of multi-modality training with a different combination of modalities, and in comparison with single modalities with or without e-learning, in order to determine the necessity of multi-modality training [33, 34]. Additionally, to further increase the efficiency of surgical training, an ongoing study of our group is evaluating whether workplaces should be used alone or by a training dyad [35].

It should be mentioned, however, that the robustness of learned skills against disturbances such as noise, or when combined with secondary tasks, is higher for experienced surgeons, who have long reached a plateau in the learning curve if only scores and operative time are taken into account. Studies have shown that experienced surgeons are able to attend to another problem while operating (e.g. noise) and show no decline in performance while carrying out secondary tasks at the same time (automaticity) [36,37,38]. After completion of the curriculum proposed in this study, the next stages of training include operations on live animals with a focus on the management of complications [39]. The final stage of training is intraoperative supervision by an expert surgeon, which can effectively help surgeons to perform complex procedures more safely [40, 41].

Predictably, all groups and subgroups (Training, Control, junior residents, and senior residents) demonstrated a significant reduction in operation time between the pre- and post-tests. Previous studies have found that time is often a measure of the user’s familiarity with the tools and training modality, rather than an improvement of skill level, and even expert surgeons exhibit small learning curves when using a new training modality [42,43,44]. However, one strength of using the GOALS scoring system is that dexterity is only one measure, while tissue handling, efficiency, and autonomy are also incorporated to minimize potential bias.

Looking at secondary outcomes, none of the parameters for the LC on the VR trainer differed significantly between the Training and Control groups. Grantcharov et al. separated groups based on experience level, with Group 1 being beginners who had performed fewer than 10 cholecystectomies (corresponding to the junior resident group in the present study) and Group 2 being intermediates who had performed between 15 and 80 cholecystectomies (corresponding to the senior resident group in the present study) [43]. A comparison of the learning curves for virtual reality training on a Minimally Invasive Surgical Trainer-Virtual Reality (MIST-VR) found that Groups 1 and 2 plateaued in error scores after 5 and 1 repetitions and in economy of movement scores after 6 and 3 repetitions, respectively. The authors concluded that the learning curve on the VR simulator was proportional to the amount of operative experience of the participants. In the present study, each participant may have completed a maximum of only 2–3 repetitions on the VR trainer before the end of the 6-h training period, depending on how quickly they were able to work through the modules. Therefore, it is likely that neither the junior nor the senior residents trained long enough to overcome the learning curve on the VR trainer.

Additionally, no correlation existed between the measured VR trainer parameters and the GOALS score. While interesting, this may be partially explained by the fact that some of the VR performance parameters, such as “number of movements” or “average speed,” consist of raw data, rather than a score. These parameters alone may therefore prove difficult to correlate with skill level, as there is no “correct” or “ideal” count to reach a higher skill level. For the other parameters that demonstrate safety or efficiency, such as “efficiency of cautery” or “percentage of safe cautery,” it is unclear exactly how the VR trainer calculates these parameters, thus leaving room for discrepancies between the rating of the VR trainer and the GOALS rating that might be given by an expert surgeon. A correlation was found, however, between the completion time of the VR LC and the porcine LC. From this correlation, it can be concluded that the VR training helped to solidify procedural knowledge about the LC and instrument familiarity, which was transferred to the porcine LC in the post-test.

In regard to other secondary outcomes, gender, video game experience, and instrument playing experience did not appear to have a role in participants’ performance.

Limitations

The effect of the training program might have been stronger if only technical skills and dexterity with the instruments had been defined as the primary outcome. However, we purposely chose LC as the test scenario in order to show that structured training of basic skills in combination with VR training of basic and procedural skills positively influences the performance of a full procedure on real tissue. Interrater reliability was not calculated as part of the study. However, the GOALS score is well validated and has shown good interrater reliability in a number of studies, e.g. by Vassiliou et al. [16].

Conclusion

The present study shows that training in structured multi-modality programs is beneficial for junior residents to improve their laparoscopic surgical skill level and operative performance of LC, as well as to decrease the operation time during LCs for both junior and senior residents. A multi-modality training program such as the one in the present study may thus bring junior residents to the same skill level as senior residents with regard to instrument handling and basic laparoscopic principles. Additionally, the learning of surgical skills is a dynamic process which requires adaptation to the type of training and intervention based on the current training stage of each individual trainee. Future studies should evaluate multi-modality training in comparison to single-modality training programs and with regard to its robustness against disturbances or secondary tasks compared to traditional programs. Incorporating these features may be beneficial for senior residents.