Introduction

Simulation-based training (SBT) can be valuable for both patient safety and the training of staff. Evidence of skill transferability from SBT to the clinical environment has been reported [1]. In pediatric education, a meta-analysis showed that SBT was a highly effective educational modality [2]. Compared with general surgery, however, pediatric surgery involves fewer cases and greater technical complexity owing to the small working space, so particular care and safety are required; SBT could therefore play an important role in pediatric surgery. Indeed, the use of SBT in pediatric surgery has been expanding, and various simulators have emerged in recent years with advances in medical engineering technology [3]. Competency assessment tools are essential for determining the effectiveness of training methods. A recent systematic review of the validity and strength of SBT in pediatric surgery [4] described the validity and level of evidence of current SBT models and provided recommendations. However, the role of simulators as assessment tools and the effect of simulator-based training on the field of pediatric surgery remain unclear. In this study, we aimed to examine the use of simulators in measuring surgical competence and to evaluate the effectiveness of SBT in pediatric surgery.

Methods

The conduct and reporting of this systematic review conformed to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.

Study search strategy

This study was designed with the help of librarians, to minimize sampling bias, and of a third party, to increase generalizability. Comprehensive literature searches were undertaken in the PubMed, Cochrane Library, and Web of Science databases from January 2000 to July 2017. A broad search was employed comprising four separate search concepts, “surgery,” “pediatrics,” “simulation,” and “training, evaluation,” each defined by Medical Subject Headings terms combined with “OR.” The search results for each concept area were then combined using “AND.” Data saturation was achieved through a hand search of the reference lists of the retrieved papers.

Study extraction and data analysis

Two investigators independently reviewed all extracted studies. Studies reporting SBT that evaluated subjects’ performance and/or training effects were included. The inclusion criteria were SBT for residents, fellows, or faculty members using a box trainer, virtual reality simulator, physical simulator, cadaver, or animal model. The range of SBT considered was defined based on the Accreditation Council for Graduate Medical Education (ACGME) Program Requirements for Graduate Medical Education in Pediatric Surgery and the requirements of the Japanese Society of Pediatric Surgeons [5, 6]. No language limits were applied. We excluded studies that evaluated only the simulators themselves. Exclusion criteria also included SBT for medical students and training in pharmacology or analgesia. Letters to the editor, conference abstracts, and review articles were also excluded. Any discrepancies in interpretation were resolved through consensus adjudication. The included studies were classified by training effect using the four levels of the Kirkpatrick model: level 1, reaction, if the trainee perceived value in the training; level 2, learning, if the trainee’s knowledge or skills improved; level 3, behavioral change, if the trainee’s behavior changed in the clinical environment; and level 4, results, if the training affected patient outcomes [7].

Results

A total of 5858 citations were retrieved based on the research question, and 1390 duplicates were removed electronically. The remaining 4468 abstracts were screened by title and content. Sixty-seven articles were then reviewed by full-text analysis. Two additional articles were identified through hand searching of reference lists, and data were finally extracted from 43 articles (Fig. 1). These reports were mainly from the US, Japan, and Canada (30%, 28%, and 16%, respectively). Overall, 81% of the studies were published after 2010. The settings, measures, and resources of the selected studies are described below from the viewpoints of simulation-based assessment and training.

Fig. 1

Study identification and selection flowchart. A total of 5858 citations were retrieved based on our research question, and 43 articles were reviewed by full-text analysis

Assessment tools

Twenty papers described simulators used as assessment tools for evaluating trainees’ technical skills [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Table 1 describes these 20 simulators in terms of procedure content, simulator type, evaluation subjects, and evaluation methods. Of the 20 studies, 10 evaluated basic laparoscopic techniques, and most of these assessed technical skill based on scores, time, and penalties [8,9,10,11,12,13,14,15,16,17]. Six studies focused on thoracoscopic surgical training [20,21,22,23,24,25]; in contrast to the laparoscopic studies, most of these evaluated specific procedures such as repair of diaphragmatic hernia, esophageal atresia, and tracheoesophageal fistula. The metrics that differentiated between novices and experts were time, accuracy, and/or performance assessment scales.

Table 1 Assessment tools to evaluate technical skills

Evaluation of training effects

Twenty-three papers evaluated the effectiveness of SBT [28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50]. These studies used simulators for training in basic endoscopic surgical skills, fundoplication, airway foreign body, gastroschisis, trauma, acute care, extracorporeal membrane oxygenation cannulation, urology, fetal therapy, and cardiology. Table 2 describes the training outlines and training effects. While most training subjects were residents, two studies targeted fellows in acute care or urology [42, 47]. The time required for the training courses ranged from 3 h to 2 days for basic laparoscopic skills, 2 days for fundoplication, and 1 h to 1 day for foreign body aspiration. One study of acute care procedure training for faculty evaluated the length of retention, which was 6 months [42]. Training effectiveness was evaluated mainly using performance assessment scales (52%) and/or surveys (43%). According to the Kirkpatrick model for the evaluation of education, all studies were classified as level 1 or 2, and no training model demonstrated a clinical effect corresponding to level 3 or 4.

Table 2 Evaluation of training effects

Discussion

In this review, we examined the evaluation methods and impact of SBT in pediatric surgery. Evaluation to discriminate between trainees was undertaken using objective assessment methods such as scoring systems or checklists, which reduces the risk of subjective assessment. In terms of educational outcomes, evidence for basic skill training is accumulating; however, few studies have assessed long-term retention of the training effect, and no training method has been tested for skill acquisition in the clinical environment or improvement in patients’ clinical outcomes.

An appropriate assessment method is important in competency-based training [51]. A previous study showed that in-training evaluation reports by faculty members were at risk of subjective assessment [52]. In our review, one-fourth of the articles used a previously validated objective assessment scale, such as the Fundamentals of Laparoscopic Surgery (FLS), Pediatric Laparoscopic Surgery (PLS), or Objective Structured Assessment of Technical Skill. The risk of subjective assessment could therefore be reduced. In particular, the PLS simulator is considered the most effective assessment tool for basic laparoscopic procedures [8, 11, 15, 16]. The PLS simulator is an FLS trainer modified for pediatric use. It can distinguish experts, intermediates, and novices by motion analysis in addition to basic laparoscopic tasks such as peg transfer, pattern cutting, ligating loop, extracorporeal suturing, and intracorporeal suturing. In our review, half of the papers used time, path length, and suturing tension as objective evaluation measures for basic laparoscopic surgery, and these could properly distinguish expert from novice surgeons depending on the training content and targets. Thus, it would be useful to apply validated evaluation tools to SBT for more advanced or complex procedures in future studies.

Metrics such as time and accuracy, which include the time to complete a task and its precision or error, are simple to interpret and do not require specific tools. However, from a clinical point of view, a faster procedure does not always mean a safer outcome. To use these metrics effectively for surgical performance, it would be better for an expert to provide supplementary qualitative feedback that links the results to clinical practice [53]. Motion analysis has recently been widely introduced, especially in training for endoscopic surgery [10,11,12,13,14,15, 18, 19, 21, 28, 29, 31]. This approach objectively measures various aspects of movement, such as velocity, acceleration, roll, range, and path length, depending on the algorithm used. Because motion analysis can assess separate segments of a task individually, it has the potential to be used for formative assessment [54]. Considering the purpose of the evaluation, its intended use, and how results will be interpreted in advance would support targeted assessment.

Given the limited opportunities for trainees to perform pediatric surgery, SBT is becoming increasingly important. Regarding the evaluation of training effects, half of the reviewed articles focused on initial emergency treatment, which is common to the field of pediatrics, and only one-fourth focused on training for specific procedures. While there were relatively many level 2 studies on basic skills or in areas overlapping with pediatrics, such as acute care and foreign body aspiration, only two level 2 studies [31, 49] were found in the field of advanced pediatric surgery. The ultimate goal of training is to increase clinical effectiveness; however, no training model corresponding to level 3 or 4 was found in this review. Because such evaluation is relatively easy, evidence at Kirkpatrick levels 1 and 2 for basic skill training has gradually accumulated. However, it will be necessary to demonstrate the effect of SBT on more advanced and specific procedures and to show higher-impact outcomes at Kirkpatrick levels 3 and 4.

Although retention of the training effect is also an important aspect of training, the programs in our review focused on short-term training effects and, with one exception [42], did not evaluate long-term retention. According to a systematic review of the spacing of surgical skill training sessions for medical trainees, distributed training sessions are possibly better than massed training, but no evidence was obtained regarding the optimal assessment interval [55]. Because there is little evidence on the long-term retention of training effects, it is difficult to make recommendations at this time.

With regard to the impact of simulation training on clinical outcomes, Cox et al. reviewed Kirkpatrick level 4 studies in 2015 and identified 12 appreciable articles [56]. Among these, Zendejas et al. reported simulation training for laparoscopic totally extraperitoneal (TEP) inguinal hernia repair [57]. This training was for residents and consisted of two elements: a web-based online cognitive component and a simulation-based skill training component. The clinical effect was measured by improvements in operative time, operative performance score, complication rate, and length of hospitalization. The operative performance score used was the Global Operative Assessment of Laparoscopic Skills (GOALS), a valid and reliable tool for assessing technical skills across a variety of procedures [58]. In general surgery, a more procedure-specific assessment tool, the GOALS-Groin Hernia (GOALS-GH), whose scores have demonstrated reliability and validity in both the operating room and the skills laboratory, has been used [59, 60]. Hernia repair is also a common procedure in pediatric surgery, and the laparoscopic percutaneous extraperitoneal closure (LPEC) method [61] has recently come into use. However, to our knowledge, no comparable tools exist to demonstrate transferability from a bench model to the clinical setting. It is necessary to provide evidence of the transferability and quality of performance improvements achieved through SBT. However, Barsness et al. [3] reported that it was difficult to address this issue within the field of pediatric surgery because of limits related to infrastructure, knowledge, or time, and suggested referring to the evidence obtained in the educational field of general surgery. Transferability certainly cannot be demonstrated without a sufficient number of cases. Multi-institutional collaborative research on SBT in pediatric surgery will therefore be necessary in future studies.

This study reviewed simulators for measuring technical competence and evaluated the effectiveness of SBT in pediatric surgery. Regarding training effects, no study examining clinical outcomes was found. It is necessary to accumulate evidence on the transferability of SBT to clinical practice in pediatric surgery.