Introduction

Simulation has historically been used in high-risk settings, such as military and aviation operations and nuclear power stations [3], to predict and prevent human errors before they occur and to ensure that competency is achieved through simulation-based training. Human error and teamwork failure are the most common causes of safety threats in critical scenarios [4, 5]. At the same time, surgical training in the Operating Theatre (OT) is compromised by multiple factors, chiefly OT costs and time, as well as greater awareness of ethical issues and patient demands. Therefore, time and opportunities for teaching are expected to be used effectively and efficiently [6, 7]. For these reasons [8], extra-clinical simulation models complement surgical training, and there has been a surge of simulation-based programs, massed or integrated within the training pathway, that are proving greatly beneficial for surgical training. There is some evidence that simulation-based educational assessments in the field of surgical education positively correlate with patients’ outcomes [9].

From the research perspective, it has been suggested that surgical educational research, in particular for simulation-based training [10], should adhere to the IDEAL recommendations [11]. IDEAL stands for Idea, Development, Exploration, Assessment and Long-term study. Accordingly, the proposed NANEP model is described by the criteria Idea, Development and Exploration, whereas Assessment and Long-term study are not the subject of the current study.

The incidence of umbilical hernia repair is 5% in the normal population, which implies that the surgical intervention occurs quite frequently [12]. Validated surgical models for conventional, non-minimally invasive surgery are scarce [2, 13, 14]; to date, no model for umbilical hernia repair exists.

Materials and methods

The silicone-based NANEP model was designed to be used for an open preperitoneal mesh repair of umbilical hernia. The development of the surgical NANEP course was supervised by the Institute of Medical Teaching and Medical Education Research of the University of Wuerzburg.

NANEP model

For construction of the NANEP model, a two-component silicone was used. The prototype was based on a human body. Different materials such as textiles, cotton and synthetic blood were used to achieve anatomical and visual realism and to represent the differing layers (e.g., skin, fatty tissue, abdominal fascia, peritoneum). Particular attention was paid to surgical handling (tissue handling), haptic feedback and the compatibility of the model with the surgical instruments, suture materials and meshes commonly employed in the OT. The umbilical hernia of the NANEP model was designed for the open preperitoneal mesh repair of umbilical hernia. The NANEP model can be classified as an open surgical model [13]. It was manufactured at a cost of approximately 20 Euro each.

Figure 1a, b shows the NANEP model, and Fig. 2a–d shows the most important surgical steps of umbilical hernia repair with mesh implantation in the NANEP model.

Fig. 1
NANEP model: a anterior view of the umbilical hernia with clinically apparent prolapsed hernia. b Posterior view of the umbilical region with confluent plicae and yellow subcutaneous preperitoneal fatty tissue (targeted positioning for the mesh)

Fig. 2
Operation steps. a Preparation of the hernial orifice with visible prolapse of preperitoneal fat. b Ligation of the fatty tissue pedicle after resection. c Preperitoneal implantation of the mesh. d Closed hernial orifice

Sample size and design of the surgical NANEP course

The validation study of the NANEP model was conducted at the University Hospital Wuerzburg. Participants formed two groups, beginners (n = 12) and experts (n = 6). The twelve beginners were recruited among medical students. At the time of assessment, they were in their final clinical year and reported a pronounced interest in surgical techniques. The expert group consisted of six surgeons either in their final year of residency (n = 2) or in the first 2 years after residency (surgical specialists, n = 4). Sample size calculation was conducted according to Miskovic; for a power of 80% (Welch test), a minimum of 18 participants was suggested [15]. The index procedures were performed within the scope of the NANEP course (Fig. 3). One week prior to the course, participants received a written tutorial with the operation steps. Prior to surgery 1 (operation 1, day 7), participants were shown a standardized video in which each operation step was demonstrated. Each surgery was performed by one participant, assisted by another participant. Each participant performed two operations with an interval of 1 week. Operations were video recorded. During the first operation, the tutor responded to questions from the participants and gave individual instructions. The second operation had to be performed without any additional assistance. After every surgery, participants received oral feedback from the tutor. After the second operation, participants completed the NANEP questionnaire (content validity). Afterwards, anonymized videos were uploaded to the platform CATLIVE.

Fig. 3
Design of the study: the sample consisted of 18 participants (12 beginners and 6 experts)

The bilingual (English/German) online platform CATLIVE had been developed by the Institute for Artificial Intelligence and Applied Informatics of the Julius-Maximilians-University of Wuerzburg. Each video was rated by three blinded experts (reviewers) using the Competency Assessment Tool (CAT, construct validity, Fig. 3). Reviewer 1 was a renowned hernia expert who was involved in the creation of the model, reviewer 2 was involved in the development and construction of the model, and reviewer 3 was a senior surgeon (general surgery). Reviewer 1 also rated the autopsy results.

Concepts used to assess the model’s quality

Content validity. Validity in general is defined as the extent to which a test measures what it is supposed to measure [16, 17]. Content validity in particular is considered the most important step of test construction [18] since it determines construct validity and reliability. Content validity cannot be measured directly but requires the definition of indicators [19, 20] which characterize the content [21]. In the case of a surgical simulation model, these indicators are anatomy and realism [22], operationalized by items of the NANEP questionnaire. Reliability of items [21] was statistically inspected using Cronbach’s α [23, 24]. Values greater than 0.70 are considered good [24], and values greater than 0.60 are considered acceptable [25, 26].
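As an illustration of the reliability check, the following minimal sketch computes Cronbach’s α from a respondents-by-items table using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the ratings are hypothetical and are not taken from the study data; the study does not state which software was used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of ratings.

    scores: list of respondents, each a list of item ratings of equal length.
    Uses sample variances (n - 1 in the denominator).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must rate every item")

    item_columns = list(zip(*scores))                     # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 5-point Likert ratings: 5 respondents x 3 items.
example = [[4, 5, 4], [3, 4, 3], [5, 5, 4], [2, 3, 2], [4, 4, 5]]
alpha = cronbach_alpha(example)
print(round(alpha, 3))
```

With the thresholds given above, a value above 0.70 would count as good and a value above 0.60 as acceptable.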

Construct validity. Construct validity addresses the question of whether the CAT measures the competency of participants. If the CAT measures competency, the ratings of experts should be fairly consistent for all participants [20]. The latter was statistically approximated by calculating interrater reliability using the Finn coefficient [27]. The Finn coefficient varies between 0 and 1, where 1 implies absolute agreement between the raters. A Finn coefficient greater than 0.50 is considered acceptable, and values greater than 0.70 are considered good [28].
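The idea behind the Finn coefficient can be sketched as follows: the observed within-subject variance of the ratings is compared with the variance expected if raters chose among the k scale levels purely at random, (k² − 1)/12. The sketch below assumes the one-way variant and a scale with four levels; which variant and which number of levels the study used is not stated, and the function name and ratings are hypothetical.

```python
def finn_coefficient(ratings, n_levels):
    """One-way Finn coefficient of interrater agreement.

    ratings: list of rated subjects (e.g. videos), each a list with one
             rating per rater.
    n_levels: number of levels of the rating scale (assumed to be 4 here).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters")

    # Within-subject sum of squares: spread of raters around each subject mean.
    ss_within = 0.0
    for row in ratings:
        subject_mean = sum(row) / n_raters
        ss_within += sum((x - subject_mean) ** 2 for x in row)
    ms_within = ss_within / (n_subjects * (n_raters - 1))

    # Variance of a discrete uniform distribution over n_levels categories.
    chance_variance = (n_levels ** 2 - 1) / 12
    return 1 - ms_within / chance_variance


# Hypothetical ratings: 4 videos, each rated by 3 reviewers on a 1-4 scale.
example = [[3, 3, 4], [2, 2, 2], [4, 4, 3], [1, 2, 1]]
print(round(finn_coefficient(example, n_levels=4), 2))
```

Identical ratings from all reviewers give a coefficient of 1, matching the interpretation of absolute agreement above.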

Differential validity. Differential validity addresses the question of whether the evaluation criteria distinguish between beginners and experts [29]. For comparison, the Welch test was used since it has proved to be statistically superior to the common t test [30, 31].
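For readers unfamiliar with the Welch test, the sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances and unequal sizes. The function name and the scores are hypothetical and only mimic the group sizes of the study (6 experts, 12 beginners); the p value would then be read from a t distribution with the computed degrees of freedom.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom.

    Unlike the common (Student) t test, no pooled variance is used, so the
    groups may differ in variance and size.
    """
    # Squared standard error of each group mean.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)

    t = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical category totals (possible range 3-12).
experts = [11, 12, 10, 12, 11, 10]
beginners = [7, 8, 6, 9, 7, 8, 6, 9, 8, 7, 7, 8]
t_value, dof = welch_t(experts, beginners)
print(round(t_value, 2), round(dof, 1))
```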

Proficiency gain. Once reliability and validity of the CAT are confirmed, performance and learning gain are measured [32]. An increase in competency from surgery 1 to surgery 2 is regarded as a gain in learning. Results were statistically examined using the Welch test.

Autopsy. An experienced surgeon examined all models after surgery. Aesthetic and functional criteria, such as ligature of the fatty prolapse, suture of the hernial orifice and peritoneal damage, were investigated.

NANEP questionnaire

The NANEP questionnaire was used to collect demographic information and to evaluate content validity. Participants rated (1) anatomy, (2) realistic handling and (3) applicability for training. Responses were given on a 5-point Likert scale (1 = does not agree, 2 = rather does not agree, 3 = partially agrees, 4 = largely agrees, 5 = fully agrees; 0 = N/A (not applicable)).

Competency assessment tool (CAT)

The CAT questionnaire was developed to measure competency in the field of colorectal surgery [33]. The CAT was adapted to suit the NANEP operation and segmented into procedural and content-specific aspects to capture the dynamic process of an operation. The NANEP operation itself can be segmented into three steps: (1) Exposure, (2) Clearance of orifice and (3) Mesh position and orifice closure. Each step is assessed in four categories: (I) Instrument use, (II) Tissue handling, (III) Near misses and errors, and (IV) End-product quality. The resulting matrix comprises 12 evaluation criteria, each rated on four proficiency levels from 1 (worst performance) to 4 (best performance). Levels of competency are based on the Dreyfus model [34], Eraut [35] and Miskovic [15]. Additionally, raters were able to leave comments on the CATLIVE platform.
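The structure of the adapted CAT can be sketched as a 3 × 4 matrix of ratings that is summed per category, so that each category total lies between 3 and 12. The data layout and function name below are hypothetical and serve only to illustrate the scoring scheme; they do not describe the CATLIVE implementation.

```python
STEPS = (
    "Exposure",
    "Clearance of orifice",
    "Mesh position and orifice closure",
)
CATEGORIES = (
    "Instrument use",
    "Tissue handling",
    "Near misses and errors",
    "End-product quality",
)


def category_totals(ratings):
    """Sum the three step ratings of each category.

    ratings: dict mapping (step, category) to a proficiency level from
             1 (worst performance) to 4 (best performance).
    Returns a dict mapping each category to a total between 3 and 12.
    """
    totals = {}
    for category in CATEGORIES:
        total = 0
        for step in STEPS:
            level = ratings[(step, category)]
            if level not in (1, 2, 3, 4):
                raise ValueError(f"invalid level {level!r} for {step} / {category}")
            total += level
        totals[category] = total
    return totals


# Hypothetical rating of one video: level 3 everywhere, except a 2 for
# tissue handling during exposure.
one_video = {(s, c): 3 for s in STEPS for c in CATEGORIES}
one_video[("Exposure", "Tissue handling")] = 2
print(category_totals(one_video))
```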

Results

The response rate for the NANEP questionnaire was 100%, and 18 forms were used for statistical analysis. The sample consisted of eight female and ten male participants. Twelve participants were internship students (assigned to the group of beginners) and six were surgeons in general surgery residency or within 2 years after completion of the residency (group of experts). Sex was equally distributed among the group of beginners, with six male and six female students. The expert group consisted of two female and four male surgeons. The mean age of students was 26.18 years (SD = 2.27). On average, participants had been in this educational stage for 4.67 years (SD = 1.67). As expected, the mean age of residents or specialists was higher, at 34.50 years (SD = 2.07). On average, they had been in their position for 2.67 years (SD = 2.07).

Content validity

Overall, the NANEP model was rated positively. Results differed for anatomy, realistic handling and applicability for training (cf. Table 1). Cronbach’s α was acceptable; all values exceeded 0.60. The open-answer comments that occurred most frequently were: “Surgery felt extremely realistic”, “Practice of surgical skills was extremely helpful and fun!”, “A good opportunity to realistically practice an operation”, “Very good preparation for the surgery and great opportunity to recall anatomical structures” (Table 1).

Table 1 Results for content validity (NANEP questionnaire)

Construct validity (CAT)

Response rate for the CAT evaluation was 100%, and 36 operations were analyzed. Except for the category complications, the α value was good. The Finn coefficient verified good interrater-reliability for all categories (cf. Fig. 4).

Fig. 4
Reliability values of the Competency Assessment Tool: Cronbach’s alpha values > 0.70 are good [24], values > 0.60 are acceptable [25, 26]. Finn coefficients > 0.50 show acceptable and > 0.70 good interrater-reliability in support of construct validity [28]

Differential validity and learning gain (CAT)

Comparison between beginners and experts verified differential validity. Experts significantly outperformed the beginners (Fig. 5).

Fig. 5
Results CAT: means of beginners and experts in the 4 categories were compared. Mean (plus standard deviation) for the 4 categories: (I) Instrument use, (II) Tissue handling, (III) Near misses and errors, and (IV) End-product quality. Each category contains 3 steps: (1) Exposure, (2) Clearance of orifice, and (3) Mesh position and orifice closure. All 12 evaluation criteria were rated on four proficiency levels: 1 (worst performance) to 4 (best possible performance). In total, a maximum count for one category was 12, minimum count was 3. Results differ for beginners and experts, Welch test: ***p < .005

When the learning gain is analyzed for beginners and experts separately, the learning gain for beginners is more pronounced. Beginners showed a significant learning growth in the categories instrument use, tissue handling, near misses and errors and end-product quality. It may be noteworthy that experts began with excellent ratings, limiting the possibility of improving skills (Fig. 6).

Fig. 6
Results using CAT for assessment of video-taped operation of umbilical hernia in four categories: (I) Instrument use, (II) Tissue handling, (III) Near misses and errors and (IV) End-product quality. Results are shown for beginners and for experts, also stating standard deviations. Learning growth is analyzed by Welch test: *p < 0.05

Autopsy

Post-operative results were rated as follows: regarding the category “aesthetics”, 42% were deficient, 33% satisfactory and 25% very good. The skin suture was perfect in 25% of the cases, asymmetric in 64% and loose in 11%. The suture of the fascial orifice was insufficient in 53% of the cases, plain in 30% and bulged in 17%. No ligature was found in 61% of the models. The mesh was plain and centered in 25% of the models, folded but still centered in 56% and not centered in 19%. In 86% of the cases, the peritoneum was intact, in 3% it was slightly injured and in 11% the mesh was exposed. The chi-square test showed no significant difference between beginners and experts on inspection of the repeated surgery. Photo examples of results are displayed in Fig. 7.

Fig. 7
Autopsy results of post-operation NANEP model: a aesthetic (anterior view), case #06 very good, #04 deficient; b suture of hernial orifice (posterior view), case #28 sufficient and #04 insufficient due to untied knot; c ligature of fatty tissue (posterior view), case #18 sufficient and #31 no ligature; d peritoneum (posterior view), case #07 intact and #36 injured peritoneum

Discussion

The aim of the study was to develop a surgical simulator for open preperitoneal mesh repair of an umbilical hernia. The NANEP model can be characterized as a high-fidelity simulator that allows the entire operation to be practiced. High-fidelity simulators are congruent with reality, which includes user interactions in real time [36]. The NANEP model reflects the underlying anatomical morphology. Unlike low-fidelity models, it requires surgical decisions beyond a mere display of practical skills. The surgeon must choose adequate suture material and technique, e.g., for a sufficient and flat fascial closure. A full-procedural simulation model [37] is characterized by a complex anatomical design that allows an entire operation to be performed, as is the case for the open preperitoneal mesh repair of umbilical hernia. Benchtop models are usually considered static low-fidelity models for practicing simple skills such as skin suture. It had been assumed that variable feedback would be impossible to integrate [38]. The NANEP model proved this assumption wrong: benchtop models can be utilized to simulate entire operations.

In the present study, the NANEP model was validated with all its features; content validity was verified for anatomy, realistic handling and applicability for training. The non-significant p values for the comparison of beginners and experts indicate content validity regardless of experience.

The significant proficiency gain for practical performance on the NANEP model is similar to results found in other studies [13, 39, 40, 41]. The significant proficiency gain for beginners indicates that the NANEP model is suitable for implementation at the beginning of residency.

To effectively teach anatomy and practical skills, simulation models should become a standard part of curricula [41, 42]. To maximize educational benefit when working with surgical simulation models, we recommend the following three principles:

  1. Offer repetitive opportunities for training, so that practical skills can be improved [41, 42]. This proved true for the NANEP model (Fig. 6).

  2. Ensure that the simulation model reflects a clinical need [41, 42, 43]. For this study, we selected umbilical hernia, since it is a frequent disease.

  3. Give participants feedback [41, 42]. In our study, this was done using the CATLIVE platform, which automatically creates a sheet with feedback for participants.

Furthermore, simulation-training programs can complement surgical training [22]. The paradoxical finding that experts did not reach the highest score and that end-product quality dropped at the second attempt can be attributed in part to the Dunning-Kruger effect. The Dunning-Kruger effect describes a cognitive bias in which experts become too comfortable in their own skills and fail to readjust their performance. This may stress the need to use the model in simulation-training programs that complement surgical training [21]. Such programs are already successful: the London General Surgical Skills Program run at Imperial College London has been obligatory for every surgeon in residency in London since 2009 and includes various simulation models for open and laparoscopic procedures [44]. A similar program has been implemented in the USA. Since 2008, the Fundamentals of Laparoscopic Surgery (FLS) training module of the American Board of Surgery (ABS) has been compulsory for the ABS Certifying Examination for General Surgery [45].

Conclusion

This study is the first to present a surgical simulator for open preperitoneal mesh repair of an umbilical hernia. It adds evidence for the positive impact of simulation models on the development of skills when implemented at early stages of surgical residency. Whereas Idea, Development and Exploration were presented, Assessment and Long-term study require further research in terms of curricular implementation and examination of patients’ outcomes.