Validity and reliability evidence support task-specific metrics for laparoscopic fundoplication

Desir, Alexis; Marques, Carolina; Farah, Emile; Hegde, Shruti R.; Holcomb, Carla; Scott, Daniel J.; Sankaranarayanan, Ganesh

doi:10.1007/s00464-024-10675-4

2023 SAGES Oral
Published: 21 February 2024

Volume 38, pages 2219–2230, (2024)

Surgical Endoscopy

Alexis Desir^1,2,
Carolina Marques²,
Emile Farah¹,
Shruti R. Hegde^1,2,
Carla Holcomb¹,
Daniel J. Scott¹ &
Ganesh Sankaranarayanan ORCID: orcid.org/0000-0003-1556-2797^1,2

Abstract

Background

Laparoscopic hiatal hernia repair (LHHR) is a complex operation requiring advanced surgical training. Surgical simulation offers a potential solution for learning complex operations without the need for high surgical volume. Our goal is to develop a virtual reality (VR) simulator for LHHR; however, data supporting task-specific metrics for this procedure are lacking. The purpose of this study was to develop and assess validity and reliability evidence of task-specific metrics for the fundoplication phase of LHHR.

Methods

In phase I, structured interviews with expert foregut surgeons were conducted to develop task-specific metrics (TSM). In phase II, participants with varying levels of surgical expertise performed a laparoscopic Nissen fundoplication procedure on a porcine stomach explant. Video recordings were independently assessed by two blinded graders using global and TSM. An intraclass correlation coefficient (ICC) was used to assess interrater reliability (IRR). Performance scores were compared using a Kruskal–Wallis test. Spearman’s rank correlation was used to evaluate the association between global and TSM.

Results

Phase I of the study consisted of 12 interviews with expert foregut surgeons. Phase II included 31 surgery residents, 1 fellow, and 6 attending surgeons. Phase II results showed high IRR for both global (ICC = 0.84, p < 0.001) and TSM (ICC = 0.75, p < 0.001). Significant between-group differences were detected for both global (χ² = 24.01, p < 0.001) and TSM (χ² = 18.4, p < 0.001). Post hoc analysis showed significant pairwise differences among all three groups for both metrics (p < 0.05). There was a strong positive correlation between the global and TSM scores (rs = 0.86, p < 0.001).

Conclusion

We developed task-specific metrics for LHHR and, using a fundoplication model, documented reliability and validity evidence supporting their use. We anticipate that these LHHR task-specific metrics will be useful in our planned VR simulator.

Achieving technical proficiency in laparoscopic surgery is critical because laparoscopy remains the most frequently employed surgical approach by case volume [1]. Recent studies in bariatric [2] and colorectal [3] surgery have shown that greater technical skill is associated with better outcomes and fewer complications. Because laparoscopic technical skills are difficult to acquire directly in the operating room, simulation-based training has emerged as a viable alternative [4,5,6,7,8,9]. Simulation training platforms provide a conducive environment for teaching the technical and cognitive competencies needed to master laparoscopic surgery without risk to patients and without the cognitive load experienced in the operating room. An effective simulation-based training program depends on a robust curriculum with clearly defined, quantifiable performance metrics. Such metrics can be summative, supporting a high-stakes pass/fail determination, or formative, giving trainees targeted feedback for improvement. The most widely used summative tool for evaluating surgical performance is the Objective Structured Assessment of Technical Skills (OSATS) [10], a validated global tool that assesses operative performance in 6 domains, typically through video-based review. Formative assessment usually requires creating and validating task-specific metrics tailored to each procedure or task.

Laparoscopic hiatal hernia repair (LHHR) is a complex procedure requiring advanced surgical training [11]. Attaining proficiency in this procedure is crucial given the high recurrence rate of these hernias, reported to be as high as 50%, particularly for paraesophageal hernias [12]. LHHR remains difficult to master, with reported learning curves ranging from 50 to 200 cases [13,14,15], underscoring the need to optimize training. Traditional anatomic models such as cadavers and live animals have been useful for simulating many procedures, but they may fail to replicate important aspects of human HHR and can pose ethical, cost, logistical, and curricular challenges. Advances in technology have made virtual reality (VR) simulators a potentially ideal solution, offering detailed representations of the anatomy characteristic of HHR and facilitating focused, deliberate practice [16]. Using standardized simulation scenarios, VR trainers also enable automated objective assessment and targeted feedback without the need for an expert surgeon reviewer. Importantly, skills acquired in VR simulators have been shown to improve operating room performance [17, 18]. We are developing a VR simulator for LHHR training as part of an NIH-funded project. The purpose of this study was to develop task-specific metrics for LHHR and to assess their reliability and validity for the fundoplication portion of the procedure.

Materials and methods

This study was approved by the UT Southwestern Institutional Review Board and was conducted in two phases. In phase I, interviews were conducted with experts to create task-specific metrics for assessing performance in laparoscopic Nissen fundoplication. In phase II, a bench model study was performed to evaluate validity evidence supporting the newly created metrics.

Development of task-specific metrics for fundoplication

We performed a hierarchical task analysis (HTA) of LHHR by conducting hour-long semi-structured interviews with local foregut surgeons and with experts from the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) Foregut Task Force. HTA is a well-established method in surgery that breaks a procedure down into tasks, sub-tasks, and motion end effectors; it has been used successfully to deconstruct various minimally invasive procedures [19,20,21]. To guide the interviews, we formulated an initial list of procedural steps drawn from recorded operative videos, textbooks, and prior task analyses of laparoscopic fundoplication [22, 23]. Experts were asked to describe how they perform the procedure, highlight key moments, identify procedural variations, and list common errors in order of severity. The interview recordings were then independently analyzed by two authors (SH and GS) to create task trees with variations and a list of errors. Discrepancies were resolved by an expert author (CH) and through consultation with the interviewed experts.

Validity evidence evaluation for the fundoplication task-specific metrics

In phase II, we assessed the validity evidence of the newly created metrics by conducting a study at the UT Southwestern Simulation Center using a porcine explant Nissen fundoplication model. Messick’s unitary framework was used to evaluate the validity of our task-specific metrics [24]. Specifically, data were collected to evaluate validity evidence in the following domains: content alignment, response process, internal structure, and relationship to other variables.

Fundoplication simulator design

We created a Nissen fundoplication simulator using a porcine stomach explant, which was placed inside a modified version of a laparoscopic box trainer [4] (Fig. 1). A frozen porcine stomach and esophagus specimen (Animal Technologies Inc., Tyler, Texas) was thawed and positioned in the box trainer. The esophagus was passed through a small circular incision in the lap box, about 2 inches from the base and held taut using an Allis clamp. To prevent lateral movement, the stomach was secured with two alligator clips. To create a retroesophageal window for the fundoplication, a Penrose drain was inserted through a circular incision about 4 inches from the base to lift the stomach at the gastroesophageal junction and keep the model under tension. A 0° laparoscope connected to a standard equipment tower was used for visualization. A pair of standard laparoscopic needle drivers, curved graspers, and scissors were used to perform the procedure. In addition, 2–0 silk sutures pre-cut to 15 cm in length were placed on a foam box to be used for suturing.

Study design and procedure

The study was performed at the UT Southwestern Simulation Center using a between-subjects design. Participants were stratified into three groups by level of expertise: novice (post-graduate year [PGY] 1–2 general surgery residents), intermediate (PGY 3–5 residents and a minimally invasive surgery fellow), and expert (faculty).

Before starting the procedure, each participant completed a survey capturing demographic information, clinical experience, and simulator experience. After providing informed consent, participants received general instructions explaining the study objective and the task, without any technical or operative guidance. Specifically, we gave no instructions on the number and type of sutures, the distance between sutures, or the placement of the wrap. Participants were then asked to complete a Nissen fundoplication on the porcine stomach model and were given 1 h to complete up to 2 unassisted attempts. The video recordings captured the instruments actively used in the laparoscopic box trainer, the training model itself, and a card displaying the participant's random identification number, ensuring anonymity during video review. We also collected and analyzed the following real-time in situ metrics for each participant: (I) number of attempts completed (1 or 2), (II) number of sutures placed for the fundoplication, (III) distance between sutures in centimeters, and (IV) whether seromuscular bites were taken through the esophagus (dichotomized as 0 or 1).

At the conclusion of the study, participants were asked to complete a post-simulation survey to assess the quality of the simulator on a 5-point Likert scale. The survey covered 5 categories that included the visual appearance of the simulation, the quality of models and textures, the realism of the simulator interface, how closely the task mirrored the actual surgical procedure, and the simulator’s overall effectiveness in teaching LHHR.

Two qualified raters, blinded to the participants' experience levels, independently evaluated the video recordings of performances using both global and task-specific metrics. Table 1 presents the global metrics derived from the OSATS rubric, whereas Table 2 displays the task-specific metrics grounded in the HTA [10, 25,26,27,28]. Among the OSATS domains, we excluded the scoring rubric for knowledge of instruments because all participants were provided with the same set of laparoscopic tools. Initially, the two raters assessed the performance of 5 participants, comparing their ratings to discuss the grading and to resolve discrepancies. They then evaluated another 5 videos to ensure concordance between their ratings and reviewed the intraclass correlation coefficient (ICC). Finally, each rater independently graded the remaining videos.

Table 1 Rubric for assessing performance using global metrics

Table 2 Task-specific metrics for assessing performance of the creation and securing the wrap portion of the laparoscopic fundoplication procedure

Data analysis

The ICC estimates and their 95% confidence intervals for interrater reliability (IRR) were calculated based on the mean rating (k = 2), absolute agreement, and a two-way mixed-effects model. An ICC between 0.75 and 0.9 was deemed good, and a value above 0.9 was deemed excellent [29]. For both global and task-specific evaluations, a total score was calculated by first averaging each metric's scores across the two raters and then summing the averages. Spearman's rank correlation was used to assess the association between the total global and task-specific scores. To determine performance differences among the three groups, the data were first evaluated for normality using the Shapiro–Wilk test. If the data were normally distributed, a one-way analysis of variance (ANOVA) was conducted, followed by pairwise t tests with Bonferroni correction for post hoc analysis. Otherwise, the Kruskal–Wallis test was used, followed by pairwise Wilcoxon tests with Benjamini–Hochberg correction. Post hoc effect sizes were reported when appropriate.
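The reliability model above (average of k = 2 raters, absolute agreement, two-way model) corresponds to what McGraw and Wong label ICC(A,k). The sketch below computes that point estimate from a subjects × raters score matrix using the standard two-way ANOVA mean squares. It assumes a complete matrix (every video scored by both raters), omits the 95% confidence interval and F test, and uses a hypothetical function name, `icc_a_k`; it illustrates the statistic rather than reproducing the authors' analysis code.

```python
from statistics import fmean


def icc_a_k(ratings):
    """ICC(A,k): two-way model, absolute agreement, mean of k raters.

    ratings: one row per subject (video), each row holding the k raters' scores.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, >= 2 raters, and a complete matrix")
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Absolute agreement keeps the rater mean square in the denominator,
    # so systematic differences between raters lower the coefficient.
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
```

With a constant one-point offset between raters, `icc_a_k([[1, 2], [2, 3], [3, 4]])` gives 0.8: the absolute-agreement form penalizes a systematic rater difference that a consistency ICC would ignore.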

Sample size

An a priori power analysis was conducted using the G*Power software [30] to test the difference in performance among the three groups, with α = 0.05, an effect size of f = 0.5, and power (1 − β) = 0.8. The analysis indicated that a total of 30 subjects, equally distributed across the three groups, was needed to achieve the necessary power.
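For readers checking the sample-size calculation, the quantities behind a fixed-effects one-way ANOVA power analysis of the kind G*Power performs are summarized below, assuming that this module was the one used (the text does not name it). Cohen's f compares the spread of the k group means μ_i (group sizes n_i, total N, weighted grand mean μ) with the common within-group standard deviation σ; power is the probability that a noncentral F statistic exceeds the central critical value.

```latex
f = \frac{\sigma_m}{\sigma}, \qquad
\sigma_m = \sqrt{\sum_{i=1}^{k} \frac{n_i}{N}\,(\mu_i - \mu)^2}, \qquad
\lambda = f^2 N

1 - \beta = \Pr\!\left[\, F'(k-1,\; N-k,\; \lambda) > F_{1-\alpha}(k-1,\; N-k) \,\right]
```

With k = 3 groups, N = 30, and f = 0.5, the degrees of freedom are 2 and 27 and the noncentrality parameter is λ = 0.25 × 30 = 7.5.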

Results

Phase I results

Task analysis

A total of 12 expert foregut surgeons participated in interviews for task analysis, spanning 720 min in total. Table 3 displays the HTA of LHHR, outlining 6 major tasks, 27 subtasks, and 19 major errors. Using the HTA (Table 3) and the cataloged errors, we formulated metrics for video-based assessment of the LHHR (see Appendix 1).

Table 3 Hierarchical task analysis of the laparoscopic hiatal hernia repair showing major tasks, sub-tasks, and errors

Phase II results

Pre-survey results

Demographics

A total of 38 participants were recruited to complete the fundoplication simulation (Table 4). Participants were grouped into novice (n = 17, 45%), intermediate (n = 15, 39%), and expert (n = 6, 16%). Additionally, 50% (n = 19) were male, 45% (n = 17) were under the age of 30, 87% (n = 33) self-reported being right-handed, and 58% (n = 22) were wearing corrective lenses.

Table 4 Demographics of the participants

Prior experience

The overwhelming majority of novice and intermediate participants (n = 28, 88%) reported having observed 0–10 HHRs, while 3 (9%) reported observing 11–30 cases and 1 (3%) reported observing 30–50 cases. Most attending surgeons had observed and/or participated in at least 100 cases; only 1 reported observing or participating in fewer than 100 cases. Overall, 42% (n = 16) of participants reported prior exposure to a robotic (da Vinci) or laparoscopic (Fundamentals of Laparoscopic Surgery) simulation trainer, indicating that a subset of participants had previous hands-on familiarity with simulation technology. Additionally, 37% (n = 14) reported gaming experience, with more than half of them (n = 10) playing at least 1–5 h a week. None of the participants reported any exposure to VR laparoscopic training.

Post-simulation survey results

After the Nissen fundoplication task, participants completed a post-simulation survey rating the realism and usefulness of their experience on a scale of 1–5, with 1 being not realistic/useful and 5 being very realistic/useful. The survey covered 5 categories: the realism of the model's anatomy, the realism of the ex vivo porcine model (texture), the realism of the simulator interface (instruments, display), the overall realism of the task compared with the actual surgical task, and the overall perceived usefulness of the simulator for learning laparoscopic hiatal hernia surgical skills. Table 5 shows the survey results for the realism and usefulness of the fundoplication simulation model. The vast majority of participants in all three groups rated the simulator's realism highly and recognized its usefulness and its capability to capture the essential features of the task, providing evidence of content alignment.

Table 5 Survey completed after performing the Nissen fundoplication simulation on the porcine model

Reliability analysis

The IRR between the two blinded raters was good for both the global metrics (ICC = 0.84, 95% CI 0.79–0.87, p < 0.001) and the task-specific metrics (ICC = 0.75, 95% CI 0.70–0.78, p < 0.001), providing internal structure validity evidence. Grading the videos with blinded raters mitigated potential errors due to rater bias, supporting response process validity.

Analysis of metrics

The descriptive statistics of the metrics used for assessing performance are shown in Table 6. Because group sizes were unequal and the data violated normality on the Shapiro–Wilk test, nonparametric tests were used and are reported here.

Table 6 Median and interquartile range (IQR) of metrics used for the assessment of performance

Global metrics

Table 6 presents the median and interquartile range of the total global scores for all three groups. The Kruskal–Wallis test showed a significant difference in performance between the groups (χ² = 24.01, p < 0.001). As depicted in Fig. 2, performance improved with increasing level of expertise. Post hoc analysis revealed significant differences among all three groups: novice vs. intermediate (p = 0.001), intermediate vs. expert (p = 0.01), and novice vs. expert (p = 0.007).
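The omnibus comparison reported above rests on the tie-corrected Kruskal–Wallis H statistic. The sketch below is a minimal, dependency-free version for exactly three groups, which lets the p-value use the closed-form chi-square survival function for 2 degrees of freedom, exp(−H/2). The function names `midranks` and `kruskal_wallis_3` are hypothetical, and this is an illustration rather than the authors' analysis code; in practice a statistics package would be used.

```python
import math
from collections import Counter


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the first one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis_3(novice, intermediate, expert):
    """Tie-corrected Kruskal-Wallis H for three groups and its chi-square p-value (df = 2)."""
    groups = [novice, intermediate, expert]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one score")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = midranks(pooled)
    rank_term, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied blocks.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    h /= correction
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return h, math.exp(-h / 2)
```

As a check on the closed form, H = 24.01 gives exp(−12.005), on the order of 10⁻⁶, consistent with the reported p < 0.001.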

Task-specific metrics

The median and interquartile range of the total task-specific scores for all three groups are shown in Table 6. The Kruskal–Wallis test revealed a significant difference in performance among the groups (χ² = 18.4, p < 0.001). As illustrated in Fig. 3 and mirroring the total global score, performance improved with increasing levels of experience. Post hoc analysis showed a significant difference in performance among all three groups: novice vs. intermediate (p = 0.001), intermediate vs. expert (p = 0.03), and novice vs. expert (p = 0.001). The Spearman rank correlation indicated a strong association between the total global score and the total task-specific scores (rs = 0.87, p < 0.001), as depicted in Fig. 4. In addition, Fig. 5 displays photos of subjects executing various components of the task-specific metrics.
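The totals correlated above are built as described in the Data analysis section: each metric is averaged across the two raters and the averages are summed. The sketch below shows that aggregation and Spearman's rank correlation, computed as the Pearson correlation of midranks so that tied totals are handled. The function names are hypothetical, the significance test is omitted, and the code is an illustration rather than the authors' implementation.

```python
import math
from statistics import fmean


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def total_score(rater_a, rater_b):
    """Average the two raters' scores item by item, then sum the averages."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same items")
    return sum((a + b) / 2 for a, b in zip(rater_a, rater_b))


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    rx, ry = midranks(x), midranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In use, each participant's global and task-specific totals would be computed with `total_score` and the two resulting lists passed to `spearman_rho`.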

In situ metrics

I. Number of attempts: all participants in the expert and intermediate groups completed the maximum of 2 attempts in the allotted time, except for 1 subject in each group; in the novice group, only 6 of 17 subjects proceeded to a second attempt. The Kruskal–Wallis test showed a significant difference in the number of attempts between the groups (χ² = 12.5, p = 0.001). Post hoc analysis showed a significant difference between the novice and intermediate groups (p = 0.002), but not between the novice and expert groups (p = 0.07) or between the intermediate and expert groups (p = 0.54).
II. Number of sutures: all subjects in the expert group placed 3 sutures to complete the fundoplication. In the intermediate group, 13 subjects placed 3 sutures and 2 placed 4 sutures. In the novice group, 3 placed only 1 suture, 2 placed 2 sutures, 8 placed 3 sutures, and 4 placed 4 sutures. The Kruskal–Wallis test showed no significant difference in the number of sutures placed among the three groups (χ² = 0.94, p = 0.62).
III. Sum of distances between sutures: the summed distance ranged from 2 to 3.5 cm in the expert group, from 1.4 to 4.4 cm in the intermediate group, and from 0 to 3.5 cm in the novice group. The Kruskal–Wallis test showed a significant difference between the groups in the summed distance across all sutures (χ² = 6.04, p = 0.04). Post hoc analysis found no significant differences between the novice and intermediate groups (p = 0.15), the novice and expert groups (p = 0.08), or the intermediate and expert groups (p = 0.08).
IV. Seromuscular bite: overall, 5 of 6 subjects in the expert group, 10 of 15 subjects in the intermediate group, and 5 of 17 subjects in the novice group placed a seromuscular bite on the esophagus while performing the fundoplication. The Kruskal–Wallis test showed a significant difference between the groups in seromuscular bites taken during fundoplication (χ² = 6.94, p = 0.03). Post hoc analysis found no significant differences between the novice and intermediate groups (p = 0.06), the novice and expert groups (p = 0.06), or the intermediate and expert groups (p = 0.49).
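The pairwise post hoc p-values reported above follow the nonparametric path described in the Data analysis section, where pairwise Wilcoxon tests are adjusted with the Benjamini–Hochberg procedure. As a minimal sketch, the step-up adjustment can be written as below; it follows the same rule as R's `p.adjust(method = "BH")`, but the function name is hypothetical and the input values in the usage note are illustrative, not study data.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest raw p-value down, scaling each by m / rank and
    # keeping the running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p-values `[0.01, 0.04, 0.03]`, the function returns approximately `[0.03, 0.04, 0.04]`: the largest raw value is unchanged, and each smaller one is scaled by m/rank and then capped by the adjusted value above it.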

These results demonstrate the metrics' relationship to other variables (level of expertise and the global OSATS-based score), supporting construct validity.

Discussion

Our results demonstrate that the task-specific metrics differentiate performance among novice, intermediate, and expert surgeons in the wrap creation step of laparoscopic fundoplication. A strong positive correlation was also observed between the validated global OSATS-based score and our task-specific scores. High IRR for both metrics supports the feasibility of using our task-specific metrics for video-based assessment of performance. Notably, 89% of participants rated the simulator's usefulness as 4 or 5 on a 5-point scale. This rating was echoed by informal comments from several non-expert participants during the study expressing a desire for this kind of practice before performing the procedure in the operating room. Many trainees also mentioned that the experience increased their confidence in approaching such cases in live patients.

Expertise in laparoscopic hiatal hernia surgery requires extensive training and high case volume. Learning curve studies have shown that 20–40 cases for individual surgeons, and about 50 cases for individual institutions, are needed before postoperative complication rates stabilize [13, 31]. A 10-year institutional learning curve study found that 200 fundoplications had to be performed before operative time, conversion rates, and complications plateaued [14]. Beyond this long learning curve, adequate training is further complicated by the centralization of this procedure, with a substantial share of cases performed at a few high-volume specialty centers [32]. This can limit the number of cases performed by residents, whose training in complex foregut surgery is limited to their operating room experience. In our study, 88% of novice and intermediate participants reported having observed 10 or fewer HHRs. Simulation-based training can help bridge this gap by giving trainees an opportunity to practice this task outside the operating room.

Because trainee exposure to LHHR varies with whether they train at a high-volume center, a simulator for this procedure is essential. Such a simulator should be capable not only of training the important cognitive and technical aspects of the procedure but also of both high-stakes summative and low-stakes formative skills assessment. Several tools exist for video-based assessment of LHHR performance, but their validity evidence is limited [33]. Most training programs use a global tool for assessing laparoscopic performance, such as the OSATS and the Global Operative Assessment of Laparoscopic Skills (GOALS) [34,35,36,37], or a combination of global scales and procedure-specific checklists [38, 39]. Peyre et al. [40] focused on a detailed 65-step procedural checklist, previously developed from task analysis, for evaluating technical performance in laparoscopic Nissen fundoplication [41]. Sixty-four of the 65 steps showed a high degree of reliability (> 0.8) when five surgeons used the checklist to grade expert operative performance of Nissen fundoplication. More recently, as part of its Masters Program, SAGES developed a video-based assessment tool for laparoscopic fundoplication and demonstrated its content validity [22]. In our work, we independently developed assessment metrics using the well-established HTA method. Overall, the major tasks and sub-tasks aligned with prior HTA findings for this procedure [20, 22, 41]. Using HTA, we identified 19 major errors and developed task-specific metrics to evaluate performance in LHHR. Task-specific metrics developed with HTA and expert consensus have been validated for assessing performance in endotracheal intubation and colorectal anastomosis [27, 28, 42].
Although only the task-specific metrics for the creation and securing of the wrap were tested in this work, we obtained validity evidence in the following domains of Messick's unitary framework: content alignment, response process, internal structure, and relationship to other variables.

One unique aspect of this study was the incorporation of in situ metrics alongside our task-specific metrics. Both the number of attempts and the placement of a seromuscular bite proved to be useful metrics that could easily be incorporated into the VR simulator for assessment. Although the goal of this work was to develop assessment metrics for our VR simulator, the metrics and their validity evidence can also be used for video-based assessment of laparoscopic fundoplication. We showed the relationship of our metrics to other variables by comparing our task-specific metrics with OSATS; however, because video-based assessment is time-consuming, it is not yet known how our task-specific metrics correlate with other instruments developed for this procedure, which will be part of our future work.

Transferability of skills from simulation to the live operating room must be a priority when creating a simulator. Transferability would both encourage use and produce real improvement in live operative technical skills and patient outcomes. Owing to constraints in creating a physical model, we did not test the initial dissection and reduction of the hernia sac and its contents or the assessment of intra-abdominal esophageal length; we plan to test these aspects later in a VR model. We have created a model of the crura with an enlarged esophageal hiatus and are conducting studies to establish validity evidence for the metrics of the crural repair portion of the procedure, which will be reported separately. Our fundoplication simulation closely mimics a portion of the actual LHHR operation, with a few differences, most notably the lack of a diaphragm; hence, it does not replicate the exact constraints of the real surgery. The realism of our simulation is supported by participant feedback, with 79% grading it 4 or 5 on a 5-point realism scale.

Limitations of this study include a relatively small sample size and unequal group sizes. Although each level of surgical expertise was represented, the intermediate and expert groups had fewer participants. This may be attributable to the escalating operative and clinical responsibilities associated with each PGY level, which leave less time for participating in research studies. The small, unequal sample also resulted in small-to-moderate effect sizes and inconclusive post hoc comparisons for our in situ metrics. Furthermore, despite blinding of participant identities in the videos, some participants may still have felt reluctance or apprehension about skill evaluation. Finally, owing to resource constraints, we could not use flexible endoscopy to assess the quality and securement of the wrap, such as its tightness and potential full-thickness bites.

Using an ex vivo fundoplication model, this study provided validity and reliability evidence for task-specific metrics developed to assess performance in the creation and securing of the wrap during LHHR. The simulator and the video-based assessment metrics can be used for training and assessment in this procedure. Our next step is to incorporate the validated task-specific metrics into our VR simulator for automated assessment.

References

St John A, Caturegli I, Kubicki NS, Kavic SM (2020) The rise of minimally invasive surgery: 16 year analysis of the progressive replacement of open surgery with laparoscopy. JSLS 24(4):e202000076
Birkmeyer JD, Finks JF, O’Reilly A, Oerline M, Carlin AM, Nunn AR, Dimick J, Banerjee M, Birkmeyer NJO (2013) Surgical skill and complication rates after bariatric surgery. N Engl J Med 369(15):1434–1442
Stulberg JJ, Huang R, Kreutzer L, Ban K, Champagne BJ, Steele SR, Johnson JK, Holl JL, Greenberg CC, Bilimoria KY (2020) Association between surgeon technical skills and patient outcomes. JAMA Surg 155(10):960–968
Scott DJ, Bergen PC, Rege RV, Laycock R, Tesfay ST, Valentine RJ, Euhus DM, Jeyarajah DR, Thompson WM, Jones DB (2000) Laparoscopic training on bench models: better and more cost effective than operating room experience? J Am Coll Surg 191:272–283
Ritter EM, Scott DJ (2007) Design of a proficiency-based skills training curriculum for the fundamentals of laparoscopic surgery. Surg Innov 14(2):107–112
Stefanidis D, Korndorffer JR, Markley S, Sierra R, Scott DJ (2006) Proficiency maintenance: impact of ongoing simulator training on laparoscopic skill retention. J Am Coll Surg 202(4):599–603
Korndorffer JR, Dunne JB, Sierra R, Stefanidis D, Touchard CL, Scott DJ (2005) Simulator training for laparoscopic suturing using performance goals translates to the operating room. J Am Coll Surg 201(1):23–29
Fried GM, Feldman LS, Vassiliou MC, Fraser SA, Stanbridge D, Ghitulescu G, Andrew CG (2004) Proving the value of simulation in laparoscopic surgery. Ann Surg 240(3):518;discussion 525–528
Scott DJ, Dunnington GL (2008) The new ACS/APDS skills curriculum: moving the learning curve out of the operating room. J Gastrointest Surg 12:213–221
Martin JA, Regehr G, Reznick R, Macrae H, Murnaghan J, Hutchison C, Brown M (1997) Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 84(2):273–278
Roman S, Kahrilas PJ (2014) The diagnosis and management of hiatus hernia. BMJ. https://doi.org/10.1136/bmj.g6154
Targarona EM, Grisales S, Uyanik O, Balague C, Pernas JC, Trias M (2013) Long-term outcome and quality of life after laparoscopic treatment of large paraesophageal hernia. World J Surg 37:1878–1882
Watson DI, Baigrie RJ, Jamieson GG (1996) A learning curve for laparoscopic fundoplication. Definable, avoidable, or a waste of time? Ann Surg 224(2):198
Zacharoulis D, O’Boyle CJ, Sedman PC, Brough WA, Royston CMS (2006) Laparoscopic fundoplication: a 10-year learning curve. Surg Endosc Interv Tech 20:1662–1670
Gill J, Booth MI, Stratford J, Dehn TCB (2007) The extended learning curve for laparoscopic fundoplication: a cohort analysis of 400 consecutive cases. J Gastrointest Surg 11:487–492
Satava RM (1993) Virtual reality surgical simulator. Surg Endosc 7:203–205
Seymour NE, Gallagher AG, Roman SA, O’Brien MK, Bansal VK, Andersen DK, Satava RM (2002) Virtual reality training improves operating room performance: results of a randomized, double-blinded study. Ann Surg 236:458–463; discussion 463–464
Hashimoto DA, Sirimanna P, Gomez ED, Beyer-Berjot L, Ericsson KA, Williams NN, Darzi A, Aggarwal R (2015) Deliberate practice enhances quality of laparoscopic surgical performance in a randomized controlled trial: from arrested development to expert performance. Surg Endosc Other Interv Tech 29:3154–3162
Cristancho SM (2008) Quantitative modelling and assessment of surgical motor actions in minimally invasive surgery. PhD thesis, University of British Columbia, Vancouver
MacKenzie L, Ibbotson JA, Cao CGL, Lomax AJ (2001) Hierarchical decomposition of laparoscopic surgery: a human factors approach to investigating the operating room environment. Minim Invasive Ther Allied Technol 10:121–127
Sarker SK, Chang A, Albrani T, Vincent C (2008) Constructing hierarchical task analysis in surgery. Surg Endosc Other Interv Tech 22:107–111
Ritter EM, Gardner AK, Dunkin BJ, Schultz L, Pryor AD, Feldman L (2020) Video-based assessment for laparoscopic fundoplication: initial development of a robust tool for operative performance assessment. Surg Endosc 34:3176–3183
MacKenzie L, Ibbotson JA, Cao CGL, Lomax AJ (2001) Hierarchical decomposition of laparoscopic surgery: a human factors approach to investigating the operating room environment. Minim Invasive Ther Allied Technol 10(3):121–127
Messick S (1994) Validity of psychological assessment: validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. Res Rep 45:1–28
de Montbrun S, Roberts PL, Satterthwaite L, MacRae H (2016) Implementing and evaluating a national certification technical skills examination. Ann Surg 264(1):1–6
De Montbrun SL, Roberts PL, Lowry AC, Ault GT, Burnstein MJ, Cataldo PA, Dozois EJ, Dunn GD, Fleshman J, Isenberg GA, Mahmoud NN, Reznick RK, Satterthwaite L, Schoetz D, Trudel JL, Weiss EG, Wexner SD, MacRae H (2013) A novel approach to assessing technical competence of colorectal surgery residents: the development and evaluation of the colorectal objective structured assessment of technical skill (COSATS). Ann Surg 258(6):1001–1006
Sankaranarayanan G, Parker LM, Khan A, Dials J, Demirel D, Halic T, Crawford A, Kruger U, De S, Fleshman JW (2022) Objective metrics for hand-sewn bowel anastomoses can differentiate novice from expert surgeons. Surg Endosc 37(2):1282–1292
Sankaranarayanan G, Parker LM, Jacinto K, Demirel D, Halic T, De S, Fleshman JW (2022) Development and validation of task-specific metrics for the assessment of linear stapler-based small bowel anastomosis. J Am Coll Surg 235:881–893
Koo TK, Li MY (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15(2):155–163
Faul F, Erdfelder E, Lang A-G, Buchner A (2007) G*power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods 39:175–191
Neo EL, Zingg U, Devitt PG, Jamieson GG, Watson DI (2011) Learning curve for laparoscopic repair of very large hiatal hernia. Surg Endosc 25:1775–1782
Schlottmann F, Strassle PD, Allaix ME, Patti MG (2017) Paraesophageal hernia repair in the USA: trends of utilization stratified by surgical volume and consequent impact on perioperative outcomes. J Gastrointest Surg 21:1199–1205
Bilgic E, Al Mahroos M, Landry T, Fried GM, Vassiliou MC, Feldman LS (2019) Assessment of surgical performance of laparoscopic benign hiatal surgery: a systematic review. Surg Endosc 33:3798–3805
Vassiliou MC, Feldman LS, Andrew CG, Bergman S, Leffondré K, Stanbridge D, Fried GM (2005) A global assessment tool for evaluation of intraoperative laparoscopic skills. Am J Surg 190(1):107–113
Bilgic E, Watanabe Y, McKendy K, Munshi A, Ito YM, Fried GM, Feldman LS, Vassiliou MC (2016) Reliable assessment of operative performance. Am J Surg 211:426–430
Ahlberg G, Kruuna O, Leijonmarck CE, Ovaska J, Rosseland A, Sandbu R, Strömberg C, Arvidsson D (2005) Is the learning curve for laparoscopic fundoplication determined by the teacher or the pupil? Am J Surg 189(2):184–189
Hogle NJ, Liu Y, Ogden RT, Fowler DL (2014) Evaluation of surgical fellows’ laparoscopic performance using global operative assessment of laparoscopic skills (GOALS). Surg Endosc 28:1284–1290
Ghaderi I, Auvergne L, Park YS, Farrell TM (2015) Quantitative and qualitative analysis of performance during advanced laparoscopic fellowship: a curriculum based on structured assessment and feedback. Am J Surg 209:71–78
Dath D, Regehr G, Birch D, Schlachta C, Poulin E, Mamazza J, Reznick R, Macrae HM (2004) Toward reliable operative assessment: the reliability and feasibility of videotaped assessment of laparoscopic technical skills. Surg Endosc 18:1800–1804
Peyre SE, Peyre CG, Hagen JA, Sullivan ME (2010) Reliability of a procedural checklist as a high-stakes measurement of advanced technical skill. Am J Surg 199:110–114
Peyre SE, Peyre CG, Hagen JA, Sullivan ME, Lipham JC, DeMeester SR, Peters JH, DeMeester TR (2009) Laparoscopic Nissen fundoplication assessment: task analysis as a model for the development of a procedural checklist. Surg Endosc 23:1227–1232
Ryason A, Petrusa ER, Kruger U, Xia Z, Wong VT, Jones DB, De S, Jones SB (2020) Development of an endotracheal intubation formative assessment tool. J Educ Perioper Med 22(1):E635

Acknowledgements

We acknowledge support from the UT Southwestern Simulation Center for the conduct of this study. The authors thank Dave Primm of the UT Southwestern Department of Surgery for help in editing this article.

Funding

This work was funded by a grant from NIH/NIBIB R01: EB025247.

Author information

Authors and Affiliations

Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390, USA
Alexis Desir, Emile Farah, Shruti R. Hegde, Carla Holcomb, Daniel J. Scott & Ganesh Sankaranarayanan
Artificial Intelligence and Medical Simulation Lab, Department of Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390, USA
Alexis Desir, Carolina Marques, Shruti R. Hegde & Ganesh Sankaranarayanan

Corresponding author

Correspondence to Ganesh Sankaranarayanan.

Ethics declarations

Disclosures

Alexis Desir, Carolina Marques, Emile Farah, Shruti Hegde, Carla Holcomb, Daniel J. Scott, and Ganesh Sankaranarayanan have nothing to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Podium presentation at SAGES 2023, Montreal, Canada.

Supplementary Information

The electronic supplementary material comprises the following files.

Supplementary file1 (DOCX 33 kb)

Supplementary file2 (DOCX 58 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Desir, A., Marques, C., Farah, E. et al. Validity and reliability evidence support task-specific metrics for laparoscopic fundoplication. Surg Endosc 38, 2219–2230 (2024). https://doi.org/10.1007/s00464-024-10675-4

Received: 12 September 2023
Accepted: 30 December 2023
Published: 21 February 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00464-024-10675-4
