Significant health care disparities persist in access to, and training in, safe surgical care [1]. Appendicitis is one of the most common surgical diseases worldwide, accounting for 17.7 million cases and 1.50 million disability-adjusted life years (DALYs) globally each year [2]. While laparoscopic appendectomy has become the standard of care in high-income countries because it reduces surgical site infections, shortens recovery time and return to function, and reduces postoperative pain, this operation remains inaccessible for patients in many low- and middle-income countries (LMICs) [3]. Several challenges hinder the widespread uptake of laparoscopy in LMICs, including a lack of resources, insufficient finances, limited opportunities to practice, and conflicting stakeholder priorities [4]. However, a lack of training opportunities, experienced instructors, and accessible curricula in laparoscopy for LMIC surgeons may be the most pressing challenges [5].

Despite the well-known need to train LMIC surgeons in laparoscopy, this gap has gone largely unaddressed. Traditional models for training have focused on one-to-one partnerships in which high-income country (HIC) institutions offer personnel training and equipment to single LMIC partners [6,7,8]. While these efforts do focus on training, they lack clear pathways for scalability and sustainability and can reinforce unhelpful power dynamics by doing little to empower local trainees. Experts have recommended leveraging low-cost laparoscopy training simulators and telemedicine platforms to provide more accessible training options for LMIC surgeons [9]. Despite these calls, few innovative programs have been developed for low-cost training models to teach and perform laparoscopy in remote, simulation-based environments [10]. Those that have been developed rarely assess the validity of their evaluation measures, a step needed before the measures can be legitimately incorporated into a scalable and adaptable surgical training curriculum. To comprehensively address the shortage of laparoscopically trained LMIC surgeons, innovative, low-cost, and scalable training modules must be developed, and the validity evidence for their associated assessment measures must be evaluated.

ALL-SAFE is an initiative between the Pan-African Academy of Christian Surgeons (PAACS) and institutions in the USA that aims to address this gap. Since 2021, ALL-SAFE has developed free, open-source, virtual modules with an associated user-built, low-cost simulation system to teach and evaluate different laparoscopic skills in the LMIC setting. A pilot study assessing the ALL-SAFE module for ectopic pregnancy supported the use of the ALL-SAFE simulator and assessment tool to evaluate laparoscopic salpingostomy skills and demonstrated increased knowledge regarding ectopic pregnancy management among trainees [11]. Building on previous ALL-SAFE successes, we developed a novel ALL-SAFE training module and assessment tool to support independent laparoscopic appendectomy practice and skills development. In this pilot study, we evaluated the targeted evidence supporting the performance measures of the novel appendectomy verification of proficiency assessment tool (APPY-VOP), designed to measure laparoscopic appendectomy psychomotor skills. Specifically, we evaluated (i) discrimination between three performance levels (novice, intermediate, and expert), (ii) the correlation between scores among the three APPY-VOP components, and (iii) potential rating differences across the three rater groups.

Materials and methods

Design of the appendicitis simulator

The user-assembled ALL-SAFE box trainer and appendicitis task trainer were designed and constructed using materials readily available in LMICs and costing less than 10 USD (Supplementary files 1 and 2). A video-capable cell phone, laptop computer, and Wi-Fi or Bluetooth connection were recommended for assessment and full module participation. Laparoscopic instruments used in the simulation included a blunt grasper, curved tapered (Maryland) grasper, scissors, needle driver, 2–0 Silk suture (18–26 mm) with a taper needle (0.5 inch), and suture loops or optional pre-tied ligating loops (endoloop; Endoloop®, Ethicon, Raritan, NJ). The operation included identifying the anatomy of the appendix and surrounding structures, mobilizing the appendix with blunt dissection, ligating the appendiceal artery with a figure-of-eight suture, tying an intracorporeal knot with a surgeon’s knot, removing the remainder of the mesoappendix, placing two suture or pre-tied ligating loops at the base of the appendix, and transecting and removing the appendix from the laparoscopic box trainer.

Participants

This pilot study was conducted from March to August 2022 at three training hospitals in Sub-Saharan Africa and the USA: Mbingo Baptist Hospital in Cameroon, Soddo Christian Hospital in Ethiopia, and University of Michigan Hospital in the USA. The sites in Sub-Saharan Africa were PAACS training sites. This study received IRB exemption from the University of Michigan’s Institutional Review Board (HUM00199557). Expert laparoscopic surgeons, residents of varying skill levels, and novice medical students were recruited from each study site. All laparoscopic surgeons were rated as expert based on number of laparoscopic operations performed within the last month and over their lifetime. To differentiate skill levels among residents, residency program directors rated their trainees as novice or intermediate based on previous experiences with laparoscopy and level of general surgery residency training. All medical students were considered novice based on no prior experience with laparoscopy.

All participants completed the ALL-SAFE online educational module covering appendicitis management and laparoscopic appendectomy psychomotor skills. After viewing an expert laparoscopic appendectomy demonstration video in the ALL-SAFE simulation system using our low-cost appendix model, participants recorded their own performance within the ALL-SAFE box trainer. Participants were permitted to practice as many times as desired between viewing the expert video and recording their own. Following recording, participants were asked to self-rate their own video and to rate three randomly selected peer videos uploaded by other ALL-SAFE participants across the various training sites. This provided a total of four ratings per uploaded video (one self-rating and three peer ratings). Participants used the ALL-SAFE APPY-VOP to complete these ratings.

Design of the verification of proficiency (APPY-VOP) performance assessment tool

The APPY-VOP was designed through expert consensus following review of the Objective Structured Assessment of Technical Skills (OSATS) [12] and the American College of Surgeons (ACS) and Association of Program Directors in Surgery (APDS) online curriculum [13]. A first version APPY-VOP was drafted by one co-investigator with extensive laparoscopic surgery experience (MB) and reviewed by the entire research team for content and relevance, including four general surgeons and five learners across the three study sites. The reviewed version was further edited by the Principal and Co-investigator to split one item, add three additional “error-based items,” and split the final overall rating designation to “Competent, Borderline, and Not Competent.” Final review was conducted by a psychometrician (DR) for clarity, relevance, and alignment of questions with psychomotor skills.

The final version of the APPY-VOP had three components: a 13-item ALL-SAFE checklist of key psychomotor skills (Checklist), a 5-item modified OSATS (m-OSATS), and 1 final overall competency rating (Final rating) (Supplementary file 3). The 13-item ALL-SAFE skills checklist was designed to assess competency in the critical steps of performing laparoscopic appendectomy, including critical errors. Checklist items 1–3, 5–7, 9–11, and 13 were scored up to 2, while items 4, 8, and 12 were scored up to 3 to differentiate the importance of critical errors most relevant to patient safety, for a possible total of 29 points (Summed). The m-OSATS was a shortened version of the original 6-item OSATS, a tool validated for assessing trainees’ surgical skills across a variety of settings [12]. The m-OSATS was used to measure competency across 5 core laparoscopic skills via 5-point scales, with domains that include “Respect for tissue” and “Instrument handling,” with a possible total of 25 points (Global Summed). The maximum combined sum of the ALL-SAFE skills checklist and m-OSATS was 54 points (Total Summed). Finally, the Final rating assessed overall competency and was scored using a three-point scale (1 = “Not competent,” 2 = “Borderline,” 3 = “Competent”).
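The point totals described above can be checked with a short sketch. This is only an illustration of the arithmetic, not the study's scoring software: the function name `appy_vop_totals`, the dictionary layout, and the lower bounds (0 for checklist items, 1 for m-OSATS domains) are assumptions, since the text states only the maximum scores.

```python
# Maximum points per checklist item, as stated in the text:
# items 4, 8, and 12 (critical errors) score up to 3; the other ten score up to 2.
CRITICAL_ITEMS = {4, 8, 12}
CHECKLIST_MAX = {item: (3 if item in CRITICAL_ITEMS else 2) for item in range(1, 14)}
MOSATS_DOMAINS = 5          # five m-OSATS domains
MOSATS_SCALE = (1, 5)       # assumed 1-5 anchors for each 5-point scale


def appy_vop_totals(checklist, mosats):
    """Return (Summed, Global Summed, Total Summed) for one rating.

    checklist: dict mapping item number (1-13) to the awarded score.
    mosats: sequence of five domain scores.
    """
    if set(checklist) != set(CHECKLIST_MAX):
        raise ValueError("checklist must contain scores for items 1-13")
    for item, score in checklist.items():
        # Assumed lower bound of 0; the text only gives the upper bound.
        if not 0 <= score <= CHECKLIST_MAX[item]:
            raise ValueError(f"item {item} score {score} is out of range")
    low, high = MOSATS_SCALE
    if len(mosats) != MOSATS_DOMAINS or any(not low <= s <= high for s in mosats):
        raise ValueError("m-OSATS needs five scores within the scale")
    summed = sum(checklist.values())
    global_summed = sum(mosats)
    return summed, global_summed, summed + global_summed


# A perfect rating reproduces the maxima reported in the text.
print(appy_vop_totals(dict(CHECKLIST_MAX), [5] * MOSATS_DOMAINS))  # (29, 25, 54)
```

Ten items at 2 points plus three items at 3 points give 29, and five domains at 5 points give 25, which together match the 54-point Total Summed maximum.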

Data analysis

Once the data were confirmed to be non-normally distributed, the capacity of the three components of the APPY-VOP to differentiate between novice, intermediate, and expert performance levels was evaluated using the Kruskal–Wallis test and substantiated with secondary analysis via a Many-Facet Rasch Model (8 facets; ID × Operator Expertise × Operator Continent × Judge Expertise × Judge Continent × Judge/Evaluator × Final Rating × Item). Correlations among participants’ ALL-SAFE checklist summed scores, m-OSATS scores, and overall competency (Final) scores were estimated by Pearson’s r. Inter-rater agreement of novice and expert raters was determined using the averaged two-way mixed intraclass correlation, ICC(A,k), across 10 randomly selected performances judged by 11 novice and 9 experienced raters.

Rating differences across expertise levels, continent, and site, that would indicate potential rater bias, were calculated using the same Many-Facet Rasch Model. Statistical analyses were performed using SPSS Statistics for Windows v.25 (IBM, Armonk, NY) and Facets software v. 3.50 (Winsteps.com, Beaverton, OR), with P-values of less than 0.05 considered statistically significant.
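For readers who want to see the mechanics of the Kruskal–Wallis comparison across three performance levels, the following is a minimal pure-Python sketch. It is not the SPSS procedure used in the study. The function name `kruskal_wallis_3` and the example scores are hypothetical, and the P-value relies on the chi-square approximation with 2 degrees of freedom, which is rough when a group is very small.

```python
import math
from collections import Counter


def kruskal_wallis_3(novice, intermediate, expert):
    """Kruskal-Wallis H test for exactly three independent groups.

    Returns (H, p). H is tie-corrected; p uses the chi-square approximation
    with 2 degrees of freedom, whose upper tail equals exp(-H / 2).
    """
    groups = [list(novice), list(intermediate), list(expert)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("each group needs at least one score")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Midranks: tied values share the average of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    correction = 1 - sum(t**3 - t for t in Counter(pooled).values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all scores are identical; H is undefined")
    h /= correction
    return h, math.exp(-h / 2)


# Made-up checklist summed scores, for illustration only (not study data).
h, p = kruskal_wallis_3([18, 20, 21, 22, 19], [23, 24, 22, 25], [28, 29, 27])
print(f"H = {h:.2f}, P = {p:.3f}")
```

Because the test works on ranks rather than raw scores, it does not assume normally distributed ratings, which is the reason it suits the data described above.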

Results

Demographics

Twenty participants across three pilot sites participated in the study (Table 1). Participants included an expert laparoscopic surgeon (n = 1), general surgery residents (n = 11), and medical students (n = 8). The final number of expert (n = 1), intermediate (n = 8), and novice (n = 11) classifications reflected participants’ training level and experience with laparoscopy specifically.

Table 1 Participant demographics for ALL-SAFE appendectomy APPY-VOP Pilot

Discrimination between performance levels using APPY-VOP

ALL-SAFE checklist

For the checklist items, scores increased with experience level, with the exception of item 12 [Avoids leaving residual appendix on cecum (< 3 mm)]. Despite this positive trend, item-level differences were not statistically significant across the three groups (Table 2). Item-level Rasch analysis supported this positive, but nonsignificant, trend: novice (M = 1.6), intermediate (M = 1.9), and expert (M = 2.2), P = 0.44. The Checklist summed scores increased from novice (M = 21.02) to intermediate (M = 23.64) and expert (M = 28.25) performers, with statistically significant discrimination between novice and expert performances (P = 0.005).

Table 2 Comparison of ratings across performance levels using the ALL-SAFE psychomotor skills checklist

For the m-OSATS, the Kruskal–Wallis test indicated the five domains were able to discriminate across novice, intermediate, and expert performances (Table 3). These findings were supported by secondary Rasch analyses. The m-OSATS global summed and total summed scores (m-OSATS global summed + checklist summed) were also able to discriminate across these three levels of performance (P < 0.001). The m-OSATS final rating also adequately differentiated performance levels: Competent (M = 3.8), Borderline (M = 2.7), and Not Competent (M = 1.8), χ² (85) = 243.3, P = 0.001. The Many-Facet Rasch Model supported these findings, with statistically significant ratings across performance levels, including Competent (M = 2.0), Borderline (M = 1.8), and Not Competent (M = 1.4), χ² (85) = 32.3, P = 0.001.

Table 3 Comparison of mean ratings across performance levels using the modified Objective Structured Assessment of Technical Skills (m-OSATS)

Correlation between ALL-SAFE checklist and m-OSATS

Testing correlation of all participants’ ALL-SAFE checklist summed scores with m-OSATS summed scores indicated a positive significant relationship, r(83) = 0.63, P < 0.001 (Fig. 1). Similarly, the ALL-SAFE checklist summed score correlated with the combination of the ALL-SAFE checklist summed score and m-OSATS summed score, r(83) = 0.92, P < 0.001. ALL-SAFE checklist summed scores also correlated with the overall (final) rating scored on a three-point scale, r(83) = 0.58, P < 0.001.

Fig. 1 Correlation between APPY-VOP checklist and m-OSATS summed scores
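The correlations reported in this subsection use the conventional r(df) notation, where df = n − 2, so r(83) corresponds to 85 paired ratings. The sketch below shows one way to compute Pearson's r and the matching t statistic in plain Python. The function names `pearson_r` and `t_from_r` and the example scores are hypothetical, not taken from the study's analysis files.

```python
import math


def pearson_r(x, y):
    """Return (r, df) for paired scores, where df = n - 2."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired scores of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy), n - 2


def t_from_r(r, df):
    """t statistic used to test r against zero: t = r * sqrt(df / (1 - r^2))."""
    return r * math.sqrt(df / (1 - r * r))


# Made-up paired checklist and m-OSATS summed scores, for illustration only.
checklist_summed = [18, 21, 23, 25, 28]
mosats_summed = [12, 15, 14, 20, 24]
r, df = pearson_r(checklist_summed, mosats_summed)
print(f"r({df}) = {r:.2f}")

# The reported r(83) = 0.63 implies a t statistic of roughly 7.39.
print(f"t = {t_from_r(0.63, 83):.2f}")
```

A t statistic of that size on 83 degrees of freedom is consistent with the reported P < 0.001.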

Inter-rater agreement

Inter-rater agreement of m-OSATS overall performance ratings suggested mixed rater agreement across novice and experienced judges, ranging from poor to moderate for m-OSATS domains (Table 4). The poorest inter-rater agreement was estimated for two domains: Economy of Time and Motion (ICC = 0.45) and Flow of Operation (ICC = 0.45). Higher scores were present for Respect for Tissue (ICC = 0.70), indicating moderate agreement, and for Total Summed scores from both the m-OSATS and checklist scores (ICC = 0.83), indicating good consistency of responses across participants. Rasch analysis suggested no rating differences or biases across expertise levels, continent, or site, P ≥ 0.66.

Table 4 Inter-rater reliability of the APPY-VOP across novice and expert raters
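As a sketch of how an average-measures, absolute-agreement intraclass correlation such as ICC(A,k) can be obtained from a performances × raters table, the code below uses the two-way mean squares: ICC(A,k) = (MSR − MSE) / (MSR + (MSC − MSE)/n), with n performances. The function names are hypothetical. The cut-offs in `agreement_label` (0.50, 0.75, 0.90) are an assumption chosen to be consistent with the labels used in the text (0.45 poor, 0.70 moderate, 0.83 good), not values stated by the study.

```python
def icc_a_k(ratings):
    """ICC(A,k): two-way, absolute agreement, average of k raters.

    ratings: list of rows, one row per performance, one score per rater.
    """
    n = len(ratings)      # performances (rows)
    k = len(ratings[0])   # raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and >= 2 columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-performance mean square
    msc = ss_cols / (k - 1)                # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (msc - mse) / n)


def agreement_label(icc):
    """Verbal label for an ICC value; the thresholds are an assumption."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"


# Two raters who rank three performances identically, but one scores a point higher.
print(round(icc_a_k([[1, 2], [2, 3], [3, 4]]), 2))  # 0.8
print(agreement_label(0.45), agreement_label(0.70), agreement_label(0.83))
```

The toy example shows why an absolute-agreement ICC is a demanding measure: a constant one-point offset between raters lowers the value to 0.8 even though both raters order the performances identically. In the study design, the table would have 10 rows (performances) and 20 columns (11 novice and 9 experienced raters).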

Discussion

This study used a novel learning and performance assessment tool, the ALL-SAFE appendectomy skills verification of proficiency tool (APPY-VOP), as a way to measure skills required for laparoscopic appendectomy among a range of learners in three locations across two continents. Our findings indicate the APPY-VOP can discriminate performance levels regardless of rater experience, especially when all three components and the m-OSATS summed scores are considered. Therefore, this study supports the use of APPY-VOP for performance assessment for the ALL-SAFE appendicitis module among trainees with a variety of experiences.

While individual item levels were unable to discriminate across performance levels, when summed and considered as a whole, the 13-item checklist was able to differentiate across performance levels. The m-OSATS, previously validated to assess surgical performance during simulated ectopic pregnancy, also was able to discriminate across three performance levels when used to measure laparoscopic appendectomy skills performance in the same setting [11]. Additionally, the significant positive correlation between ALL-SAFE checklist summed scores and m-OSATS summed scores and between the ALL-SAFE checklist and ALL-SAFE checklist combined with m-OSATS strongly supports the use of ALL-SAFE checklist to measure laparoscopic skills in this simulated surgical setting. Finally, participants were able to use the overall Final rating to effectively discriminate across three levels of ability, indicating that this singular measurement of competency as part of the APPY-VOP has significant power to separate users’ skill levels.

Furthermore, the absence of detectable rating differences between novice and expert raters for total summed scores in the Rasch analysis suggests that expert opinion may not be required when evaluating ALL-SAFE users. In practice, this could lessen burdens on surgical faculty who strive to supplement operative training with simulation-based training. Additionally, there is added benefit to engaging trainees in the peer review process, as doing so has been shown to increase individual skills and operational efficiency [14, 15] and may allow reviewers to practice skills for future teaching and mentorship [16]. Similarly, video-based coaching platforms have been shown to improve surgical skills among residents of varying skill levels in other settings [17].

Most importantly, ALL-SAFE addresses one of the key barriers preventing uptake of laparoscopy in LMICs: a lack of accessible training programs and validated assessment tools. Although some LMICs have been able to acquire basic laparoscopic surgery equipment, there is a continued need for proper training such that surgeons in these regions may learn the skills required for laparoscopy [4, 5, 9, 18]. Studies have demonstrated that computer-based, self-directed, and incremental video-based training is effective for teaching surgical skills to learners of all levels [19]. However, these resources are not readily available or co-designed for learners in LMICs. Our system addresses this gap by supporting training in basic laparoscopic skills using inexpensive materials that are readily available in LMICs. An even greater disparity in laparoscopic surgical access in LMICs exists in training for complex operations beyond the three most common operations: appendectomy, cholecystectomy, and laparoscopy [20]. Although this study was designed for skills suited for laparoscopic appendectomy, many of the surgical skills, including intracorporeal knot tying, blunt dissection, tissue manipulation, and pre-tied ligating loop placement, are applicable to many laparoscopic operations. Although this was a pilot study, the APPY-VOP findings have important implications for laparoscopic education in LMICs. Future studies should focus on correlating trainee proficiency with clinical practice, incorporating artificial intelligence and machine learning into evaluation metrics, and scaling ALL-SAFE to other LMIC settings.

Limitations

This study had several limitations to consider. First, it was conducted with a small group of participants among sites familiar with the ALL-SAFE platform. While appropriate for a pilot study designed to evaluate validity evidence of a novel skills assessment tool, this sampling limits generalizability of findings to other sites. Second, the uneven distribution of participants, namely that there was only one expert participant and that the University of Michigan cohort consisted exclusively of “novice” users who were familiar with the ALL-SAFE system, may have inadvertently introduced unexpected scoring patterns with a nested design and negatively impacted the inter-rater reliability estimates. In future studies, all skill levels should be recruited from each site, and all submitted performances should be evaluated by multiple judges from other sites to make the most of the available samples. Finally, potential bias from “experienced novices” will be minimized by recruiting new novice groups unfamiliar with the ALL-SAFE platform and including a wider group of true experts as a “gold standard” comparison group.

Conclusion

This pilot study provided evidence supporting a novel assessment tool, the APPY-VOP, for use in ALL-SAFE, our low-cost laparoscopy training simulator and online learning module. The tool was piloted across three teaching facilities on two continents, among surgical learners of all skill levels. Our findings suggest that most components of the APPY-VOP effectively discriminated novice, intermediate, and expert performance in laparoscopic appendectomy skills and that ratings were consistent across novice and expert rater groups, independent of expertise. These results support the use of APPY-VOP among users of all skill levels alongside a peer rating system.