With the increasingly widespread performance of endoscopic surgery and the accumulating evidence for the benefits of simulation training in basic surgical skills, the Fundamentals of Laparoscopic Surgery (FLS) program has been established and introduced into the surgical residency curriculum in North America [1, 2]. However, a standardized training system has not been established for advanced endoscopic procedures. Since performance assessment has been recognized as an essential element of surgical education, greater effort has gone into developing and validating specific assessment tools for training in advanced endoscopic surgeries such as laparoscopic inguinal hernia repair [3, 4], laparoscopic colectomy [5], and laparoscopic gastric bypass surgery [6].

Although morbidity has decreased, gastric cancer is still one of the five most common causes of cancer death in the world [7], and more than half of the world’s total cases of gastric cancer occur in Eastern Asia [8]. While laparoscopic techniques have gained widespread acceptance in gastrointestinal surgery, the clinical introduction of laparoscopic gastrectomy for cancer has lagged behind that of other gastrointestinal laparoscopic procedures such as laparoscopic bariatric surgery and laparoscopic colorectal surgery. One reason is the complexity of the procedure, which consists of various steps including lymph node (LN) dissection, resection of the stomach, and reconstruction, and which may therefore be difficult to learn and teach. Since a performance assessment scale for laparoscopic gastrectomy has not previously been reported, we saw a need for performance measures as educational tools for laparoscopic gastrectomy.

The purpose of this study was to identify the essential steps in laparoscopic distal gastrectomy (LDG) through cognitive task analysis (CTA) and expert consensus using the Delphi method, and to develop an operative rating scale that measures LDG performance for use in teaching this procedure.

Methods

CTA and the Delphi method were adopted in this study. CTA was conducted in 3 steps, (1) document analysis, (2) video analysis and observation, and (3) semi-structured interviews, to extract the essential steps in LDG with D1 + LN dissection based on the Japanese gastric cancer treatment guidelines 2014 (ver. 4) [9]. In the next step, a consensus of expert opinion was reached through the Delphi survey to decide the subtasks that would be used to create the skills assessment tool for LDG. Informed consent was obtained from all participants in the Delphi survey.

CTA

The draft list of LDG subtasks was extracted using CTA in the following 3 steps:

  (1) Document analysis: The essential steps of LDG were determined with reference to surgical textbooks and published literature. An in-depth analysis was prepared with preliminary ideas.

  (2) Video analysis and observation of LDG: The experts’ LDG videos were analyzed and LDG procedures were observed by researchers in the operating room (OR) to create the original draft of subtasks.

  (3) Semi-structured interviews: The investigator interviewed LDG experts and a surgical resident who was a novice at LDG. They were asked about the suitability of the original draft of the subtasks and the instructions for completing each task. The resident suggested several points that experts might be apt to overlook.

Delphi survey

  (1) LDG expert selection: The investigator set the following criteria for participating experts: (A) experience of more than 100 laparoscopic gastrectomy cases, (B) qualification as a master of endoscopic surgery under the Endoscopic Surgical Skill Qualification System (ESSQS) in Japan, (C) publications in the field of “stomach” or “gastric cancer,” and (D) engagement in the education of surgical residents. The investigator then contacted 35 experts in laparoscopic gastrectomy by e-mail to invite them to participate in the study. The ESSQS was established under the management of the Japan Society for Endoscopic Surgery (JSES) in 2004 [10]. It has been recognized as one of the most rigorous examinations for Japanese endoscopic surgeons, with a 25% pass rate in the category of gastric surgery in 2017.

  (2) Delphi 1st round: Links to an anonymous online questionnaire were sent to the participants by e-mail. A list of LDG subtasks, identified through the CTA, was provided in the questionnaire. The participants were asked to rate each LDG subtask according to whether they felt it was necessary for acquiring adequate knowledge to perform the entire LDG procedure, using a 5-point Likert scale (1: strongly disagree, 2: disagree, 3: undecided, 4: agree, 5: strongly agree) and guided by the following question: “Do you think the item should be included in an assessment tool to be used for the education of novices learning the LDG procedure?” The participants were also asked to comment on the subtasks and to recommend new subtasks they felt might be necessary. We set the questionnaire response time at 30 days and sent one e-mail reminder to the participants 2 weeks before the deadline.

  (3) Delphi 2nd round: After the 1st round survey, some subtasks were revised and new subtasks were added based on the respondents’ comments. These comments and the results of the 1st round subtask ratings were shared with the respondents, who were asked to rate the subtasks again in a 2nd round survey. We set the same time frame for reminders and for collecting answers as in the 1st round.

  (4) Extra survey: After a consensus was reached in the Delphi survey, we asked the participants to answer one more question about the selected subtasks from the viewpoint of safety or oncological curability. The extra question was as follows: “Which is the especially important subtask for the evaluation and training in terms of (a) safety or (b) oncological curability for LDG D1 + LN dissection and reconstruction?”

Statistical data analysis

Means and 95% confidence intervals (CI) were calculated for all the subtasks. Cronbach’s alpha was calculated to assess internal consistency among the experts. JMP Pro version 12.2 (SAS Institute Inc., Cary, NC, USA) was used for the statistical analysis.
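As an illustration of the statistics named above, the following Python sketch computes a mean with a 95% CI for one subtask and Cronbach's alpha for a matrix of ratings. It is not the JMP procedure used in this study. The function names and the example ratings are hypothetical, the CI uses a normal approximation (a t-based interval would be slightly wider for small samples), and the matrix layout, with one row per subtask and one column per expert, is an assumption because that detail is not specified here.

```python
from statistics import NormalDist, mean, stdev, variance


def mean_ci(ratings, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation CI."""
    n = len(ratings)
    m = mean(ratings)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(ratings) / n ** 0.5
    return m, m - half_width, m + half_width


def cronbach_alpha(rows):
    """Cronbach's alpha for a rating matrix.

    rows: list of equal-length lists. Each column is treated as one
    "item" (assumed here: one expert) and each row as one case
    (assumed here: one subtask).
    """
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical Likert ratings (1-5), not data from this study.
one_subtask = [4, 5, 5, 4, 5]
m, lower, upper = mean_ci(one_subtask)
print(f"mean = {m:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")

rating_matrix = [
    [5, 4, 5],
    [4, 4, 5],
    [3, 2, 3],
    [5, 5, 5],
]
print(f"Cronbach's alpha = {cronbach_alpha(rating_matrix):.2f}")
```

Because the normal approximation ignores the 1 to 5 bounds of the Likert scale, the upper limit can exceed 5 for highly rated subtasks with few raters.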

Consensus decision

We determined that the Delphi survey should be conducted in at least 2 rounds. This allowed us to reflect the results and the respondents’ 1st round comments in the reassessments of the 2nd and any subsequent rounds. Consensus was predefined as a Cronbach’s alpha > 0.8, following a global Delphi consensus study on defining and measuring quality in surgical training [11]. Subtasks were adopted when they were rated 4 or 5 on the Likert scale by 80% or more of the experts.
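The two decision rules above can be written as a minimal Python sketch. The function names and the example ratings are hypothetical and serve only to illustrate the thresholds stated in the text (Cronbach's alpha > 0.8 for consensus; a rating of 4 or 5 from 80% or more of the experts for adoption).

```python
def agreement_count(ratings):
    """Number of experts who rated the subtask 4 (agree) or 5 (strongly agree)."""
    return sum(1 for r in ratings if r >= 4)


def is_adopted(ratings, threshold_percent=80):
    """True when at least threshold_percent of the experts rated 4 or 5."""
    # Integer comparison avoids floating-point rounding at exactly 80%.
    return 100 * agreement_count(ratings) >= threshold_percent * len(ratings)


def consensus_reached(alpha, cutoff=0.8):
    """True when Cronbach's alpha exceeds the predefined cutoff."""
    return alpha > cutoff


# Hypothetical ratings from 27 respondents, not data from this study.
widely_supported = [5] * 15 + [4] * 8 + [3] * 4            # 23 of 27 rated 4 or 5
weakly_supported = [5] * 10 + [4] * 10 + [3] * 5 + [2] * 2  # 20 of 27 rated 4 or 5

print(is_adopted(widely_supported))
print(is_adopted(weakly_supported))
print(consensus_reached(0.86))
```

With 27 respondents, the 80% rule means that at least 22 experts must rate a subtask 4 or 5 for it to be adopted.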

Results

A draft of the key LDG subtasks was created from the document analysis, video analysis, and direct observation of 20 LDG cases at 3 teaching hospitals. After semi-structured interviews with 2 LDG experts and a surgical resident in the final step of the CTA, 35 essential steps for LDG were extracted for evaluation in the Delphi survey. Thirty-one LDG experts accepted our invitation and voluntarily participated in the Delphi survey after providing consent. Twenty-eight of the 31 LDG experts completed the 1st round survey (response rate: 90.3%). The participating experts had performed a median of 300 laparoscopic gastrectomy cases and had published a median of 15 peer-reviewed articles related to the “stomach” or “gastric cancer” (Table 1). Of the 28 participants who completed the 1st round survey, 27 also completed the 2nd round questionnaire (response rate: 96.4%), and 26 of these 27 responded to the extra survey questionnaire (response rate: 96.3%). Delphi consensus was achieved in the 2nd round survey with a Cronbach’s alpha of 0.86, and subtasks were adopted if they were rated 4 or 5 by 80% or more of the respondents in the 2nd round survey (Table 2). The Japanese operative rating scale for laparoscopic distal gastrectomy (JORS-LDG) was then created from the subtasks selected through the Delphi 1st and 2nd round surveys (Table 3). In the extra survey, the important subtasks from the viewpoints of safety or oncological curability for LDG D1 + LN dissection were selected by an 80% or greater consensus of the LDG experts (Table 4).
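The response rates reported above follow directly from the participant counts. The short Python check below reproduces them; the function name is illustrative only.

```python
def response_rate(completed, invited):
    """Response rate as a percentage, rounded to one decimal place."""
    return round(100 * completed / invited, 1)


# Counts as reported in the Results.
rounds = {
    "1st round": (28, 31),
    "2nd round": (27, 28),
    "extra survey": (26, 27),
}

for name, (completed, invited) in rounds.items():
    print(f"{name}: {completed}/{invited} = {response_rate(completed, invited)}%")
# 1st round: 28/31 = 90.3%
# 2nd round: 27/28 = 96.4%
# extra survey: 26/27 = 96.3%
```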

Table 1 Background of survey participants (n = 28)
Table 2 Results of Delphi 1st round (R1) and 2nd round (R2)
Table 3 JORS-LDG score sheet
Table 4 Important subtasks for “safety” selected by 80% or more consensus in extra Delphi survey

Discussion

We used a comprehensive approach combining CTA and the Delphi method to determine the essential subtasks of LDG. From those results, we developed the JORS-LDG to measure the skill set required for teaching a safe and secure LDG procedure. In the CTA, semi-structured interviews were conducted with the intention of developing an educational tool that would enable beginners to perform safe and secure LDG. Thirty-five key LDG subtasks were selected, consisting of the following identification and technical factors: procedure setup, intra-abdominal check, LN dissection, stomach resection, reconstruction, and final check (Table 2). In the latter process, the Delphi method contributed to building a consensus on the subtasks that formed the basis of the JORS-LDG. We had an excellent response rate to the Delphi survey (1st round: 90.3%, 2nd round: 96.4%) and a high Cronbach’s alpha of 0.86.

The Delphi approach has been widely used in the medical field to collect formal consensuses [12, 13], and numerous articles in the surgical field have used it to establish official statements such as practice guidelines and research agendas [14, 15]. In previous studies of surgical education, the Delphi method was used to develop assessment scales for measuring surgical skills [16, 17, 18]. Pucher et al. [16] identified the key domains of laparoscopic cholecystectomy. They adopted both technical and non-technical domains to create a road map that would contribute to patient safety in laparoscopic cholecystectomy. Palter et al. [17] and Dijkstra et al. [18] established the key steps in laparoscopic colorectal procedures. Palter et al. selected the procedural steps as well as details of the preoperative patient evaluation and the operating teams needed for training curricula. Dijkstra et al. developed a rating scale describing precise subtasks for evaluating surgeons’ technical performance in laparoscopic colectomy. Compared to laparoscopic cholecystectomy and colectomy, laparoscopic gastrectomy for gastric cancer requires additional steps comprising LN dissection and reconstruction of complicated anatomy. Therefore, we aimed to create a skills assessment scale for practicing surgeons who are beginning to learn and develop safe and secure LDG skills. Since LDG is the most frequently performed procedure in the surgical management of gastric cancer, and its level of difficulty should be lower than that of proximal or total laparoscopic gastrectomy, we determined that an LDG assessment scale would be suitable for beginners starting to learn this procedure.

The JORS-LDG will be used for the evaluation of LDG along with the steps involved in D1 + LN dissection. The surgical treatment of early gastric cancer includes D1 + LN dissection in the Japanese treatment guidelines [9]. Since D1 + LN dissection is less difficult than D2 LN dissection, which is performed for advanced gastric cancer, the JORS-LDG evaluations are needed to assess these skills as well.

Based on the results of the 1st round Delphi survey, the new subtasks “Resection of right gastroepiploic artery (RGEA) and infrapyloric artery (IPA)” and “Ensure good visualization of upper margin of pancreas by gentle retraction of pancreas” were added. In the 2nd round survey, the 95% CIs of the ratings for these subtasks were 4.56–4.92 and 4.66–4.97, respectively. This result reflects the LDG experts’ recognition of the importance of a precise approach to subpyloric lymph node dissection and of careful handling to prevent injury to the pancreas.

On the other hand, the following 4 subtasks were rejected after the 2nd round survey: “Checking the residual quantity of carbon dioxide gas,” “collect irrigation fluid at Douglas pouch for cytodiagnosis,” “checking the localization of lesion or ink marking from serosa side,” and “detection of LN swelling around the stomach.” Checking the carbon dioxide gas was rejected because it seemed to be a responsibility of the operator as well as the other OR staff. Additionally, collecting irrigation fluid for cytodiagnosis was not always necessary during surgery for early-stage gastric cancer. The subtasks associated with checking the lesion and LN swelling were deleted because localizing the lesion and checking every LN swelling were not always possible during surgery.

A limitation of this study is that the educational value of the JORS-LDG has not yet been established through an examination of its reliability and validity. As an extension of this study, we will investigate inter-rater reliability by comparing several raters’ evaluations made with the JORS-LDG scoring method. We will also evaluate the correlation between JORS-LDG scores and the LDG surgeons’ practice backgrounds, particularly their experience, case logs, and training records in laparoscopic surgery.

Recently, intraoperative skill measures have been demonstrated to correlate with postoperative patient outcomes [19, 20]. Birkmeyer and colleagues [19] had the laparoscopic gastric bypass videos of 20 bariatric surgeons reviewed by 10 blinded raters and demonstrated that the lower-scoring group had significantly more postoperative complications and higher operative mortality. Mackenzie et al. [20] compared intraoperative technical skill scores and postoperative patient outcomes for 171 cases performed by 85 surgeons in laparoscopic colorectal surgery. Using blinded video evaluations, they demonstrated significant differences in the rate of postoperative complications and the number of retrieved lymph nodes. Intraoperative skills evaluation with appropriate measures may therefore have the potential to predict postoperative surgical outcomes. Furthermore, if the evaluation results for each subtask can be used to analyze complications, a procedure-specific assessment scale may play an important role in improving postoperative outcomes. Regarding outcome prediction, in the extra Delphi survey we asked about each subtask in relation to the concept of “safety.” From the scores for these subtasks, it may be possible to analyze surgical complication factors associated with safety.

While Scally et al. [21] investigated the correlation between video rating scores of surgical skill and long-term outcomes after bariatric surgery, no study has reported on the relationship between procedural skill rating scores and long-term postoperative oncological outcomes. Future investigations that compare the scores for each aspect of LDG could identify correlations between the results of the skill evaluation and oncological outcomes with respect to disease recurrence. Moreover, the subtasks considered important from the viewpoint of “oncological curability” in the extra round of the Delphi survey (LN dissections of the greater curvature, the subpyloric region, the upper margin of the pancreas, and the lesser curvature, and stomach resection with adequate margins; Table 5) will be useful for detailed investigations of the relationship between these subtask scores and long-term outcomes.

Table 5 Important subtasks for “oncological curability” selected by 80% or more consensus in extra Delphi survey

Conclusions

We developed the JORS-LDG using CTA and the Delphi method as an assessment and training tool for LDG education.