Laparoscopic surgery has spread quickly since the first report of a laparoscopic cholecystectomy in 1987 [1], and is now one of the standard approaches to various types of surgery. Clinically, the curability of laparoscopic surgery must be the same as that of open surgery, and safety and the cosmetic appearance of scars are also important. The establishment of training and education systems for laparoscopic surgery is necessary, because it will help surgeons master the new skills in a safe and effective way [2]. The Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) and the European Association of Endoscopic Surgery (EAES) established minimum requirements for those performing laparoscopic surgery [3, 4]. Research concerning the assessment of endoscopic surgical skills has been developing rapidly because of the widespread interest in performing endoscopic surgery.

The objective structured assessment of technical skills (OSATS) consists of six stations where participants perform tasks on live animal or bench models. Their performance is assessed using checklists and a global rating scale [5]. This method was the first reliable system for assessing surgical skills [6], but it requires many people and a long time to perform. Therefore, new assessment tools have been developed, and their construct validity has been demonstrated.

Surgical simulation is an effective tool for training and assessment. One of the key points in a training program is providing trainees with immediate feedback on their performance. Simulators can produce learning curves outside the operating theater in a pressure-free environment, without requiring formal supervision. Virtual reality simulators (MIST-VR [7, 8], Lap Mentor [9, 10], Lap Sim [11, 12], ProMIS [13], etc.) and motion analysis systems (ICSAD [14–16], Adept [17, 18], HUESAD, etc.) are now able to provide real-time scores in the form of evidence-based reports (path length, errors, overall score, time, etc.). These methods usually require two phases: (a) training and (b) classification. During the training phase, representative data from each surgical level (e.g., novices, intermediates, experts) are used to establish a class representing each competence level. In the classification phase, the data recorded for a new surgeon are compared with those classes, and the surgeon is assigned to one of them on the basis of likelihood.
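The two phases described above can be illustrated with a minimal sketch. It assumes a single feature (here a path length in arbitrary units) and models each class with a one-dimensional Gaussian; the data values, the Gaussian model and the function names (train, log_likelihood, classify) are hypothetical choices for illustration and are not taken from any of the cited systems.

```python
import math
from statistics import mean, stdev

def train(samples_by_class):
    """Training phase: summarize each class by the mean and SD of its samples."""
    return {label: (mean(v), stdev(v)) for label, v in samples_by_class.items()}

def log_likelihood(x, mu, sigma):
    """Log density of x under a Gaussian with mean mu and SD sigma."""
    return -math.log(sigma * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mu) / sigma) ** 2

def classify(x, model):
    """Classification phase: assign x to the class with the highest likelihood."""
    return max(model, key=lambda label: log_likelihood(x, *model[label]))

# Hypothetical path lengths (arbitrary units) for each level of experience.
model = train({
    "novice": [300.0, 340.0, 280.0, 320.0],
    "intermediate": [200.0, 210.0, 190.0, 220.0],
    "expert": [120.0, 110.0, 130.0, 125.0],
})

print(classify(118.0, model))
```

Real systems typically combine several parameters, in which case a multivariate model would replace the single Gaussian used here.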

We developed our own motion analysis system for endoscopic surgery, called the Hiroshima University Endoscopic Surgical Assessment Device (HUESAD) [19]. The HUESAD provides reliable parameters for visual-spatial ability, smoothness and accuracy, and we have shown its construct validity [19–22]. Moreover, we found a strong correlation between the motion analysis in the HUESAD assessment and both the OSATS checklist and the global rating score [23]. The HUESAD has thus been demonstrated to be a reliable system for the objective assessment of endoscopic surgical skills.

However, the HUESAD is a motion analysis system for use in a dry box trainer, which means that it is used outside the operating theater.

We therefore used a new video analysis software program (Dartfish Software) that enables the user to take measurements such as angles, distances and timing directly on digital video recordings made during a procedure, and that offers the potential to decrease the bias associated with subjective image assessments. The Dartfish Software program has previously been used to develop performance-enhancing sports video training applications and exclusive televised broadcast footage.

The aim of the present study was to verify whether the video analysis software program (Dartfish Software) could be used to assess surgeons’ endoscopic surgical skills.

Methods

Box trainer

A box trainer (Endowork Pro II; MC Medical, Tokyo, Japan) was used in this study. A television image showing the inside of the box was visualized through a CCD camera on a 16-in monitor (Sharp Electronics, Osaka, Japan). Laparoscopic instruments (KOH Macro Needle Holder, 45 cm; Karl Storz, Tuttlingen, Germany) and surgical thread (3-0 braided silk SH-1; Ethicon, Somerville, NJ) were used.

Suturing tasks using a box trainer

The suturing task was evaluated in this study. The task consisted of placing five sutures without knot tying in a box trainer. The time and the locus tracing of the needle holders on both sides were analyzed using the video analysis software program (Dartfish Software). This software program allowed the time and the locus tracing to be measured for each subject (see Fig. 1).

Fig. 1 The video analysis software (Dartfish Software)

Study design

Six experts (who had performed more than 100 laparoscopic surgeries) and eleven novices (who had no experience performing laparoscopic surgery) were recruited for this study. The task was simply to place five sutures in a dry box trainer. The time and the locus tracing of the needle holders on both sides were analyzed using the Dartfish Advanced Video Analysis Software Program (version 5.5, Dartfish, Fribourg, Switzerland). This video measurement software de-interlaces video files to 30 images per second (sampling interval ~0.033 s). A previous study demonstrated that using the Dartfish Advanced Video Analysis Software program is an efficient approach to improving the reliability of visual video assessments (the use of video analysis software increased the inter-rater reliability of video gait assessments in children with cerebral palsy).
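To make the two measured quantities concrete, the following sketch shows how a task time and a path length could be derived from a sequence of tracked instrument positions sampled at 30 images per second. The coordinate list, its units (pixels) and the function names are assumptions made for illustration; this is not the Dartfish implementation, whose internal computation is not described here.

```python
import math

FPS = 30  # images per second after de-interlacing, as stated in the text

def path_length(points):
    """Sum of straight-line distances between consecutive tracked positions."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def duration_seconds(n_frames, fps=FPS):
    """Elapsed time between the first and the last of n_frames samples."""
    return (n_frames - 1) / fps

# Hypothetical 2D positions (in pixels) of one needle holder tip, one per frame.
trace = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

print(path_length(trace), "pixels over", duration_seconds(len(trace)), "s")
```

Because the positions come from a 2D image, a path length obtained in this way is a projection of the true 3D movement unless additional calibration is applied.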

Statistical analyses

All data were processed and analyzed using the SPSS software package (ver. 19.0, SPSS Inc., Chicago, IL). The Wilcoxon signed-rank test for related data was used to assess the differences in performance for all of the parameters measured for each group. The Mann–Whitney U-test was applied for comparisons between the groups. A linear discriminant analysis (LDA) was used as a classification method that can automatically determine the threshold for classifying subjects as experts or novices according to the time spent and the locus tracing of the needle holders on both sides. The performance of the classification method was examined using cross-validation. Statistical significance was defined as a p value <0.05.
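As an illustration of the between-group comparison, the sketch below computes a Mann–Whitney U statistic and a two-sided p value for two small samples. It assumes the normal approximation with a continuity correction and no correction for ties; the text does not state which variant SPSS applied, so this is one plausible choice rather than the procedure actually used. The task times are hypothetical, chosen only to match the group sizes of six and eleven.

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test (normal approximation, continuity-corrected).

    Assumption: no tie correction is applied to the variance.
    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(a), len(b)
    # U for sample a: pairs (x, y) with x > y count 1, tied pairs count 0.5.
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u = min(u_a, n1 * n2 - u_a)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    # u <= mean_u, so adding 0.5 moves z toward zero (continuity correction).
    z = min((u - mean_u + 0.5) / sd_u, 0.0)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p

# Hypothetical task times in seconds; the two groups do not overlap.
experts = [95, 110, 88, 120, 102, 99]
novices = [280, 340, 310, 260, 390, 300, 330, 275, 360, 295, 320]

u, p = mann_whitney_u(experts, novices)
print(u, round(p, 4))
```

With groups of six and eleven that do not overlap at all, U is 0 and this approximation gives a p value of about 0.0011.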

Results

The results indicated a statistically significant difference between the experts and novices in all three variables examined (task time: p = 0.0011; locus tracing of the left needle holder: p = 0.0011; locus tracing of the right needle holder: p = 0.0011) (Figs. 2, 3).

Fig. 2 Total execution time to perform the task

Fig. 3 Total path length (right hand and left hand) in the expert and novice groups

The best LDA results were obtained for the combination of the three parameters together (the time and the locus tracing of the needle holders on both sides). Table 1 shows the results of this classification in a confusion matrix. In this matrix, the actual (ground truth) classifications are in the rows, and the classifications predicted by the LDA are in the columns. Our classification methods could therefore correctly classify 100 % of the experts and novices.
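The classification and its confusion matrix can be sketched as follows. This is a minimal two-class LDA with a pooled covariance matrix and empirical class priors, evaluated by leave-one-out cross-validation. The leave-one-out scheme is an assumption (the text specifies only "cross-validation"), the data are hypothetical values for time and left and right path length, and the function names are illustrative; it is not the SPSS procedure used in the study.

```python
import math

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def pooled_covariance(groups):
    """Pooled within-class covariance (divisor: N minus number of classes)."""
    p = len(groups[0][0])
    scatter = [[0.0] * p for _ in range(p)]
    for rows in groups:
        mu = mean_vector(rows)
        for r in rows:
            d = [r[j] - mu[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
    dof = sum(len(rows) for rows in groups) - len(groups)
    return [[s / dof for s in row] for row in scatter]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_lda(class0, class1):
    """Return (w, bias); a sample x is assigned to class 1 if w.x + bias > 0."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    w = solve(pooled_covariance([class0, class1]), [b - a for a, b in zip(mu0, mu1)])
    midpoint = [(a + b) / 2.0 for a, b in zip(mu0, mu1)]
    bias = -sum(wi * mi for wi, mi in zip(w, midpoint))
    bias += math.log(len(class1) / len(class0))  # empirical priors (assumption)
    return w, bias

def predict(model, x):
    w, bias = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

def loo_confusion(class0, class1):
    """Leave-one-out confusion matrix: rows are actual, columns are predicted."""
    matrix = [[0, 0], [0, 0]]
    data = [(x, 0) for x in class0] + [(x, 1) for x in class1]
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        model = fit_lda([r for r, l in rest if l == 0], [r for r, l in rest if l == 1])
        matrix[label][predict(model, x)] += 1
    return matrix

# Hypothetical (time in s, left path, right path) values; class 0 = expert.
experts = [(95, 1800, 2100), (110, 2200, 1900), (88, 1600, 2000),
           (120, 2400, 2500), (102, 2000, 2300), (99, 1900, 1700)]
novices = [(280, 5200, 6100), (340, 6800, 5900), (310, 5600, 6600),
           (260, 5000, 5400), (390, 7400, 7900), (300, 6100, 5800),
           (330, 5900, 7000), (275, 5500, 5200), (360, 7000, 6400),
           (295, 5300, 6300), (320, 6500, 6900)]

confusion = loo_confusion(experts, novices)
print(confusion)
```

In the resulting matrix, confusion[0][0] counts experts classified as experts and confusion[1][1] counts novices classified as novices; the off-diagonal cells count misclassifications, so each row sums to the size of the corresponding group.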

Table 1 Our classification methods can correctly classify 80 % of experts in the two groups (cross-validation test)

Discussion

The results of this study demonstrate that the time and the locus tracing to complete a task were well correlated with the operator’s skill level. The “time to complete a task” has traditionally been used to assess technical performance [24]. However, faster performance is not always associated with better quality and improved outcomes. Locus tracing may better reflect the motor accuracy.

New tools have emerged over the last few years for the objective assessment of technical performance [20, 25]. A motion analysis allows for an assessment of the surgical dexterity using parameters that are extracted from the movement of the hands or laparoscopic instruments. An objective assessment of laparoscopic skill can be carried out using a motion analysis if endpoints for each parameter are quantified according to pre-defined levels of experience. The conversion of motion analysis data into competency-based scores or indices could provide a valuable source for trainee feedback. Such feedback can be useful, because it provides a quantitative index to define varying levels of experience, which trainees can work toward [26].

The HUESAD, our previously developed system, which can precisely trace the movement of the tip of a laparoscopic instrument, was developed for the objective assessment of technical performance. Previous studies demonstrated the construct validity and reliability of the HUESAD [19, 21–23, 27]. The HUESAD can define three different parameters: the integrated deviation (the visual-spatial ability) [19], the peak velocity (smoothness) [21] and the approach time (accuracy) [27]. With these three parameters, the HUESAD can provide feedback not only on the skill level of the trainee but also on the trainee’s weak points. The HUESAD can be used to assess basic training, as the subjects utilize simple orientation and movement skills in a non-anatomical environment. We have evaluated the correlation between the HUESAD and OSATS scores in a concurrent validity study [23]. However, it remains unknown whether the HUESAD can distinguish between different levels of performance in the operating theater.

The purpose of this study was to verify whether a video analysis software program (the Dartfish Software) could assess surgeons’ endoscopic surgical skills in a real operation. The Dartfish Software is not only used in sports science [6, 25, 28], but also in situations associated with medical support, such as providing instant visual feedback in treatment sessions, which allows patients to immediately see exactly what their body is doing.

On the other hand, the Dartfish software program can also assess overall endoscopic surgical skills using a simple method, which may make it convenient for evaluating a surgeon’s skills in the clinical setting. In terms of training, the Dartfish Software program can assess the degree of skill attainment, whereas the HUESAD can assess the surgeon’s skills in detail and can provide information such as “what are the surgeon’s weaknesses?”

In this study, we assessed a simple task: suturing in a dry box trainer. The results support the system’s construct validity: the Dartfish Software could differentiate between levels of performance, although it cannot be used for formative assessment. In this study, locus tracing was calibrated using conventional two-dimensional (2D) data, so it did not represent the actual measured value, and these data therefore had limited accuracy.

Conclusion

This study demonstrated that the results of the motion analysis using the Dartfish Software program were well correlated with a surgeon’s skill level, which supports the construct validity of the method. In addition, the Dartfish Software program is simple to use, but has lower accuracy than real locus tracing. Further development of the system will be necessary to provide more relevant data, including work in three dimensions (3D) and in the operating theater.