Simulation has been recognized as a method for teaching minimally invasive surgical (MIS) techniques to trainees that allows repetition and errors in inanimate systems before real patient contact. Simulators as instruments to measure performance of trainees must be validated.

The McGill Inanimate System for Training and Evaluation of Laparoscopic Skills (MISTELS) is the current standard by which laparoscopic skills are evaluated. It was specifically developed for evaluation of laparoscopic skills [1–3]. The MISTELS instrument, after initial studies confirming its validity [3, 4], became the cornerstone of the skills component of the Fundamentals of Laparoscopic Surgery (FLS) program adopted by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) and the American College of Surgeons (ACS).

Another means of assessing laparoscopic skills is the Imperial College Surgical Assessment Device (ICSAD). This motion analysis system has undergone extensive validation studies proving it to be very sensitive in discriminating surgeons according to their experience [5–10]. The ICSAD consists of an electromagnetic field generator that affects the signals detected by sensors placed on surgeons’ hands so that their pose can be tracked in real time. The positional data obtained by the sensors can be converted to data reflecting the surgeon’s dexterity.
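As an illustration of how positional samples can be turned into dexterity measures such as path length and number of movements, the following minimal Python sketch may help. It is not the ICSAD or ROVIMAS algorithm: the function names, the fixed sampling interval `dt`, and the speed-threshold definition of a "movement" are all assumptions made for the example.

```python
import math

def path_length(points):
    """Sum of straight-line distances between consecutive (x, y, z) samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def count_movements(points, dt, speed_threshold):
    """Count bursts of motion: each time the speed rises above the threshold
    after having been at or below it counts as one movement.
    (Hypothetical definition, chosen only for illustration.)"""
    moving = False
    count = 0
    for a, b in zip(points, points[1:]):
        fast = math.dist(a, b) / dt > speed_threshold
        if fast and not moving:
            count += 1
        moving = fast
    return count

# Made-up hand positions: a move, a pause, then a second move.
samples = [(0, 0, 0), (3, 4, 0), (3, 4, 0), (3, 4, 12)]
print(path_length(samples))                                    # 17.0
print(count_movements(samples, dt=1.0, speed_threshold=1.0))   # 2
```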

The ICSAD is a useful adjunct to training because trainees can be expected to achieve a certain level of proficiency as they progress through training. They also can be provided with an objective feedback of their performance [8, 9].

Despite the ICSAD findings that movement and path length are inversely related to performance, one study has shown that no fixed relationship exists between the number of movements and the time needed to perform a surgical task. In other words, when surgeons attempted to complete tasks as quickly as possible, they produced more movements per minute than when they took their time and used maximum precision to do the same task [11].

Design of a novel sensorized instrument

The surgical simulation systems currently available focus on metrics related to time efficiency and do not measure the forces exerted on tissue or the movements at the tip of a laparoscopic instrument. For this reason, novel instruments capable of detecting force and position at their tip were developed [12]. Specifically, these instruments were designed to sense forces in five degrees of freedom at their tip: the forces in three dimensions, the torque about the instrument axis, and the gripping or cutting force, which depends on the type of tip attached to the instrument (Fig. 1). Because different tasks may need to be performed, the instruments were designed to permit interchangeable tips and handles as dictated by the task (Fig. 2). For example, all the FLS tasks that require conventional tips (grasping, cutting, suturing) can be performed with the novel instruments. They also are capable of sensing position in six degrees of freedom at the tip. Furthermore, the instruments were designed to meet the constraints of MIS: they fit through a standard MIS access port, with a maximum outer shaft diameter of 10 mm.

Fig. 1

Schema showing four of five degrees of freedom (DOFs) for forces available with the Sensorized Instrument-Based Minimally Invasive Surgery (SIMIS) system excluding actuation. A x and y axes, B z axis and torsion

Fig. 2

Sensorized Instrument-Based Minimally Invasive Surgery (SIMIS) instrument prototype with examples of exchangeable handles and tips that can be attached to the instrument

The novel MIS instruments also were designed such that their appearance and weight are consistent with existing standards in MIS. In other words, the design was fashioned to ensure that the instruments mimicked actual MIS instruments used by surgeons in real operating theaters. Finally, a software interface was designed that allowed force and position data to be recorded while trainees perform a series of standardized tasks. The novel instrument and software package were named the Sensorized Instrument-Based Minimally Invasive Surgery (SIMIS) system. The design of the SIMIS instruments is presented by Trejos et al. [12, 13]. The face and content validities of SIMIS were ensured by including a surgeon specializing in MIS in the consultations for the design of the instrument.

The purpose of this study was to demonstrate the construct validity of a novel surgical instrument designed to sense forces and position at its tip as an educational tool in surgery. The specific aim was to demonstrate differences in both force and position measures between novices and experts.

Methods

Participants

To evaluate SIMIS, 20 volunteer participants including surgeons, surgical trainees, and graduate engineering students were recruited from a single educational institution. This experiment was approved by the University of Western Ontario’s Research Ethics Board, and all the participants read and signed an informed consent form before participation. Experiments were performed using an official FLS laparoscopic box trainer.

After recruitment, the participants viewed the SAGES FLS instructional video demonstrating task 5 (intracorporeal suturing) from MISTELS. Task 5 is recognized as the most challenging procedure [14] and thus the most likely to demonstrate differences. Scores for each participant were generated on the basis of their performance by an evaluator with expertise in grading the MISTELS tasks. After the participants had viewed the instructional video, ICSAD hand sensors were taped to their hands to allow for tracking of path length and numbers of movements.

The participants came from four equal groups: MIS surgeons, senior general surgery residents, surgery interns, and engineering graduate students. Although surgical interns had received training and engineering graduate students had not, overlap in ability irrespective of discipline or MIS training was possible. Indeed, current MIS training still is very inconsistent. A final-year resident may have had little exposure to an advanced MIS teaching service, and experienced MIS surgeons may actually do little suturing in practice, making “experience level” less reliable than other objective means of assessing skill level [15]. Therefore, because the currently accepted standard is satisfactory completion of the FLS program [16], the FLS scores of the participants were used to stratify the subjects into two groups: novices and experts. To perform this stratification, the FLS score was treated as a test to distinguish novices from experts, and the optimal cutoff was found from a receiver operating characteristic (ROC) curve.
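The ROC-based stratification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion used here (maximizing Youden's J, that is, sensitivity + specificity − 1) is one common way to pick a cutoff and is not confirmed as the method used in this study, and the scores and labels are invented for the example.

```python
def roc_points(scores, is_surgeon):
    """Return (cutoff, sensitivity, specificity) for each candidate cutoff.
    A participant is classed as expert when score >= cutoff."""
    pos = sum(is_surgeon)
    neg = len(is_surgeon) - pos
    values = sorted(set(scores))
    # Candidate cutoffs lie midway between adjacent distinct scores.
    cutoffs = [(a + b) / 2 for a, b in zip(values, values[1:])]
    points = []
    for c in cutoffs:
        tp = sum(1 for s, y in zip(scores, is_surgeon) if y and s >= c)
        tn = sum(1 for s, y in zip(scores, is_surgeon) if not y and s < c)
        points.append((c, tp / pos, tn / neg))
    return points

def best_cutoff(scores, is_surgeon):
    """Cutoff maximizing Youden's J (an assumed criterion, see lead-in)."""
    return max(roc_points(scores, is_surgeon), key=lambda p: p[1] + p[2] - 1)[0]

# Invented scores, not study data.
scores = [90, 110, 140, 160, 180, 200]
labels = [False, False, False, True, True, True]
print(best_cutoff(scores, labels))  # 150.0
```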

Experimental evaluation

The suturing task was broken down into five subtasks (Table 1). Videos of each participant’s performance were viewed, and times were generated for each of the five components of the suturing task. The mean forces exerted over each period were extracted from the data using Matlab (The Mathworks, Natick, MA, USA). The mean forces for each subtask for all the participants in each of the groups were averaged. This allowed comparison of each group from a force perspective.
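The segmentation step described above can be illustrated with a short Python sketch (the study itself used Matlab). The function name, the `(start, end)` boundary format, and the sample values are hypothetical; the sketch only shows the idea of averaging a force signal within the time window of each subtask identified from video.

```python
from statistics import mean

def mean_force_per_subtask(times, forces, boundaries):
    """times: sample timestamps in seconds; forces: force samples in newtons;
    boundaries: one (start, end) pair per subtask, taken from video review.
    Returns the mean force within each half-open window [start, end)."""
    result = []
    for start, end in boundaries:
        window = [f for t, f in zip(times, forces) if start <= t < end]
        result.append(mean(window) if window else float("nan"))
    return result

# Invented signal with three subtask windows.
times = [0, 1, 2, 3, 4, 5]
forces = [1.0, 3.0, 2.0, 2.0, 4.0, 6.0]
print(mean_force_per_subtask(times, forces, [(0, 2), (2, 4), (4, 6)]))
# [2.0, 2.0, 5.0]
```

Per-participant values obtained this way can then be averaged within each group to give the group means compared in the text.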

Table 1 Components of the suturing task

We hypothesized that expert surgeons would use less force than novice surgeons for FLS task 5. This hypothesis was based on the premise that they would use only the force necessary to complete the task successfully and that they would minimize wasted actions.

The ability of SIMIS to track the position of the instrument tip in six degrees of freedom also may be used as a surrogate marker for performance. In addition to generating charts demonstrating the impact of position and performance qualitatively, the data generated by SIMIS permit the calculation of a “work sphere” calculated from the volume of an ellipsoid depicting the 95% confidence intervals of each instrument’s position for each study participant during the entire task. As such, the volume of each ellipsoid can be calculated and compared.
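One way to compute such a work-sphere volume is sketched below in Python. It rests on assumptions not stated in the text: tip positions are treated as samples from a trivariate normal distribution, the ellipsoid is defined from the sample covariance matrix Σ, and the 95% region uses the chi-square quantile with 3 degrees of freedom (about 7.8147), giving V = (4/3)·π·c^(3/2)·sqrt(det Σ). The function names are illustrative.

```python
import math

CHI2_95_3DOF = 7.8147  # 95th percentile of chi-square with 3 degrees of freedom

def covariance_3d(points):
    """Sample covariance matrix (3 x 3) of a list of (x, y, z) positions."""
    n = len(points)
    mu = [sum(p[i] for p in points) / n for i in range(3)]
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
             for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def work_sphere_volume(points):
    """Volume of the assumed 95% ellipsoid, in cubed position units."""
    det = max(det3(covariance_3d(points)), 0.0)  # guard against rounding
    return (4.0 / 3.0) * math.pi * CHI2_95_3DOF ** 1.5 * math.sqrt(det)

# Invented tip positions in millimeters.
tip = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(work_sphere_volume(tip))
```

Because the volume scales with the cube of the position units, doubling every coordinate multiplies the result by eight, which is a quick sanity check on any implementation.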

We hypothesized that the work-sphere volumes of the experts would be smaller than those of the novices. This hypothesis was based on observations regarding shorter path length and fewer hand motions of experts than novices using the ICSAD. Because the ICSAD system is the current standard for measuring path length and numbers of movements in education for both open and laparoscopic surgery, it is ideally suited for validating the use of SIMIS position sensing to assess training and skills.

Statistical analysis

Data were assessed for normality using the D’Agostino & Pearson omnibus normality test. Normally distributed data were summarized as means ± standard deviations, and non-normal data were summarized as medians and interquartile ranges. Normally distributed data between groups were analyzed using the one-tailed, unpaired t test, and non-normal data between groups were analyzed using the Mann–Whitney U test. Linear regression was used to understand the relationship between FLS scores and force performance for statistically significant findings. Normal data were analyzed using the Pearson correlation to compare FLS and SIMIS force scores. If data were not normal, Spearman correlation was used.
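To make the distinction between the two correlation measures concrete, the sketch below implements both in plain Python. It is illustrative only (the study used GraphPad Prism), the function names are assumptions, and the data are invented. Spearman correlation is simply Pearson correlation applied to ranks, which is why it suits non-normal data with a monotonic relationship.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented monotonic but nonlinear data.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

For these invented data the rank correlation is exactly 1 because the relationship is perfectly monotonic, whereas the Pearson coefficient is slightly below 1 because the relationship is not linear.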

In all cases, p values less than 0.05 were considered statistically significant. Statistical analysis was performed using GraphPad Prism 5 (GraphPad Software Inc, La Jolla, CA, USA).

Results

Group assignment

The FLS task 5 score was found to be an excellent test for proficiency, with an area under the ROC curve of 0.9267. With this approach, the selected FLS score threshold identifies the participants who are least likely to be true surgeons. Table 2 shows the candidate FLS score cutoffs with their sensitivity and specificity. Using this technique, scores less than 155.5 were the least likely to be consistent with being a surgeon.

Table 2 Sensitivity and specificity of the Fundamentals of Laparoscopic Surgery (FLS) score

Thus, using the ROC curve technique, the study participants were categorized as novices if their FLS score was less than 155.5 (n = 8) and as experts if their FLS score was 155.5 or higher (n = 12). All MIS surgeons and senior general surgery residents achieved scores qualifying them as experts. All the engineering graduate students had scores that categorized them as novices. Two surgery interns had scores consistent with expert classification, whereas the remaining three interns were classified as novices.

Force sensing

Total force data were found to have a Gaussian distribution. Therefore, mean total forces were compared. There was a trend for the novices to exert a higher mean force over the course of the procedure than the experts (0.9 ± 0.3 N vs. 0.7 ± 0.3 N). However, this difference was not statistically significant.

Grasp force data were found to have a Gaussian distribution. Therefore, mean grasp forces were compared. The experts exerted a significantly higher mean grasp force over the course of the procedure than the novices (21.8 ± 6.9 N vs. 15.1 ± 7.2 N; p = 0.025).

Torsion force data were found not to have a Gaussian distribution. Therefore, median torques were compared. There were no differences between the novices and the experts in terms of median torsion force over the course of the procedure (0.03 N cm; range, 0.02–0.08 N cm vs. 0.03 N cm; range, 0.02–0.05 N cm; p = 0.31).

The differences in force patterns exerted over the entire suturing task were predominantly explained by the differences in grasp forces between the novices and the experts. Linear regression was used to understand the relationship between grasp force and FLS score. There was a positive relationship between FLS score and grasp forces, with higher FLS scores significantly related to higher mean grasp forces (F = 6.525; p = 0.020). The regression is shown in Fig. 3. Grasp force data for all the subjects over the entire task were normal, so correlation was assessed using Pearson’s test. A significant correlation existed between the FLS scores and the grasp forces for the entire task (r = 0.5158, p = 0.020).
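The reported regression and correlation statistics can be cross-checked against each other. In simple linear regression with one predictor, the F statistic and the Pearson coefficient are linked by F = r²(n − 2)/(1 − r²). The short Python sketch below applies this identity; the function name is illustrative, and n = 20 assumes that all participants contributed to the analysis.

```python
def f_from_r(r, n):
    """F statistic (1 and n - 2 degrees of freedom) for simple linear
    regression, computed from the Pearson correlation coefficient r."""
    r2 = r * r
    return r2 * (n - 2) / (1 - r2)

# Reported r = 0.5158 with an assumed n = 20 participants.
print(round(f_from_r(0.5158, 20), 2))  # 6.52
```

The result agrees with the reported F = 6.525 to within rounding, and the shared p value of 0.020 follows because both tests assess the same linear association.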

Fig. 3

Fundamentals of Laparoscopic Surgery (FLS) correlation with grasp force

Data for the subtasks were analyzed in the same fashion. Total force data for each subtask were found to be non-Gaussian. Therefore, medians were compared. Significant differences were demonstrated for subtask 1 (preparing a needle) and subtask 2 (driving a needle), with the novices exerting more total force than the experts. Nonsignificant trends demonstrating greater force exertion by the novices for subtasks 3–5 also were demonstrated. These final three subtasks represent knot-tying tasks. When the results from these three subtasks were pooled, the novices demonstrated significantly greater forces than the experts.

Grasp force data for subtasks 1–3 were Gaussian, whereas grasp force data for subtasks 4 and 5 and for all knots were non-Gaussian. The results of subtask analysis for grasp force demonstrated nonsignificant trends, with the experts exerting greater force than the novices for subtasks 1–3. The experts also were found to exert significantly more force than the novices for subtasks 4 and 5. Considering all knots tied, the expert surgeons again used significantly more force than the novices.

Torsion force data for subtasks 1–3 were Gaussian, whereas torsion force data for subtasks 4 and 5 and for all knots were not Gaussian. The results of subtask analysis for torsion force demonstrated nonsignificant trends, with the novices exerting greater force than the experts for subtasks 1–3 and the experts exerting greater force for subtask 4. The experts also were found to exert significantly more force than the novices for subtask 5. Considering all knots tied, the expert surgeons used more force than the novices, but this difference was not significant.

Position sensing

Work-sphere volumes from the ellipsoids representing the 95% confidence intervals of the work spaces for the suturing task and the individual subtasks were extracted from Matlab. An example is shown in Fig. 4.

Fig. 4

Work sphere for entire suturing task for a novice

Work-sphere data for the entire suturing task were Gaussian. Mean work spheres between groups were significantly different, with the experts using a greater work volume than the novices (2.6 × 10⁶ ± 1.6 × 10⁶ vs. 1.0 × 10⁶ ± 6 × 10³ mm³; p = 0.0075).

Linear regression was used to understand the relationship between work-sphere volumes and FLS score. A positive relationship was found between FLS score and work-sphere volumes, with higher FLS scores significantly associated with larger mean work spheres (F = 6.673; p = 0.019). The regression is shown in Fig. 5. Work-sphere-volume data for all subjects over the entire task were non-Gaussian, so correlation was assessed using Spearman’s test. There was a significant correlation between the FLS score and the work volume for the entire task (r = 0.5815; p = 0.0072).

Fig. 5

Linear regression of Fundamentals of Laparoscopic Surgery (FLS) score versus volume

Individual subtasks were evaluated in the same fashion. None of the subtask volume data sets were Gaussian, so medians were compared. In contrast to the overall task volumes, nonsignificant trends showing smaller work-space volumes for the individual subtasks were demonstrated for the experts (Table 3). When all the tasks were considered, the mean volumes were smaller for the experts than for the novices, but the difference was not statistically significant.

Table 3 Comparison of work volumes for each subtask

Position and ICSAD

Using the ROVIMAS software (Imperial College, London), ICSAD data were extracted for all the subjects. Path lengths and numbers of movements by subjects in the novice and expert categories were assessed for normality. Both data sets were Gaussian. The novices had a significantly greater mean path length than the experts (47.8 ± 9.4 vs. 18.9 ± 6.0 m; p < 0.0001). Similarly, the novices used significantly more movements than the experts (148.8 ± 72.45 vs. 65.33 ± 28.43; p = 0.001).

Linear regression demonstrated that a significant inverse relationship existed between ICSAD path length and work-sphere volumes (F = 5.741; p = 0.018) (Fig. 6). Spearman correlation also confirmed a correlation between ICSAD path length and work-sphere volumes (r = −0.5895; p = 0.0062). Linear regression for numbers of movements also demonstrated an inverse relationship between ICSAD movement counts and work-sphere volumes, but this finding was not statistically significant (F = 2.394; p = 0.14). Spearman correlation was similarly nonsignificant (r = −0.2775; p = 0.24).

Fig. 6

Imperial College Surgical Assessment Device (ICSAD) path length versus volume

Discussion

The evaluations detailed in this research show that SIMIS, a novel force- and position-sensing laparoscopic instrument, has construct validity as a teaching tool because it can detect differences in performance of an MIS task between experts and novices based on force and position data. Our hypotheses regarding the ability of SIMIS to distinguish experts and novices on the basis of force measurements were correct. Most of the differences in force exertion were explained by both the grasp force and the total force (force magnitude in x, y, and z directions summed). Interestingly, the knot-tying component accounted for much of the observed difference. This is not surprising because an increase in amplitude with the cinching down of knots is likely applied to ensure that the knot does not come unraveled.

Notably, whereas a significant increase in grasp force was demonstrated by the experts, they also demonstrated significantly lower total force while tying knots. This finding, although somewhat paradoxical, may suggest that the more important component of knot tying is the grasp force on the suture and that the three-dimensional forces are not as important. The tendency for the experts to exert lower total force may suggest proficiency and economy of force application gained through experience.

Regarding the torque readings for all of the tasks, essentially no difference in expertise was observed between groups over the whole suturing task or for the individual subtasks. This is likely due to the FLS suturing model, which is made from pliable latex Penrose surgical drains. This material offers almost no resistance. Perhaps different findings would have been seen if a more realistic model had been used.

With respect to the position data for the overall task, the null hypothesis was rejected. However, it was rejected for a different reason than expected. The hypothesis was that the experts would use a smaller work volume. This prediction was based on research with other instrument-tracking systems demonstrating that expert surgeons make fewer movements and use shorter overall path lengths than novices when performing surgery [9, 11, 17–19]. Interestingly, the expert surgeons used a larger overall work volume but made fewer hand movements, as measured using ICSAD, and also used a shorter overall path length. Moreover, although the experts’ work volume for the overall suturing task was larger than that of the novices, the volumes for the subtasks were smaller for the experts. This finding suggests that experts make better use of the operative field and efficiently perform tasks in different areas using a minimal amount of space, whereas novices may attempt to “work in a small hole” for the entire task.

Additionally, experts may accentuate their movements laparoscopically for safety purposes. This notion is supported in research performed by Chmarra et al. [20], who found that experts make accentuated goal-oriented movements in MIS. These authors also proposed that such goal-oriented approaches can be split into two phases: retracting and seeking. Novices are less effective than experts in the seeking phase, which is the portion of the operation that accounts for touching an object of interest or performing a surgical task. Therefore, the seeking phase is characteristic of performance differences. Furthermore, the retracting phase improves safety by avoiding intermediate tissue contact.

Perhaps the small expert subtask volumes observed in these experiments are an analogue to the seeking components observed by Chmarra et al. [20]. Therefore, the shortest path length, as currently used during the assessment of basic MIS skills, may not be the proper concept for analyzing optimal movements. To be more specific, path length alone may not be the optimal metric because the “work sphere” or work volume may be a more important determinant of expertise. The differences observed in these experiments using SIMIS in an FLS training scenario provide more insight into the relationship between experience and instrument positioning.

It is possible that bias could have been introduced through the training and exposure of the subjects to MIS. Because the degree of variation in FLS scores that categorized the expertise of the participants was minimal, this was not a factor for any group except the group of first-year residents. This group had very wide variance in performance and accounted for all the sorting of subjects. Coincidentally, this group had recently been trained in MIS using a formal curriculum. Their performance suggests that this experience was not retained because many performed more poorly than the untrained engineering students. This finding suggests that bias was likely not an issue in the study cohort.

Once face and content validities have been ensured, construct validation is the first important step in developing new metrics for MIS simulations and educational tools [21–25]. Use of SIMIS force and position information may allow prediction of a subject’s expertise based on the profile of his or her forces and work space.

Future work will focus on more rigorous validation, with experiments including a wider array of subjects with a broader variation in skills. Should these experiments bear fruit, our next objective will be to attempt incorporation of SIMIS information into existing metrics such as the FLS program as a potential means of providing real-time feedback on performance. This study has generated regression curves for the force and position data that may be used in future experiments to predict a research subject’s level of training in MIS.

Conclusion

The novel Sensorized Instrument-Based Minimally Invasive Surgery system has demonstrated initial construct validity regarding force and position sensing. Future work will be directed toward distinguishing surgeons of greater variation in ability using SIMIS and toward assessing the ability of SIMIS to be incorporated into existing inanimate MIS simulators such as MISTELS.