Introduction

Minimally invasive surgery is becoming a sought-after approach for many surgical treatments [1], as it reduces operative time, shortens the hospital stay, and minimizes recovery time [2]. It requires an indirect view of the operative field via an intraoperative imaging technology such as fluoroscopic x-rays or laparoscopic video. This constrained view makes it extremely challenging to guide trainees to see the target structure, make sense of it, and identify where to look next [3]. To convey this professional vision, trainers often work with the video to make the target salient to the trainees. This entails a series of verbal explanations, gestures or pointing at the monitor, or moving the laparoscopic camera itself to reveal subtle changes in the anatomical view [3, 4].

These efforts, however, are not efficient. Trainees need to parse, envision, and make sense of the trainers’ gestures to perceive the location and direction given by the instruction [3, 5], as manifested by an increased number of attempts and an elongated path toward the target [6], as well as multiple pauses during the operation while waiting for confirmation from the trainers [5]. Thus, the surgery is often segmented into small steps, which draws trainees’ attention away from gaining professional vision, i.e., learning to see and work with the video as an expert [3], and toward simply accomplishing the technical task at hand.

The importance of professional vision for trainees to become competent in surgery has been emphasized in studies on surgical education [3, 7]. These studies demonstrated that situated learning in laparoscopic surgery occurred when surgical residents were trained to perceive and appropriate the view of the body [3], as well as to use that view to guide their actions [7]. In addition, previous research has demonstrated an association between professional vision and competency in surgery [8,9,10]. These studies revealed a substantial gap in professional vision between experts and novices, showing that experts concentrated their gaze on the target while novices were more varied in their gaze behaviors [8], and that these differences in gaze behaviors affected novices’ surgical performance [9]. To improve the adoption of professional vision, trainees have been presented with expert gaze behaviors on offline laparoscopic videos [10]. Although this method was effective in improving the trainees’ surgical performance, it depends on the trainees’ identification and sense-making of the expert gaze patterns and is constrained by the availability of gaze-annotated videos and of the trainees’ time outside of work hours.

As the operating room (OR) is the key place for trainees to become competent in surgery, we propose to enhance the conveyance of professional vision during intraoperative training. We have designed a virtual pointing and telestration system that allows a trainer to point or draw a freehand sketch over live laparoscopic video. We hypothesize that the virtual pointer can improve the trainee’s professional vision through an increased awareness and understanding of the information the trainer is conveying, which in turn should improve trainee performance. Given the incremental and transferable nature of learning, we assessed trainees’ performance over the course of four tasks and evaluated the efficacy of the virtual pointer in improving professional vision adoption over time.

Material and methods

System design

The virtual pointer was designed to facilitate the conveyance of expert knowledge by enabling trainers to point or draw on the laparoscopic video for the trainees to see (Fig. 1). To this end, the Microsoft Kinect sensor version 2 (Microsoft Corporation, USA) was used as a mechanism of touchless interaction, enabling the system to be used in the sterile operating field [11]. Developed in C# for the Windows operating system, the application is a transparent window that can be overlaid on any screen or other application. Using the Microsoft.Speech API and Kinect body tracking, it combines spoken keywords and hand movements to invoke the different functionalities [12], such as a pointer for referencing or a freehand drawing tool (Fig. 1).

Fig. 1

The virtual pointer: a virtual pointer user interface with the list of verbal commands (top left), current mode (top center), gesture recognition feedback window (bottom left), telestration (green lines on anatomy), and pointer (a green dot; not shown); b trainer using the virtual pointer, with a closed hand used for drawing with the telestration tool
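
As context for Fig. 1, the overlay described above can be realized in WPF as a borderless, transparent, always-on-top window. The following is a minimal sketch of such a window; the class name and exact property choices are our illustration under that assumption, not the authors' reported implementation.

```csharp
using System;
using System.Windows;
using System.Windows.Media;

// Minimal sketch of a transparent, always-on-top overlay window in WPF.
// Annotations (the pointer dot and freehand strokes) would be drawn on this
// window while the laparoscopic video plays in the application underneath.
public class OverlayWindow : Window
{
    public OverlayWindow()
    {
        WindowStyle = WindowStyle.None;       // no title bar or border
        AllowsTransparency = true;            // required for a see-through background
        Background = Brushes.Transparent;     // video below remains fully visible
        Topmost = true;                       // keep the overlay above the video player
        WindowState = WindowState.Maximized;  // cover the entire screen
        ShowInTaskbar = false;
    }

    [STAThread]
    public static void Main()
    {
        new Application().Run(new OverlayWindow());
    }
}
```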

The flowchart of the application is shown in Fig. 2. To wake the Kinect, the first verbal command is “Kinect ready.” The Kinect then starts detecting the movement of the trainer and listening for verbal keywords. Two modes can be called upon via verbal commands: a pointer with “Kinect point” and a drawing tool with “Kinect draw.” In pointing mode, the user moves the hand to control a small green circle, which acts as a pointer. In drawing mode, the user closes their hand to draw and stops drawing by opening the hand. The verbal command “Kinect clear” clears the screen of all annotations. When use of the application is finished, the voice command “Kinect close” puts the program to sleep and stops the Kinect from detecting.

Fig. 2

Flowchart of virtual pointer
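
The command flow in Fig. 2 can be summarized as a small state machine driven by speech keywords and hand states. The sketch below assumes the Kinect for Windows SDK v2 body-tracking API and the Microsoft.Speech recognition engine; the rendering helpers (MovePointer, AddStrokePoint, ClearAnnotations) are hypothetical placeholders for drawing onto the overlay window, and details such as selecting the Kinect microphone array as the speech input are omitted.

```csharp
using System;
using Microsoft.Kinect;               // Kinect for Windows SDK v2 (assumed)
using Microsoft.Speech.Recognition;   // Microsoft.Speech API (assumed)

// Sketch of the command flow in Fig. 2: verbal keywords switch the mode,
// and the hand state (open/closed) starts or stops drawing.
public class VirtualPointerController
{
    private enum Mode { Asleep, Idle, Pointing, Drawing }
    private Mode mode = Mode.Asleep;

    public void Start()
    {
        // Listen for the five keyword phrases used by the system.
        var speech = new SpeechRecognitionEngine();
        var phrases = new Choices("Kinect ready", "Kinect point", "Kinect draw",
                                  "Kinect clear", "Kinect close");
        speech.LoadGrammar(new Grammar(new GrammarBuilder(phrases)));
        speech.SpeechRecognized += OnSpeech;
        speech.SetInputToDefaultAudioDevice();   // the Kinect audio source could be used instead
        speech.RecognizeAsync(RecognizeMode.Multiple);

        // Track the trainer's hand to drive the pointer and telestration.
        var sensor = KinectSensor.GetDefault();
        sensor.Open();
        var reader = sensor.BodyFrameSource.OpenReader();
        reader.FrameArrived += OnBodyFrame;
    }

    private void OnSpeech(object sender, SpeechRecognizedEventArgs e)
    {
        switch (e.Result.Text)
        {
            case "Kinect ready": mode = Mode.Idle; break;      // wake up
            case "Kinect point": mode = Mode.Pointing; break;  // green dot follows the hand
            case "Kinect draw":  mode = Mode.Drawing; break;   // closed hand draws a stroke
            case "Kinect clear": ClearAnnotations(); break;
            case "Kinect close": mode = Mode.Asleep; break;    // stop reacting to movement
        }
    }

    private void OnBodyFrame(object sender, BodyFrameArrivedEventArgs e)
    {
        using (var frame = e.FrameReference.AcquireFrame())
        {
            if (frame == null || mode == Mode.Asleep) return;
            var bodies = new Body[frame.BodyCount];
            frame.GetAndRefreshBodyData(bodies);
            foreach (var body in bodies)
            {
                if (!body.IsTracked) continue;
                var hand = body.Joints[JointType.HandRight].Position;
                if (mode == Mode.Pointing)
                    MovePointer(hand);                               // update the green dot
                else if (mode == Mode.Drawing && body.HandRightState == HandState.Closed)
                    AddStrokePoint(hand);                            // extend the current sketch
            }
        }
    }

    // Rendering of the dot and strokes onto the overlay window is omitted here.
    private void MovePointer(CameraSpacePoint p) { }
    private void AddStrokePoint(CameraSpacePoint p) { }
    private void ClearAnnotations() { }
}
```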

The system setup is illustrated in Fig. 3. For use in a laparoscopic training setup, the laparoscopic video is routed to a Windows laptop running the virtual pointer application via a DVI2USB 3.0™ frame grabber (Epiphan Systems Inc., Canada) and is played full-screen in the VLC media player (VideoLAN, France), which is set to a 0-ms buffer cache to minimize the delay of video streaming. The virtual pointer application is then launched, opening a transparent window as an overlay on the video player. The laptop’s screen is then mirrored back to the laparoscopic video system’s monitor.

Fig. 3

System diagram for virtual pointer setup in laparoscopic training
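
For illustration, the capture stream could be opened from the overlay application itself. A rough sketch is shown below, assuming VLC's DirectShow input with capture caching set to zero; the installation path and device name are placeholders, and the exact VLC options used in the study are not reported.

```csharp
using System.Diagnostics;

// Hypothetical launcher: open the DVI2USB capture stream in VLC, full screen,
// with capture caching minimized, before showing the transparent overlay on top.
class VideoLauncher
{
    static void Main()
    {
        Process.Start(new ProcessStartInfo
        {
            FileName = @"C:\Program Files\VideoLAN\VLC\vlc.exe",   // path is an assumption
            // dshow:// opens a DirectShow capture device (the frame grabber);
            // --live-caching=0 minimizes buffering, --fullscreen fills the laptop screen.
            Arguments = "dshow:// :dshow-vdev=\"DVI2USB 3.0\" --live-caching=0 --fullscreen",
            UseShellExecute = false
        });
    }
}
```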

Experiment design and procedure

The experiment used a counterbalanced, within-subject design. We performed a controlled experiment with two mentoring approaches: the standard condition (control) and the virtual pointer condition (intervention). In the standard condition, trainer instruction was delivered as it would be normally, through verbal explanation or hand gestures. In the virtual pointer condition, the trainers used the virtual pointer application in addition to standard guidance to facilitate instruction. The trainees worked on four simulated laparoscopic tasks under trainer guidance. The tasks were selected based on a hierarchical task analysis of the laparoscopic cholecystectomy procedure and were confirmed by an attending surgeon to be of similar difficulty and to require both anatomical structure identification and instrument manipulation skills. The tasks were performed on a validated physical laparoscopic training model [13]: (1) mobilizing the cystic duct and the cystic artery, (2) clipping the cystic duct, (3) clipping the cystic artery, and (4) cutting the cystic artery and the cystic duct (Fig. 4).

Fig. 4

Task models used in the study: a used for task 1 (mobilizing the cystic duct and artery); b (without the staples across the structures) used for task 2 (clipping the cystic duct) and task 3 (clipping the cystic artery); b (as shown) used for task 4 (cutting the cystic artery and duct)

Task order and condition were counterbalanced across trainees, yielding a total of 14 runs in the virtual pointer condition and 14 runs in the standard condition. Before the study, the trainees and trainers completed a demographics questionnaire covering their surgical experience and familiarity with the Kinect system (see Supplementary File 1: Pre-Questionnaire). After each task, the trainees and trainers completed a performance assessment questionnaire (see Supplementary File 2: Performance Assessment), and the trainees completed a quality of instruction questionnaire (see Supplementary File 3: Quality of Instruction Questionnaire). The study was video-recorded, and the operative field was screen-recorded for the objective assessment of performance. The study was approved by the University of Maryland, Baltimore County institutional review board. Informed consent was obtained from all participants before their participation.

Participants

The participants were recruited from the Department of General Surgery at the Anne Arundel Medical Center, Annapolis, MD. A total of 7 surgical trainees, including 1 surgical fellow, 1 research fellow, and 5 surgical residents (3 PGY-1 and 2 PGY-2), were recruited. One attending surgeon and one surgical fellow served as the trainers. The attending surgeon guided the surgical fellow in performing the tasks, and the surgical fellow guided the rest of the trainees.

Study setting and system setup

The Park Trainer (Stryker Corporation, USA) was used for the simulated laparoscopic tasks (Fig. 5, center). The Park Trainer consists of a housing unit for physical anatomical models, a flexible shield with openings for inserting the laparoscopic camera and instruments, a standard laparoscopic camera with a light source using the Stryker computer system, and a standard laparoscopic monitor on an adjustable arm at the top. The Microsoft Kinect sensor was set up to the left of the Park Trainer (Fig. 5, left). This was determined to be the best location for this study, as the tasks called for the trainer to stand to the left of the trainee and use their right hand to manipulate the laparoscopic camera. The Windows laptop running the virtual pointer application was placed on a table to the right of the Park Trainer (Fig. 5, right). An external camera was set up to the left of the trainer to capture the trainer’s hand movement and the view of the virtual pointer on the trainee’s screen.

Fig. 5

Study setting with the virtual pointer system and Park Trainer

Measures

Quality of instruction

Quality of instruction was evaluated in terms of clarity, reacting, and structuring; the measure was adapted from an evaluation study of instructional technologies for laparoscopic surgery [14]. We focused on the trainee’s perceptions of instruction after each task, on the basis of aggregated scores, because a trainer may provide many nominal stimuli that become functional stimuli only when the trainee perceives them and uses them as cues to direct goals and behaviors [15].

Subjective assessments of performance

After performing each task, the trainee and trainer completed a performance assessment questionnaire adapted from the global rating scale (GRS) instrument [16]. The GRS is a 5-point Likert scale with the following criteria: (1) depth perception, (2) bimanual dexterity, (3) time and motion, (4) flow of operation, (5) instrument handling, and (6) knowledge of the specific procedure. Time and motion is assessed by the number of unnecessary movements and maps to the objective assessment of economy of movement. With the GRS, the trainees grade their self-perceived performance and the trainers grade the trainees’ observed performance.

Objective assessments of performance

Economy of Movement: Economy of movement was assessed as the number of movements of the instrument manipulated by the dominant hand (the hand performing the primary work, such as dissecting and cutting) and the non-dominant hand (the hand performing supporting work, such as grasping and retracting). This assessment has been validated in a number of studies evaluating the efficacy of simulation in improving laparoscopic performance. A movement, adapted from [17], is delimited by perceivable pauses in a continuous motion of the instrument or by changes in instrument direction. The fewer the movements, the better the economy of movement. Three researchers (the first three authors) separately examined one recorded trial (4 tasks for one subject) and counted the number of movements for each hand. Interrater reliability was measured by the intraclass correlation coefficient, ICC\((1, 1) = 0.989\), with a 95% confidence interval of (0.940, 0.999), which is interpreted as high agreement [18]. The remaining recordings were therefore split randomly among the three raters.
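
For reference, the single-rater, one-way random-effects ICC reported above is commonly computed from a one-way ANOVA across the rated recordings; whether the authors used exactly this estimator is our assumption based on the ICC(1, 1) notation:

\[ \mathrm{ICC}(1,1) = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W}, \]

where \(MS_B\) and \(MS_W\) are the between-target and within-target mean squares and \(k = 3\) is the number of raters.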

Fig. 6

Comparison of number of errors between the virtual pointer and standard conditions for each error type

Time to Task Completion: The length of each task (with and without instruction) in each condition was recorded in seconds from the external video recording. Length of task with instruction is the total time to complete the task including all trainer instruction, i.e., introductory instruction and instruction given while the trainee took no action on the task. Length of task without instruction includes only the time the trainee actively worked on the task. For both measures, segments outside of the task, such as waiting for an instrument or the model falling over, were edited out.

Number of Errors: Three researchers (the first three authors) separately identified and described the frequent errors seen in the recorded video for one trainee and then compared their coded errors. Errors were initially counted based on a modified list of external error modes (EEMs), common errors in laparoscopic surgery [19], while also noting any other observed errors. This yielded a very low Fleiss’ kappa of 0.01, attributable to confusion between economy of movement and errors, as well as an unclear distinction between trainer commentary given as instruction and trainer commentary given to correct an error. After discussion, a list of 10 common errors was created to clarify what should be considered an error. The three researchers coded the first participant’s errors again and achieved a Fleiss’ kappa of 0.568. After determining that some categories could be condensed into broader ones, a new, condensed list of 4 error types was created: (1) wrong placement, orientation, or location of instrument; (2) wrong amount of force; (3) actions not belonging to the procedure; and (4) damage to tissue/model. The researchers performed another round of coding and comparison for the first two trainees and achieved substantial interrater reliability (Fleiss’ kappa \(= 0.76\)) [20], after which the rest of the trainees were coded.
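
For reference, Fleiss’ kappa expresses the agreement among the three coders relative to chance; in the standard formulation (which we assume underlies the values reported above),

\[ \kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}, \]

where \(\bar{P}\) is the mean observed agreement across coded instances and \(\bar{P}_e\) is the agreement expected by chance from the marginal category proportions; values between 0.61 and 0.80 are conventionally interpreted as substantial agreement [20].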

Statistical analysis

Statistical analysis was performed using a linear mixed model to compare the trainees’ performance between the virtual pointer and standard conditions. Since the analyses focused on the effect of the virtual pointer and how that effect changes as knowledge accumulates, we modeled the mentoring condition (virtual pointer or standard) and the task order as fixed factors. Because the difficulty levels of the tasks were confirmed by the attending surgeon to be similar, we treated the task as a random factor; the trainee was modeled as another random factor. All linear mixed model analyses were carried out using R version 3.2.2 (R Foundation for Statistical Computing, Vienna, Austria). Results are shown as mean with standard error of the mean. For all tests, a p value of less than 0.05 was considered statistically significant.
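
In formula terms, the model structure described above can be sketched (using our own notation, since the exact parameterization is not reported) as

\[ y_{ijk} = \beta_0 + \beta_1\,\mathrm{Condition}_{ijk} + \beta_2\,\mathrm{Order}_{ijk} + u_j + v_k + \varepsilon_{ijk}, \]

where \(y_{ijk}\) is the outcome for trainee \(k\) performing task \(j\) in run \(i\), Condition and Order are fixed effects, \(u_j\) and \(v_k\) are random intercepts for task and trainee, and \(\varepsilon_{ijk}\) is the residual error.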

Results

Quality of instruction

The quality of instruction was perceived to be significantly better in the virtual pointer condition than in the standard condition (\(p = 0.024\)). Task order had limited impact on the trainees’ perceptions of instruction (\(p = 0.225\)).

Number of errors

The virtual pointer had no significant effect on the number of errors committed by the trainees (\(p = 0.367\)), nor did the task order (\(p = 0.761\)). In addition, as shown in Fig. 6, the frequency of each error type is very similar between the two conditions, with most errors stemming from wrong amount of force and wrong instrument placement.

Time to task completion

The virtual pointer had no significant overall effect on time to task completion with instructions (\(p = 0.183\)) or without instructions (\(p = 0.730\)), although the trend was that the virtual pointer added time to task performance, at least in the beginning. This could indicate increased cognitive load on the trainee. Understandably, task order alone was a significant factor for time to task completion both with instructions (\(p = 0.003\)) and without instructions (\(p = 0.017\)) (Fig. 7).

Fig. 7

Comparison of time to task completion with and without instruction between the virtual pointer and standard conditions in each task order: a time to task completion with instructions; b time to task completion without instructions

Economy of movement

A significant decrease in the number of movements was found for the trainees’ non-dominant hand in the virtual pointer condition compared to the standard condition (\(p = 0.012\)), while no significant difference was found for the dominant hand (Fig. 8a). The task order effect was therefore analyzed for the number of non-dominant hand movements.

Fig. 8

The comparison of the number of movements between the virtual pointer and standard conditions: a Overall comparison for dominant and non-dominant hands; b Comparison based on the task order for non-dominant hands

When considering the task order, the number of movements was similar between the two mentoring conditions in each run. However, when power was increased by combining the runs into two sequential groups, the number of movements was significantly lower in the virtual pointer condition than in the standard condition (\(p = 0.021\)) in the second sequential group (task orders 3 and 4), while it remained similar in the first sequential group (task orders 1 and 2) (Fig. 8b). This indicates that as knowledge of the anatomical structures and the procedure accumulated, the effect of the virtual pointer in enabling the trainees to move the instruments to the target more efficiently became evident.

Subjective assessments of performance

Overall, the trainees reported no significant improvement in their own performance when guided by the trainers using the virtual pointer (\(p = 0.685\)). However, the trainers’ assessments showed a significant improvement in the trainees’ performance (\(p < 0.001\)) (Fig. 9a).

Fig. 9

The trainers’ and trainees’ subjective performance assessment: a overall rating; b–g rating for each criterion; h comparison of the trainers’ perception of the Time and Motion criterion for each task order

The comparison of scores for each criterion showed that the virtual pointer had a significant effect on the trainers’ perception of all of the assessment criteria, while the trainees’ perceptions remained the same (Fig. 9b–g). In addition, when we focus on the Time and Motion criterion, which is associated with our objective assessment of economy of movement, the trainers’ perception of time and motion was significantly improved in the virtual pointer condition in the fourth run (\(p = 0.017\)), while no significant differences were found in the initial three runs (Fig. 9h). This trend, corresponding to the improvement in economy of movement in the objective assessment, indicates that the benefits of the virtual pointer in reducing unnecessary movements become evident after initial knowledge is gained.

Discussion

The difficulty of executing laparoscopic tasks has motivated many instructional techniques for conveying the trainer’s knowledge to a trainee [3, 21, 22]. However, many of these methods focus on observation and subsequent replication of the actions performed. We developed the virtual pointer system as an alternative to conventional teaching methods. This system aims to improve the trainee’s “professional vision,” that is, visualizing the anatomy and task as an expert would. The impact of the virtual pointer in laparoscopic training was analyzed by objective and subjective assessment of the trainees’ surgical performance. Results showed that the use of the virtual pointer effectively improved the quality of instruction and, once initial training had been received, led to better performance in terms of instrument manipulation through economy of movement, compared to the standard condition with only verbal and gestural instruction.

The concept of using gaze control to improve motor skills has garnered attention in laparoscopic training, and recent efforts have focused on visualizing expert gaze for trainees [10, 17]. However, professional vision is more than just knowing where to look; it serves as a foundation for gaze behaviors, since highlighting a spot becomes a cue only when a trainee is able to make sense of it. Thus, the virtual pointer was designed to enable a trainer to draw or point, through which the trainer establishes what matters and articulates why. With this knowledge, trainees can align their gaze behaviors with the experts’, as manifested by improved economy of movement.

In the initial two training runs, no significant differences in economy of movement were found between the virtual pointer and standard conditions. Although the trainees had some basic knowledge of the laparoscopic cholecystectomy procedure, they still needed to first identify the basic anatomical structures in the physical model, such as the location of the cystic duct and cystic artery, before moving on to dissecting or cutting tissue [5]. Thus, the use of the virtual pointer at this stage mainly served initial knowledge building. Our results indicate that this usage of the virtual pointer performs similarly to the standard condition.

In the later training runs, with a mutual understanding of the basic anatomical structures in the operative field established, more effort was devoted to showing the specific target on the anatomical structures [3]. The virtual pointer allowed the trainers to highlight the target and guide the trainees to see and differentiate it from the distracting background. With the adoption of professional vision, the trainees demonstrated more direct instrument movements. Thus, the significant improvement in economy of movement in the virtual pointer condition suggests that the virtual pointer is more effective in facilitating the conveyance and adoption of professional vision once a certain level of mastery has been attained.

It is interesting that the trainers’ perception of the trainees’ overall performance improved with the use of the virtual pointer, in contrast to the nonsignificant effect of the virtual pointer on the objective assessment of economy of movement in the initial three training runs. This indicates that the learning benefits of the virtual pointer were perceived by the trainers, who used the system to provide guidance, whereas these benefits became objectively evident only later on. In addition, the trainers’ improved perception of overall performance when using the virtual pointer highlights that learning to see the operative field as an expert could potentially enhance both the technical skills and the procedural knowledge of the trainees, and thus improve the quality of the surgical task.

The number of errors was assessed to determine whether the virtual pointer could reduce common errors associated with laparoscopic tasks. No statistically significant difference was found between the two conditions. This could be because the virtual pointer contributes more to highlighting and correcting errors than to preventing them. For instance, if an instrument is moving to the wrong location, the virtual pointer would be used to correct the mistake rather than to prevent the error in the first place. It is possible that, with further mastery of skills over time, the reduction in errors would be greater with the virtual pointer; further testing would be required to test this hypothesis.

It is noteworthy that most trainees recruited in this study were novices who had not performed any laparoscopic surgery before. More research may be required to evaluate the impact of the virtual pointer on surgical performance among more senior residents or between two collaborating surgeons of similar skill level. Additionally, the limited sample size prevented us from assessing further learning effects of the system. With a larger sample size, we hypothesize that the virtual pointer could also reduce the time to task completion, given that we identified a significant improvement in economy of movement with our current sample and observed an overall trend of reduced time to task completion in the fourth run of the virtual pointer condition. Although we found that the virtual pointer facilitated the adoption of professional vision, more runs would allow us to characterize this trend more explicitly.

In addition, we noticed a time lag of less than 1 s between the laparoscopic video and the annotated video. Although this time difference had limited influence on the operative time (no significant difference in task duration was found between the virtual pointer and standard conditions), advanced video streaming technologies should be evaluated before the system is implemented in operating rooms. We did encounter two instances of system malfunction among the total of 28 runs. Although it took less than half a minute to reboot the system, we recognize the importance of technical support for the system to be used in the OR. Given these technical considerations, we suggest using a secondary display for the annotated video.

Conclusions

To facilitate the conveyance of professional vision, a Microsoft Kinect-based video referencing system, the virtual pointer, was designed for intraoperative laparoscopic training. The proposed system enables a trainer to point or draw freehand sketches over a video for the trainees to see. In this study, we evaluated the efficacy of the virtual pointer in improving professional vision adoption by assessing trainees’ performance over the course of four tasks from the same laparoscopic procedure. Experimental results show that the system substantially improved the trainees’ performance, in terms of economy of movement, after initial mastery had been attained, indicating that improved use of the laparoscopic video results in more direct instrument movement. The trainers’ improved perception of the trainees’ performance supports the benefits of the virtual pointer for the adoption of professional vision, which could further enhance the trainees’ technical skills and understanding of the procedure. More participants and runs may make this trend more evident.