Computer-generated simulators have been used for many years in the training of pilots and boat crews, and the last 15 years have seen numerous attempts to introduce virtual-reality imaging into clinical medicine. The term “virtual reality” (VR) refers to “a computer generated representation of an environment that allows sensory interaction, thus giving the impression of actually being present” [1]. VR has perhaps been best defined by Riva [2], who described it as a communication interface based on interactive three-dimensional (3D) visualization that allows the user to interact with, and integrate, different types of sensory inputs simulating important aspects of real-world experience. Assessment of the first widely available virtual-reality simulator in surgery, the MIST VR device [3, 4], produced very positive results. However, despite the introduction of VR to surgery in 1991 by Satava [5], acceptance of the VR training approach has been slow, perhaps partly because of the lack of well-controlled clinical trials.

A functional virtual-reality system should create virtual immersion by generating three-dimensional images that appear natural and move in real time without delay or blur. It should ideally also offer an accurate depiction of detail and a high level of interactivity. Represented organs should be anatomically correct and exhibit appropriate natural properties when grasped, clamped, or cut. The task of developing technologies capable of generating anything more than rudimentary haptic sensations is one of the most difficult challenges that researchers have faced in this field.

The technology behind virtual-reality simulation has grown progressively more complex since the original simulators used in aircraft pilot training. The same has been true of simulators directly involved with surgical procedures, where development has tended to target minimally invasive surgery (MIS).

It was previously thought that training on a simulator simply improved a surgeon’s ability to operate on a simulator, but there is now a growing body of evidence that shows that VR simulator training can lead to improved performance in the real environment of the operating room. In 2001 a multidisciplinary team at Yale University conducted the first prospective, randomized, double-blinded trial which compared standard surgical residency training to training on MIST VR for part of a laparoscopic cholecystectomy [6]. The results of this study showed that training on the simulator significantly improved intraoperative performance, with VR-trained residents performing the procedure 30% faster and making six times fewer objectively assessed intraoperative errors. These results have been independently replicated in Denmark [7] and in Sweden [8]. VR training has been shown to improve intraoperative performance in other areas of surgery, such as otorhinolaryngology, where mission rehearsal on an endoscopic sinus simulator improved intraoperative performance during endoscopic trans-sphenoidal pituitary surgery [9]. The application of this technological approach to training of more complex minimally invasive procedures is currently an area of expanding research and has significant implications for trainees and trainers alike.

VR has shown similar potential as an assessment tool for surgical skills; this is almost intuitive as trainees’ performance on the machine is assessed as part of their training. In addition to evaluating MIS surgical skills [10, 11], VR has applications as an objective assessment tool in other areas of procedural medicine such as cardiology, where it has been successfully used to evaluate skills in carotid angiography [12].

One of the more technically challenging areas of laparoscopic surgery that surgeons are currently attempting to master lies within the realm of specialized colorectal surgery. This form of surgery has traditionally been performed in an open fashion; however, with the introduction of laparoscopic colonic surgery the specialty has undergone a rapid transformation in operative approach and technique. Several randomized controlled multicentre trials have demonstrated that minimally invasive colonic resections (laparoscopic-assisted colectomy, LAC) reduce patient morbidity, hospital length of stay, and institutional cost, and confer the benefit of earlier return of gastrointestinal function with oncological outcomes equivalent to those of open surgery [13–15]. These advantages are tempered, however, by increased operating time and the complexity of the technique, which necessitates a long learning curve, estimated at between 15 and 50 cases for left-sided laparoscopic colectomy [16–18].

Traditionally, skills for laparoscopic colectomy (LC) have been acquired on a porcine model or on a cadaver. Simulation training, on the other hand, can be conducted almost anywhere. Recent advances in simulation technology have led to the development of a hybrid VR simulator which couples the accuracy of computer monitoring and virtual reality to a physical tray, creating a simulation platform for training and assessing laparoscopic colectomy (Fig. 1). However, before any simulation can be used as a training and assessment tool its validity must first be demonstrated, particularly for the performance metrics used to give feedback. One of the most important types of validation is demonstration of construct validity, best typified as the ability to distinguish between operators with different levels of experience or skill [19]. The aim of this study is to assess the construct validity of the ProMIS LC simulator.

Fig. 1 ProMIS laparoscopic colectomy simulator

We hypothesize that the computer-generated metrics measuring the efficiency of surgical instrument use, together with the intraoperative errors observed on the simulator anatomy trays, will distinguish between the performance of very experienced LC surgeons and that of experienced MIS surgeons who are novice to LC.

Methods

Study apparatus

The ProMIS™ HALC simulator comprises multiple components, including (a) bodyform, (b) tilt mechanism, (c) laptop computer, (d) ProMISCam (camera), (e) footpedal, (f) power leads (bodyform to laptop), (g) USB cable, (h) FireWire cable, (i) PCMCIA card, (j) light bulbs, and (k) synthetic colorectal anatomy trays including rectal ports and Velcro straps. The hardware and software have been evaluated in accordance with manufacturer specifications and protocols and in compliance with EN 9001, as noted on certificates of compliance issued with each ProMIS HALC simulator. All products are cleared for commercial distribution and were used in accordance with approved product labelling.

The ProMIS augmented-reality simulator (Haptica, Dublin, Ireland) used in this study is based on a Sony Vaio portable notebook computer with a 2.80-GHz Intel Pentium 4 processor running Windows XP Home Edition, with 512 MB of random-access memory (RAM) and a 30-GB hard drive. The laparoscopic interface consists of a torso-shaped metallic mannequin, 74 cm L × 51 cm W × 23 cm D, with a black neoprene cover, connected to the computer with a standard four-pin IEEE 1394 digital cable. The mannequin contains three separate camera tracking systems, arranged to identify any instrument inside the simulator from three different angles. The left and right cameras are positioned to capture instrument motion looking in a caudal direction on the left and right sides of the mannequin, respectively. The Laparo camera is positioned at the mannequin’s pubic symphysis looking cephalad and serves as the main viewing camera displayed on the computer screen while subjects perform tasks on the simulator.

The camera tracking systems capture instrument motion as Cartesian coordinates in the x, y, and z directions at an average rate of 30 frames per second (fps). The distal end of each laparoscopic instrument shaft is covered with two pieces of yellow electrical tape to serve as a labeling reference point for the tracking cameras. Instrument movement is recorded and stored in distinct sections, from the time the instrument tips are detected until they are removed from the mannequin. Precise measures of time, instrument path length, and smoothness of movement (detected by changes in instrument velocity) are recorded for each instrument (right and left hand) during each simulated task.

The portable computer was placed at eye level and the simulator mannequin was positioned at a standard height for performing laparoscopic tasks. In training mode, the simulation guides the trainee through a series of tasks of progressively increasing complexity, enabling the development of the hand–eye motor coordination essential for safe clinical practice. Each task is based on an essential surgical step employed in laparoscopic colectomy (LC). Performance on each task is objectively scored for time and efficiency of movement, for both the left and the right hand. Completing the tasks requires the use of both hands, and every time a trainee logs onto the system a record of their performance is stored in the database, providing an objective record of their progress. The playback facility, with comprehensive measurements for each task, can help the trainer identify specific areas for further practice.
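To make the recording scheme concrete, the sketch below shows one way the per-frame tip positions might be stored and segmented into sections running from tip detection to removal. This is an illustration under our own assumptions (the Sample and Section structures and the segment_sections helper are hypothetical), not Haptica's implementation.

```python
# Illustrative sketch (not the simulator's own software) of how tracked
# instrument-tip positions could be stored and grouped into recording
# "sections" that run from tip detection until withdrawal.
from dataclasses import dataclass, field

FRAME_RATE_HZ = 30  # average tracking rate reported for the ProMIS cameras

@dataclass
class Sample:
    t: float  # timestamp in seconds
    x: float  # Cartesian tip position (cm)
    y: float
    z: float

@dataclass
class Section:
    """One continuous period with the instrument inside the mannequin."""
    instrument: str  # "left" or "right"
    samples: list = field(default_factory=list)

def segment_sections(frames, instrument):
    """Group raw frames into sections; a frame with no detected tip
    (None) closes the current section."""
    sections, current = [], None
    for frame in frames:
        if frame is None:        # tip not visible: instrument withdrawn
            current = None
            continue
        if current is None:      # tip re-detected: open a new section
            current = Section(instrument)
            sections.append(current)
        current.samples.append(frame)
    return sections
```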

Setting

Simulation laboratory in the National Surgical Training Centre at the Royal College of Surgeons, Dublin, Ireland.

Subjects

Fourteen subjects participated in the study. Eleven surgeons experienced in MIS (>14 years of practice) but novice in laparoscopic colorectal procedures constituted the novice group (CN), and three very experienced laparoscopic colorectal surgeons (a minimum of 300 laparoscopic colectomies completed) who had been practising MIS for 14 years served as our experts (CE). The CN group had enrolled as participants in a virtual-reality-based laparoscopic colorectal course held at the Adelaide and Meath Hospital and the National Surgical Training Centre, RCSI, in Dublin, Ireland.

Procedure

Training and testing were completed in a single day. During the morning, novice subjects received didactic instruction in the practice of LC from the experts, consisting of the stepwise application of surgical operative techniques to laparoscopic colorectal surgery [20]. During the afternoon, subjects received instruction and a demonstration of the surgical technique they were to use for the simulation component, which consisted of a physical model of the abdominal cavity with augmented virtual-reality (VR) coaching cues. Subjects were taught to perform the LC on the ProMIS VR simulator in a standardized stepwise fashion: first mobilize the sigmoid colon, identify the ureter, locate and then transect the inferior mesenteric artery (IMA), mobilize the descending colon, take down the splenic flexure, and perform the anastomosis and irrigation/suction. All subjects then performed a laparoscopic colectomy. Both experts and novices performed the same simulated case, but the CN group was also closely supervised by an experienced operator.

Performance assessment

The independent variable was whether the procedure was performed by an expert or a trainee; the dependent variables, all measured by the simulator, were time to perform the procedure, instrument path length, and smoothness of the instruments’ trajectories. These metrics have been used previously and have been found to differentiate between surgeons of varying proficiency. Time was measured (in seconds) from the moment the instrument tips were detected by the simulator until they were removed after successful completion of the procedure. Path length was measured (in centimeters) by summing the total distance travelled by both instrument tips. Smoothness of instrument movement was measured by detecting changes in instrument velocity over time (no units).
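The sketch below illustrates, under our own assumptions, how these three metrics could be computed from tracked tip samples. The simulator's exact smoothness formula is not published, so smoothness is approximated here as the accumulated frame-to-frame change in tip speed (lower values indicating smoother movement).

```python
import math

# Each sample is a (t, x, y, z) tuple: time in seconds, tip position in cm.
# These helper functions are illustrative, not the simulator's own code.

def procedure_time(samples):
    """Seconds from first tip detection to removal."""
    return samples[-1][0] - samples[0][0]

def path_length(samples):
    """Distance travelled by one instrument tip, in cm."""
    return sum(math.dist(a[1:], b[1:]) for a, b in zip(samples, samples[1:]))

def smoothness(samples):
    """Unitless score: accumulated change in tip speed between frames."""
    speeds = [
        math.dist(a[1:], b[1:]) / (b[0] - a[0])
        for a, b in zip(samples, samples[1:])
        if b[0] > a[0]
    ]
    return sum(abs(v2 - v1) for v1, v2 in zip(speeds, speeds[1:]))

# Total path length sums the left- and right-hand instruments:
# total = path_length(left_samples) + path_length(right_samples)
```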

Also measured were certain predefined intraoperative events and errors (Table 1), the occurrence or nonoccurrence of which was assessed by examining the anatomy trays after each procedure. The list of errors was drawn up, and each error explicitly defined, prior to the study. Assessment of the trays was carried out by two raters who were blinded to the status of the subject. Test–retest reliability was found to be 92%, and inter-rater reliability was set at a minimum of 80%.
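As a simple illustration of how such agreement figures can be computed (the study does not state its exact formula, and the error names and values below are hypothetical), percent agreement over a binary error checklist reduces to:

```python
# Hypothetical sketch: percent agreement between two raters scoring the
# same predefined binary error checklist on an anatomy tray.
def percent_agreement(rater_a, rater_b):
    """rater_a, rater_b: dicts mapping error name -> occurred (True/False)."""
    shared = rater_a.keys() & rater_b.keys()
    agree = sum(rater_a[e] == rater_b[e] for e in shared)
    return 100.0 * agree / len(shared)

a = {"organ injury": True, "inadequate rectal transection": False}
b = {"organ injury": True, "inadequate rectal transection": True}
print(percent_agreement(a, b))  # 50.0
```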

Table 1 Intraoperative errors and their operational definitions, scored from performance on the anatomy tray

Results

All the novice and experienced surgeons took part in the study and completed the simulated laparoscopic colectomy procedure. Demographic details are shown in Table 2.

Table 2 Demographic details of the two groups

Data from the experts and the novices were summarized; the means and standard deviations are shown in Table 3. The novices took almost twice as long (1.81 times) to complete the procedure as the experts and showed a similar pattern for instrument path length and smoothness of instrument usage.

Table 3 The means and standard deviations of the computer-scored performance characteristics of the two groups who performed the simulated laparoscopic colectomy

These differences were compared for significance with Mann–Whitney U tests and, despite the small number of subjects in each group (11 novices versus 3 experts), all three differences were statistically significant. Power calculations confirmed adequate group size. With regard to time for individual procedural steps, performing the anastomosis took both groups the longest (experts 13.1 ± 2 min, novices 21.15 ± 6.1 min, mean ± SD) and mobilizing the sigmoid took the shortest time for both groups (experts 0.6 ± 0.35 min, novices 2.75 ± 5.7 min). We also assessed the strength of the association between scores on these variables using the Pearson product-moment correlation coefficient. All three measures were strongly correlated (time and smoothness r = 0.91, p < 0.0001; time and total path length r = 0.79, p < 0.001; smoothness and path length r = 0.88, p < 0.0001), indicating that they capture related performance characteristics.
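For readers who wish to reproduce this style of analysis, the sketch below shows the equivalent tests in Python with SciPy. The arrays are hypothetical placeholders, not the study's raw data.

```python
# Illustrative analysis with SciPy; all values below are made up.
from scipy import stats

# Hypothetical procedure times (minutes) for 11 novices and 3 experts:
novice_time = [48.0, 52.1, 61.3, 55.7, 49.9, 58.2, 63.0, 50.4, 57.5, 60.1, 54.8]
expert_time = [28.3, 31.0, 29.5]

# Non-parametric comparison of the two independent groups:
u, p = stats.mannwhitneyu(novice_time, expert_time, alternative="two-sided")
print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")

# Pearson correlation between two metrics across all 14 subjects:
time_scores = novice_time + expert_time
smoothness_scores = [9.1, 10.2, 12.5, 11.0, 9.8, 11.6, 12.9, 9.9, 11.3, 12.0,
                     10.7, 5.2, 6.0, 5.6]  # hypothetical smoothness values
r, p = stats.pearsonr(time_scores, smoothness_scores)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```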

On examination of the anatomy trays, more intraoperative errors were committed on average by the novices than by the experts (4.7 versus 2.67, p = 0.03; Tables 4 and 5, Fig. 2). The average number of organ injuries (error 13) was also higher in the novice group (0.91 versus 0.66, Fig. 2). On closer analysis of the 13 error categories, the experts performed worse in 4: inadequate division of the inferior mesenteric vein, inadequate mobilization of the left colon, inadequate rectal transection, and organ injury; two of the three experts committed an organ injury (66%), although more injuries were made by the novices (ten injuries by seven novices). As the numbers involved are very low, analysis of the individual errors is probably not meaningful. None of the subjects divided the inferior mesenteric artery inadequately, and all other errors were committed more frequently by the novices.

Table 4 Mean error and organ injury scores for the two groups
Table 5 Group comparisons for each anatomy tray error
Fig. 2 Average number of intraoperative errors and organ injuries

Organ injuries that were committed included bowel perforations and injuries to the iliac vessels (Fig. 3). None of the subjects damaged the ureter.

Fig. 3 Images of anatomy tray errors

Discussion

The current enthusiasm for validation of training and assessment devices and strategies is a relatively new enterprise for the surgical community. Scientific validation, however, is a common activity in psychology and can be traced back more than a century. The number and variety of tests (good and bad) grew rapidly during the 20th century, largely in the USA. In 1974 the American Psychological Association (APA), the American Educational Research Association, and the National Council on Measurement in Education developed standards for judging and assessing tests, referred to as the Standards [21]. This publication does not lay down rules about test quality, but rather gives guidelines on a number of issues relating to administration, interpretation, ethical issues, appropriate norms, etc. It also provides rigorous, well-proven guidance on validation, reliability, and error measurement of tests. Key concepts relating to the APA standards include demonstration of face validity, content validity, discriminative validity, and predictive validity. The main aim of this study was to determine whether the metrics associated with this laparoscopic colorectal procedural simulator distinguished between the performances of experienced minimally invasive colorectal surgeons and very experienced minimally invasive surgeons who were novice in MIS colorectal procedures. A secondary aim was to advance surgeons’ acceptance of virtual-reality training for a major abdominal surgical procedure, i.e., minimally invasive left-sided hemicolectomy.

We found that both the machine-generated performance metrics and the anatomy tray scores distinguished between the two groups. The expert laparoscopic colorectal surgeons performed the procedure faster, used their instruments more efficiently, and made fewer intraoperative errors on the anatomy tray. This supports previous reports that experienced surgeons perform such procedures more efficiently [10, 22]; indeed, efficient performance is emerging as one of the most important indicators of skill and experience. All of the delegates on this course were consultant/attending surgeons and thus very experienced laparoscopic surgeons. It is very likely that, had we used a less experienced subject pool, the magnitude of the differences between the groups would have been even greater.

The experts did not perform perfect procedures and made a total of eight errors between them. The small number of subjects in the expert group means that analysis of individual error categories is not useful. However, we believe these errors partly reflect the learning curve on the simulator itself [10, 23]; as the subjects performed only one simulated procedure, there was no chance to practise or train out this curve. In another study currently underway, expert surgeons enacted errors during their first trial on a simulator but fewer on subsequent trials. Some of the errors committed are relatively artificial and would not constitute errors in real life, such as division of the inferior mesenteric vein, which is not always carried out laparoscopically, and complete splenic flexure mobilization, which is not always necessary. It is likely that fewer errors would have been made had subsequent trials on the simulator been recorded. Despite this, the experts still enacted on average significantly fewer tray errors overall than the novices, demonstrating construct validity.

At present the two most common training platforms for this procedure are the porcine model and the cadaver, both of which require very specialized training environments, are very expensive, and typically allow each trainee to perform only part of the procedure, once. The simulation used in this study is a hybrid of a physical tray and augmented virtual-reality cues designed to enhance the learning experience. Its advantage is that each subject has the opportunity to complete the entire procedure with exactly the same instruments they would use during the real procedure, with instant objective feedback on their performance. The quality of this feedback is also far superior to what can be achieved on a porcine model or a cadaver. In those models, for example, it may be very difficult to convince a trainee that their anastomosis was under tension or was not centered, either of which can have major implications for the success of the procedure. With the simulation used here this is quite simple: one removes the anatomy tray from the simulator, shows the trainee their anastomosis, explains the error, and then shows them another tray with the surgical technique completed appropriately. This example demonstrates one of the very powerful aspects of training on a simulator, termed vicarious learning [24]. Bandura has shown that considerable learning takes place vicariously, through observational learning or modeling: we watch the behavior of others, take note of the consequences of that behavior, and use this information to inform future performance or behavior.

In the course that we report on here we had an experienced proctor on each simulator, in order to optimize the learning experience of the trainee. A proctor is probably required for the first few training trials but may not be required thereafter unless the trainee encounters a learning obstacle, so training need not become too arduous a duty for more experienced colleagues. An advantage of simulation is that the trainee’s performance can still be tracked at the convenience of the mentor: the trainee can complete their training trial(s), and the mentor can later check their computer scores and anatomy tray error performance. Despite this proctoring, some of the subjects left the specimen in place. Had this happened in the operating room it would have constituted a major violation of operative protocol. Possible reasons for this serious omission are that the trainee subjects were not taking the training seriously or were simply behaving in a slipshod manner. It is imperative in simulator-based training that the person being trained should be encouraged to behave towards the simulator in the same way that they would in the operating room. This may be even more important when training or retraining experienced, clinically active surgeons in new procedures. We believe that this approach will lead to optimal skills transfer from the simulation to the operating room.

The design and implementation of an integrated approach to training current and prospective surgical residents in minimally invasive surgical techniques requires careful consideration. We believe that the optimal approach to simulation training is for trainees first to complete an online didactic component explaining the relevant anatomy, physiology and pathology, operation setup, instruments required, and steps of the procedure. They should demonstrate proficiency on this learning component before they attend the simulation skills course. This approach has two functions [25]: firstly, it demonstrates to the mentor that the trainee is motivated to learn the procedure; secondly, it means that all of the trainees who attend the course have at least a good working knowledge of the procedure. This facilitates an increased emphasis on training the skills associated with the procedure and imparting technical wisdom, rather than remediating knowledge.

This is the first report of a hybrid simulator being used for training a high-risk colorectal procedure that requires advanced laparoscopic skills. The metrics clearly distinguished between the performance of experts and very experienced novices. Three of the major advantages of this type of training platform are: firstly, that it can be set up and used almost anywhere; secondly, that it provides the trainer with objective information on the trainee’s performance and progress; and thirdly, that it provides the trainee with physical evidence of the errors they have enacted on the anatomy tray, which is a very powerful learning strategy.

Demonstration of construct validity is an important step in the validation of a simulator: the metrics should be able to distinguish between the performance of experts and novices, and if they do not, this points to a problem with either the simulation or the metrics. In this study we have been able to demonstrate construct validity and can therefore proceed to establish a technical-skills proficiency level. Establishing a level of proficiency means that we can set an objective benchmark for trainees to reach before they operate on a patient, which has been shown to be a highly effective approach to training [25]. Two of the studies that have used this approach have shown that it elicits intraoperative performance between three and six times better than that of residents trained in the traditional way [6, 8]. Furthermore, proficiency was objectively established on criteria that would be meaningful to trainees, i.e., the performance of their mentors. This approach to training is attractive for a number of reasons. Firstly, it is objective, fair, and transparent, and it is related to the real-world performance of experienced surgeons who operate on and take care of patients daily and safely. Secondly, it is equally adept at handling the truly gifted surgeon who demonstrates proficiency quickly and the surgeon who acquires skills at a slower pace: both will demonstrate proficiency before progressing, and there is no evidence to date that the surgeon who takes longer to demonstrate proficiency is less safe than their more gifted colleague.

Conclusion

The results from this study, carried out during the first virtual-reality-based training course, clearly demonstrate construct validity for the ProMIS simulator. This is an important first step in the validation of the simulator as a training and assessment tool for surgeons. When applied as part of a structured training curriculum, VR simulation has been shown to improve intraoperative performance, and the results from this study should advance the role of simulation even further. The pivotal role played by VR simulation in surgery looks set to continue.