Subjective methods of trainee assessment are no longer adequate for surgical training [1]. Reduced working hours [2, 3], increased political demands [4] and financial pressures [5] mean that more objective measures are required. Surgical simulation is an effective tool for training and assessment. Simulators can reduce learning curves outside the operating theatre in a pressure-free environment, without requiring formal supervision [6]. Studies show that skills acquired during simulation training are transferable to the operating room [7]. Simulation in laparoscopic training encompasses a wide range of devices, from simple box trainers [8], cadaveric models [9] and live animals [10] to complex virtual reality (VR) systems (e.g. MIST-VR®, LapSim®, ProMIS™ and LapMentor™) [11–14]. This has led to the development of simulator assessment tools which include motion analysis.

Motion analysis allows assessment of surgical dexterity using parameters extracted from the movement of the hands or laparoscopic instruments [15]. Several different motion analysis systems have been developed (Table 1). The technology can be built into a simulator (e.g. ProMIS™) or supplied as a separate device, enabling flexible use (e.g. Imperial College Surgical Assessment Device, ICSAD) [16]. Objective assessment of laparoscopic skill could be carried out using motion analysis if endpoints for each parameter are quantified according to pre-defined levels of experience. The conversion of motion analysis data into competency-based scores or indices could provide a valuable source of trainee feedback [17], and this conversion is automatic and instant [18]. Feedback could be useful on two levels: first, by providing a quantitative index that defines varying levels of experience, which trainees can work towards; second, by serving as evidence of professional development to be assessed at annual progress reviews. Before motion analysis can be used to assess laparoscopic competence, the technology and the metrics it measures must first be validated [19].
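The metrics discussed throughout this review (time taken, path length, number of movements) can all be derived from a stream of tracked instrument-tip coordinates. The sketch below is purely illustrative and is not the algorithm of any device named in this review; the sampling format and the pause-speed threshold for counting discrete movements are assumptions made for the example.

```python
import numpy as np

def motion_metrics(positions, timestamps, pause_speed=5.0):
    """Illustrative motion-analysis endpoints from tracked tip positions.

    positions: (n, 3) array of instrument-tip coordinates in mm
    timestamps: (n,) array of sample times in seconds
    pause_speed: speed (mm/s) below which the hand is considered still
                 (an arbitrary threshold chosen for this sketch)
    """
    positions = np.asarray(positions, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    steps = np.diff(positions, axis=0)            # displacement per sample
    step_lengths = np.linalg.norm(steps, axis=1)  # mm
    dt = np.diff(timestamps)                      # s
    speeds = step_lengths / dt                    # mm/s

    time_taken = timestamps[-1] - timestamps[0]   # total task time (s)
    path_length = step_lengths.sum()              # total distance travelled (mm)
    # Count a "movement" each time the speed rises above the pause threshold.
    moving = speeds > pause_speed
    n_movements = int(np.count_nonzero(moving[1:] & ~moving[:-1]) + moving[0])
    return {"time": time_taken, "path_length": path_length,
            "movements": n_movements}
```

Real devices differ in how they segment movements and filter tremor, but each validated endpoint reduces to simple functions of the position trace in this way.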

Table 1 Summary of motion analysis systems available for assessment of laparoscopic skill in general surgery

Validation of any new method for training or assessment is a critical step [20]. Validity is the extent to which an instrument measures what it was designed to measure [21, 22]. The process should begin by defining a “construct”, the underlying trait that a new training tool is designed to measure [20]. The more forms of validity (Table 2) that are demonstrated, the stronger the overall argument [20].

Table 2 Overview of validity types (adapted from Moorthy et al. [44])

The aims of this systematic review are to provide an overview of the different motion analysis technologies available for the assessment of laparoscopic skill, and to identify the evidence for their validity.

Methods

Data resources and search criteria

A systematic review was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [23]. The literature search was conducted using the following databases: Embase Classic + Embase (1947 to 2011 week 38), MEDLINE (1947 to present) and PubMed. For each database we searched three domains of exploded MeSH keyword terms. The general terms for each domain were (1) motion analysis, (2) validation and (3) laparoscopy. Where a keyword mapped to further subject headings, those considered relevant were also exploded to maximise coverage of the literature. Studies published in a foreign language were translated into English [24]. The last search date was 29 September 2011. The search was undertaken by two independent reviewers, and articles were retrieved according to the inclusion criteria. Articles identified through cross-referencing were also included. Duplicate articles and those clearly unrelated to the inclusion criteria were excluded. Any disagreements between the reviewers were referred to a third party.

Inclusion and exclusion criteria

All studies investigating motion analysis as a valid tool for the assessment of laparoscopic skill in general surgery were included. Inclusion criteria were: sufficient detail of the motion analysis technology used (including the precise motion metrics measured), a description of the tasks investigated and the type of validity measured. Studies that validated laparoscopic simulators for which motion analysis did not form the primary method of assessment were excluded. Furthermore, studies were excluded if they validated assessment tools in specialities other than general surgery and/or if motion analysis was validated for laparoscopic training rather than assessment. Evidence validating motion analysis for laparoscopic training is limited, and its inclusion would have added further study design heterogeneity. Review articles and conference abstracts were also excluded.

Outcome measures and analysis

Each included study was rated according to a modified form of the Oxford Centre for Evidence-Based Medicine (CEBM) levels of evidence and recommendation [25, 26]. Information was extracted from each study in accordance with the inclusion criteria. Common endpoints between studies were identified and compared when statistically significant results were reported, the principal summary statistic being the difference in means or medians. The data were judged unsuitable for meta-analysis due to study design heterogeneity.

Results

The primary search identified 2,039 records. Three hundred and eighty-eight duplicates were removed, and the remaining 1,651 abstract records were screened for relevance. Following this process, 1,522 records were excluded and 129 full-text articles obtained. Full-text review excluded a further 124 studies, while cross-referencing identified 8 additional studies. At the end of this process, 13 studies were included for review (Fig. 1). These studies investigated five different motion analysis devices: the Advanced Dundee Endoscopic Psychomotor Tester (ADEPT; two studies [27, 28]), the Hiroshima University Endoscopic Surgical Assessment Device (HUESAD; two studies [29, 30]), the Imperial College Surgical Assessment Device (ICSAD; three studies [9, 31, 32]), the ProMIS Augmented Reality Simulator (five studies [13, 33–36]) and the Robotic and Video Motion Analysis Software (ROVIMAS; one study [18]) (Table 3). No randomized controlled trials (RCTs) were identified. Twelve studies were graded as level 2b evidence [9, 13, 18, 28–36], and one study as level 3 [27].

Fig. 1
figure 1

PRISMA [23] flow diagram for selection of studies

Table 3 Studies included in review

Construct validity

Construct validity was examined in 12 (92.3 %) studies [9, 13, 18, 28–36]. There was a large degree of variation between studies, in terms of both group allocation and methodology (Table 4). Comparison between common endpoints (Table 5) was made in order to provide the following levels of recommendation (Table 6):

Table 4 Summary of methods
Table 5 Summary of statistical outcomes for construct validity studies
Table 6 Level of evidence and recommendation for each motion analysis device

ADEPT: One study confirmed construct validity for the error score endpoint [28], when comparing novices and experts (level 3 recommendation).

HUESAD: Two studies established construct validity for the following endpoints: time taken to complete task [29, 30] (level 2 recommendation), deviation from ideal vertical and horizontal planes [29] and approaching time [30] (level 3 recommendation) when comparing novices and experts during a navigation task.

ICSAD: Three studies reported construct validity for the following endpoints: time (stages 1, 2 and 4 [9], tasks 1, 2, 3 and 4 [32]), number of hand movements (stages 1, 2 and 4 [9], tasks 1, 3 and 4 [32]) and path length (stages 1 and 2 [9], tasks 1 and 4 [32]) when comparing novices, intermediates and experts in the following tasks: laparoscopic cholecystectomy (LC) [9] and fundamentals of laparoscopic surgery (FLS) tasks [32] (all level 2 recommendations). Moorthy et al. [31] reported construct validity for time and path length in their laparoscopic suturing task when novices were compared with intermediates, and intermediates with experts. Two of the studies also demonstrated construct validity of overall expert rating scales that were used alongside motion analysis (level 2 recommendation) [31, 32].

ProMIS: Five studies established construct validity for the following endpoints: time [13, 33–36], path length [13, 34–36], smoothness of movement [13, 34–36] (level 2 recommendation) and number of hand movements [33] (level 3 recommendation) when comparing novices versus experts [13], novices versus intermediates versus experts [34, 35] or medical students/preregistration house officers (PRHOs) versus senior house officers versus surgical trainees versus consultants [33, 36] in various laparoscopic bench tasks. The tasks included suturing [13], orientation [33, 34, 36], object positioning [34, 36], knot tying [34] and sharp dissection [34–36].

ROVIMAS: One study confirmed construct validity for the following endpoints: time (overall, stage 1, 2 and 3), number of hand movements (stage 1) and path length (stage 1), when comparing novices and experts in a real-life LC [18] (level 3 recommendation). Number of hand movements and path length were both unable to distinguish between novices and experts in clipping and cutting the cystic duct (stage 2) and artery (stage 3), or during dissection (stage 4) [18].

Other validity types

Face validity was reported in one study for ADEPT [27] and in one study for ICSAD [9] (no data provided). Three studies reported concurrent validity [9, 27, 35]. Macmillan et al. state that, for ADEPT, a high correlation was seen between the number of perfect runs and blinded clinical assessments (Spearman’s rho 0.74) [27]. Concurrent validity was also confirmed in one study each for ICSAD [31] and ProMIS [35], through the observation that motion analysis metrics correlated with expert and global rating scores (ICSAD: path length, Spearman’s rho –0.78, p < 0.001; ProMIS: time and path length, Spearman’s rho 0.88, p < 0.05) (all level 3 recommendations). None of the 13 studies included in this systematic review investigated content or predictive validity.
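For readers unfamiliar with the statistic quoted above, Spearman's rho is simply the Pearson correlation of the rank-transformed data. A minimal sketch follows; the path lengths and ratings are invented for illustration and are not data from the cited studies.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    No tie correction is applied, which is adequate for untied data.
    """
    rank_x = np.argsort(np.argsort(x))  # rank of each observation in x
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical trainees: longer path length tends to accompany lower ratings.
path_length_mm = [5200, 4800, 4100, 3900, 3300, 2900, 2400, 2000]
global_rating = [12, 14, 16, 15, 21, 24, 26, 29]

rho = spearman_rho(path_length_mm, global_rating)  # strongly negative
```

A strongly negative rho in this direction mirrors the ICSAD result quoted above: the more economical the movement, the higher the expert rating.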

Discussion

This study presents the evidence for the use of motion analysis in laparoscopic skills assessment. A previous review by van Hove et al. [15] assessed a range of objective tools available for the assessment of surgical skill, including motion analysis. However, it provided neither information regarding the precise surgical skill assessed nor subsequent levels of recommendation. The authors included studies validating the TrEndo Tracking System, which so far has only been studied in obstetrics and gynaecology trainees [37, 38]. These studies have produced promising results, and we recommend further studies investigating its application within general surgery. Carter et al. [26] published consensus guidelines concerning evidence rating and subsequent levels of recommendation for the evaluation and implementation of simulators and skills training programmes [25, 26]. The authors produced an alternative system because published validation studies with rigorous experimental methodology were lacking [26]. Our review utilises this version of the CEBM system, and explicit levels of recommendation for each tool have been provided for the first time (Table 6).

This review reports construct validity for a range of different motion analysis metrics across three different training environments (VR [13, 33–36], laboratory based [9, 28–32] and the operating theatre [18]). The most commonly validated metrics were time to complete a task, path length and number of hand movements. One ICSAD study attempted to establish construct validity for velocity during a simulated porcine LC model [9]. Velocity is a function of time and path length, both of which were also measured. However, while velocity was found to largely lack construct validity, this was not the case for time and path length. Smith et al. explain this by noting that each movement made by an experienced surgeon is more efficient: movements are not significantly faster, but they are more goal directed, so tasks are completed in less time [9]. Although smoothness of movement is measured only by the ProMIS simulator, it too was consistently shown to discriminate between different levels of experience [13, 33–36].
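The explanation offered by Smith et al. can be seen with simple arithmetic. The numbers below are invented for illustration only: if an expert roughly halves both path length and task time relative to a novice, the two effects cancel in mean velocity, so velocity discriminates poorly even though time and path length each discriminate well.

```python
# Invented illustrative values, not data from the reviewed studies.
novice_path_mm, novice_time_s = 6000.0, 300.0
expert_path_mm, expert_time_s = 3100.0, 150.0

novice_velocity = novice_path_mm / novice_time_s    # 20.0 mm/s
expert_velocity = expert_path_mm / expert_time_s    # ~20.7 mm/s

# Large group differences in time and path length, near-identical velocity:
time_ratio = novice_time_s / expert_time_s          # 2.0
path_ratio = novice_path_mm / expert_path_mm        # ~1.94
velocity_ratio = novice_velocity / expert_velocity  # ~0.97
```

Because mean velocity is the quotient of two metrics that shrink together with expertise, it carries little independent discriminative information.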

Aggarwal et al. [39] stress the importance of breaking down training and assessment into basic, intermediate and advanced stages. ADEPT and HUESAD could be used to assess basic training, as they utilise simple orientation and movement skills in a non-anatomical environment, while ICSAD, ProMIS and ROVIMAS could be used to assess intermediate competence. Animal tissue models and virtual reality simulators exist for a range of general surgical procedures and could be used in conjunction with these motion analysis technologies. This has already been demonstrated in a porcine model for LC [9], and adaptations to the devices may enable their use in endoscopy training. The flexibility of use offered by ICSAD and ROVIMAS means that advanced competency could also be assessed; construct validity during a real-life LC has already been demonstrated for ROVIMAS [18].

This systematic review also showed that very few forms of validity other than construct validity are being examined. The more forms of validity that are demonstrated, the stronger the overall argument for the use of a particular technology [20]. While two studies report face validity [9, 27], no expert rating data were provided in support. It may not be possible to face-validate motion analysis technology, as any attempt to do so would instead assess the realism of the laparoscopic set-up. While it is important to establish construct validity for each endpoint and in every procedure that motion analysis may eventually be used to assess, construct validity alone is of limited use in real-life assessment. Predictive validity represents a more useful modality to investigate, and it is unfortunate that no studies have yet investigated it.

The main limitation of this review is the degree of methodological variation between the included studies, which prevented meta-analysis. The largest degree of variation was seen in group allocation, which was largely based on career grade, although most studies applied further inclusion criteria within each grade based on varying levels of laparoscopic experience. This limitation reflects the fact that the number of procedures performed is not an objective measure of experience. A more objective approach to group allocation could have been based on Objective Structured Assessment of Technical Skills (OSATS) scoring. A further limitation is that the majority of the included studies compared groups across wide ranges of experience (e.g. novice versus intermediate versus expert), where outcomes may be driven largely by the novice versus expert element of the analysis. Motion analysis must demonstrate the sensitivity to discriminate between all individual grades if it is to be used to assess laparoscopic competence.

Motion analysis itself carries some limitations which require discussion. First, many of the devices require calibration to account for individual physiological tremor, which may demand technical support during each procedure. Additionally, there is the issue of cost, which may prevent widespread use across all training centres.

In order for motion analysis to be used as an assessment tool it must be shown to work in a real-life environment. While the feasibility of using motion analysis in a real-life operating theatre has been demonstrated for ICSAD [40] and ROVIMAS [18], the correlation between motion analysis assessment in the laboratory and its subsequent use within the operating theatre needs to be evaluated. Quantitative assessment outcomes must be shown to be equivalent between different training environments, otherwise the application of motion analysis to provide trainee feedback is undermined.

Using motion analysis in isolation may remove the user from the context of the operating theatre. As surgical competence is multimodal, it is important that assessment is based not only on specific outcomes (such as dexterity) but also on global outcomes, such as task accuracy and operative outcome. This is made possible through the dual application of motion analysis alongside global checklists [e.g. Global Operative Assessment of Laparoscopic Skills (GOALS) and Objective Structured Assessment of Technical Skills (OSATS)] [41]. Furthermore, procedure-specific rating scales have also been developed to assess specific technical aspects of different operations, including LC [42] and Nissen fundoplication [43]. Using these systems, assessment can occur “live”, whilst a trainee is undertaking a specific task [44]. Several studies included in this review used global rating scores, which were found to correlate with motion analysis metrics [18, 31, 35].

It has been suggested that surgery is 75 % decision-making and 25 % dexterity [45]. While motion analysis may provide a promising tool to assess dexterity, it cannot provide information on the numerous attributes that contribute to the other three-quarters of a good surgeon’s skill set. Further work is needed to correlate motion analysis against similarly validated measurements of surgical decision-making in different scenarios.

Conclusions

We have demonstrated that there is evidence validating the use of motion analysis to assess laparoscopic skill. The most valid metrics appear to be time, path length and number of hand movements. More work is needed to establish predictive validity for each of these metrics. Future work should concentrate on the conversion of motion analysis data into competency-based scores or indices for trainee feedback.