
Take-Home Messages
  • Standardised benchmarks are required to define arthroscopic competency.

  • Measuring surgical performance comes with challenges, but new developments such as affordable tracking systems and video analysis software can facilitate its routine implementation.

  • Objective monitoring of resident learning curves is feasible using global rating scales.

  • The ASSET and BAKSSS global rating scales have been validated most extensively and are suggested for use in clinical practice; ASSET also offers potential for summative assessment of arthroscopic skills.

1 Introduction

Although previous chapters indicated the potential and benefits of training arthroscopic skills in simulated environments, training needs to be continued in the operating room to achieve the necessary proficiency. Based on the theory of learning strategies in Chap. 4, it is posited that if residents indeed acquire the basic skills before they enter the operating room, the focus in the operating room can be on more complex tasks. This requires the formulation of guidelines that determine the level that qualifies as proficient. For actual cases in the operating room, this is a difficult task, as the complexity of the procedure plays an important role, and proficiency is not necessarily defined as the summation of several part-task skills but rather requires a holistic approach.

Generally, the complexity of an arthroscopy is divided into two levels: basic (removal) and advanced (reconstruction), e.g. meniscectomy vs. anterior cruciate ligament (ACL) reconstruction (Morris et al. 1993; O’Neill et al. 2002). For elbow arthroscopy, five levels of complexity have been defined (Savoie 2007). To cope with this complexity and support the holistic judgment, faculty members from recognised institutions who have themselves performed a substantial number of procedures (>250) qualify to judge proficiency (Morris et al. 1993; O’Neill et al. 2002), a method that is applied in many residency curricula. Despite arthroscopy being performed frequently, consensus has yet to be reached on the exact definition of arthroscopic competence and the number of procedures required to achieve it (Hodgins and Veillette 2013; O’Neill et al. 2002).

As little to no evidence is available on the transfer validity of arthroscopic simulator training, and many residency curricula have yet to implement simulator training, the first section focuses on measuring surgical performance in the operating theatre. Measuring surgical performance is not only useful in training but also has direct applications in the quantification and monitoring of operative quality, patient safety and workflow optimisation. Tools and methods from these areas are presented that could be applied to verify proficiency in basic arthroscopic skills. Additionally, work is presented on setting reference baselines for comparing surgical performance.

As mentioned, training in the OR follows the apprenticeship model, in which the resident initially watches the teaching surgeon performing an operation and gradually takes over (Pedowitz et al. 2002). As modern medicine offers residents less time to develop their arthroscopic skills, it is worthwhile to optimise the learning effect per operation. General educational theories indicate that feedback on one’s performance and stimulation of active learning contribute significantly to a more effective learning process (Prince 2004). For surgery, it has been demonstrated that direct feedback on performance improves the resident’s individual skills (Harewood et al. 2008; O’Connor et al. 2008). We present tools that are suitable for monitoring this form of teaching and that respect the holistic judgment model needed to assess the more complex tasks.

2 Measuring Surgical Performance and Baseline References

Measuring surgical performance is not an easy task: patient care is the number one priority, patient privacy and the sterile operating zone must be respected, and the operating theatre cannot be transformed into an experimental set-up. Moreover, interpretation of the data is complex. That is why attention is also paid to the registration of baseline reference data from procedures currently performed in the operating theatre. Two categories of tools are defined: sensors that can measure psychomotor skills in the same way as in simulated environments, and video and audio registrations that can capture overall surgical performance. Each is elucidated with examples.

2.1 Sensors

The first parameter to be discussed is, not surprisingly, the operation time. It is easy to measure and often used to track operative planning and workflow. Its value derives from the well-established fact that experts execute surgical actions more efficiently than novices (Bridges and Diamond 1999). Farnworth and co-workers demonstrated that residents are significantly slower in performing ACL reconstructions than orthopaedic surgeons, which can also have financial consequences (Farnworth et al. 2001).

Psychomotor skills can also be monitored in the operating theatre by motion-tracking systems. Such systems use (infrared) cameras that track optical or reflective markers attached to the hands of the surgeon or to the instruments, or electromagnetic systems with active markers. In surgical practice, such tracking systems are commonly used in computer-aided surgery for accurate positioning of orthopaedic implants (Fig. 13.1) (Matziolis et al. 2007; Moon et al. 2012; Rosenberger et al. 2008). Tracking can also be performed with normal video cameras and digital image-processing tools that recognise markers or other features in the image. Examples are presented by Doignon and co-workers (Blum et al. 2010; Doignon et al. 2005), who detected surgical instruments in the endoscopic video based on metal-coloured features, and by Bouarfa and co-workers, who labelled various instruments with coloured markers at the tip to improve robustness (Fig. 13.2) (Bouarfa et al. 2012). Tracking of instrument motions provides insight into surgical performance and the flow of the procedure (Aggarwal et al. 2007; Dosis et al. 2005). It does, however, require careful data interpretation.
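
As an illustration of marker-based detection in the video image, the following minimal Python sketch uses OpenCV to threshold one HSV colour range per coloured tip label and logs the marker centroid per frame. The instrument names and colour ranges are hypothetical placeholders that would require calibration; this is a simplified illustration of the principle, not the actual pipeline of the cited studies.

```python
# Minimal colour-marker tracking sketch (illustrative, not the cited pipeline):
# threshold one HSV colour range per instrument label and log the tip centroid.
import cv2
import numpy as np

# Hypothetical HSV ranges for two coloured tip labels; real values must be calibrated.
MARKERS = {
    "punch":  (np.array([40, 80, 80]),  np.array([80, 255, 255])),   # green label
    "shaver": (np.array([100, 80, 80]), np.array([130, 255, 255])),  # blue label
}

def track(video_path):
    cap = cv2.VideoCapture(video_path)
    trajectories = {name: [] for name in MARKERS}
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        for name, (lo, hi) in MARKERS.items():
            mask = cv2.inRange(hsv, lo, hi)
            m = cv2.moments(mask)
            if m["m00"] > 0:  # marker visible in this frame
                cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
                trajectories[name].append((cx, cy))
            else:
                trajectories[name].append(None)  # occluded or out of view
    cap.release()
    return trajectories
```

The resulting per-frame positions can then be converted into motion metrics such as path length or smoothness, which is where the careful interpretation mentioned above becomes essential.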

Fig. 13.1

Example of an infrared camera tracking system used in combination with passive reflective markers. (a) Infrared camera. (b) Two markers attached to the shaft of (c) the arthroscopic punch. (d) Anatomic bench model of the knee joint (© GJM Tuijthof, 2014. Reprinted with permission)

Fig. 13.2

Example of real-time in vivo instrument tracking using coloured labels attached to instruments. In this example three instruments are tracked simultaneously (Bouarfa et al. (2012), copyright © 2012, Informa Healthcare. Reproduced with permission of Informa Healthcare)

Other parameters that have been measured in the operating room are the forces and torques exerted during knee arthroscopy (Chami et al. 2006). Chami and co-workers showed that force parameters can indeed discriminate between novices and experts (Chami et al. 2008).

2.2 Video and Audio

Video recordings of a procedure could offer a tool that allows holistic feedback with easily interpretable illustrations. However, the few studies that we could find on using video feedback to improve surgical training did not find significant differences (Backstein et al. 2004; Backstein et al. 2005). Drawbacks of video recordings are that replaying an entire operation is time-consuming and that, without post-processing, the recordings do not provide objective measures. A similar line of reasoning applies to audio recordings. Still, with post-processing techniques, video and audio recordings reveal useful cues that could be used to monitor surgical performance. We present some examples related to arthroscopic training.

Time-action analysis is a quantitative method to determine the number and duration of actions. It represents the relative timing of different events and the duration of the individual events. In the medical field, time-action analysis has proven its value in objectifying and quantifying surgical actions (den Boer et al. 2002; Minekus et al. 2003; Sjoerdsma et al. 2000). For training, patient safety and workflow monitoring, time-action analysis can be used to detect and analyse deviations from the normal flow of the operation. This requires documentation of reference data sets through analysis of procedures performed by expert orthopaedic surgeons. We have performed such analyses for a set of predominantly meniscectomies to investigate the effectiveness of arthroscopic pump systems (Tuijthof et al. 2007, 2008). To do so, the operations were divided into four phases: (1) creation of portals, (2) joint inspection with or without a probe, (3) cutting and (4) shaving. The share of each phase in the operation time was quantified with the time-action analysis. Comparing the mean duration of each phase with that of a trainee can indicate whether the trainee performs according to the normal workflow or needs substantially more time for a certain phase (see the sketch below). By also analysing the number of instrument exchanges, repeated actions or the percentage of disturbed arthroscopic view, trainees can receive detailed objective feedback on the skills they need to improve. Other parameters that have been analysed are the prevalence of instrument loss, triangulation time and the prevalence of lookdowns, which showed a high correlation with global rating scale scores and motion analysis (Alvand et al. 2012).
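
As an illustration, the following minimal Python sketch computes each phase’s share of the operation time from tagged (phase, start, end) events and compares a trainee against expert reference shares. The reference shares and event times are illustrative placeholders, not data from our studies.

```python
# Sketch: compute each phase's share of total operation time from tagged
# (phase, start_s, end_s) events and compare a trainee against expert means.
from collections import defaultdict

PHASES = ["portal creation", "inspection", "cutting", "shaving"]

def phase_shares(events):
    """events: list of (phase, start_s, end_s) tuples from video tagging."""
    durations = defaultdict(float)
    for phase, start, end in events:
        durations[phase] += end - start
    total = sum(durations.values())
    return {p: durations[p] / total for p in PHASES if total > 0}

# Hypothetical expert reference shares (illustrative numbers, not study data):
expert_ref = {"portal creation": 0.10, "inspection": 0.25,
              "cutting": 0.40, "shaving": 0.25}

trainee_events = [("portal creation", 0, 300), ("inspection", 300, 900),
                  ("cutting", 900, 1500), ("shaving", 1500, 1800)]
for phase, share in phase_shares(trainee_events).items():
    print(f"{phase}: trainee {share:.0%} vs expert {expert_ref[phase]:.0%}")
```

A phase in which the trainee’s share deviates markedly from the expert reference then flags a concrete skill to work on.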

As these early time-action analyses were performed manually by replaying the video frame by frame (den Boer et al. 2002; Minekus et al. 2003; Sjoerdsma et al. 2000; Tuijthof et al. 2007, 2008), implementing this method for training purposes is unrealistic, as it is too time-consuming. However, efforts have been made to perform such analyses automatically using image-processing techniques (Doignon et al. 2005; Tuijthof et al. 2011) or specific tracking systems (Bouarfa et al. 2012). When combined with statistical models, such as Markov models, one can even predict the flow of the operation peroperatively (Bouarfa et al. 2011; Bouarfa and Dankelman 2012; Padoy et al. 2012). Such methods could lead to tools that provide real-time objective feedback to a trainee during the operation.
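
The idea of phase prediction can be illustrated with a first-order Markov chain over surgical phases, fitted on labelled phase sequences, as in the minimal Python sketch below. The phase names and sequences are hypothetical, and the cited studies used considerably richer statistical models, so this should be read as an illustration of the principle only.

```python
# Sketch of a first-order Markov model over surgical phases: estimate transition
# probabilities from labelled procedures and predict the most likely next phase.
from collections import Counter, defaultdict

def fit_transitions(phase_sequences):
    counts = defaultdict(Counter)
    for seq in phase_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    # Normalise counts into transition probabilities per source phase.
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

def predict_next(transitions, current_phase):
    nxt = transitions.get(current_phase)
    return max(nxt, key=nxt.get) if nxt else None

# Hypothetical training data: phase sequences from previously tagged procedures.
sequences = [
    ["portals", "inspection", "cutting", "shaving", "inspection"],
    ["portals", "inspection", "cutting", "inspection", "shaving"],
]
T = fit_transitions(sequences)
print(predict_next(T, "cutting"))  # most frequent successor of 'cutting'
```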

Another feasible approach to implementing time-action analysis techniques for training purposes is derived from the training of high-performance athletes. In this field, it is becoming daily practice to record training activities on video. To cope with the huge amount of data, sports video analysis software has been developed, which makes it easier to tag events, assign events to categories, make annotations and perform quantitative analyses. Examples of commercial video analysis software packages are Utilius (CCC software, Leipzig, Germany, www.ccc-software.de), MotionView™ (AllSportSystems, Willow Springs, USA, www.allsportsystems.com) and SportsCode Gamebreaker Plus (Sportstec, Sydney, Australia, www.sportstec.com). We present an example of applying such software to the analysis of verbal feedback during arthroscopic training in our university hospital. During supervised training of arthroscopy, verbal communication is mainly used to guide the resident through the procedure. This suggests that the training process can be monitored through verbal communication. To investigate whether current training in the operating room involves sufficient feedback and/or questioning to stimulate active learning, verbal communication was objectified and quantified.

Within two 3-month periods, 18 arthroscopic knee procedures were recorded with a dedicated capturing system consisting of two video cameras – one recording the arthroscopic view and one the hands of the resident (digital CCD camera, 21CW, Sony CCD, Tokyo, Japan) – and a tie-clip microphone (ECM-3003, Monacor, Bremen, Germany) mounted on the supervising surgeon. The video images were combined by a colour quad processor (GS-C4CQR, Golden State Instrument Co., Tustin, USA) and digitised simultaneously with the sound by an A/D converter (ADVC 110, GV Thomson, Paris, France). The operations were performed by four residents, each supervised by one of two participating surgeons. Communication events were tagged with Utilius VS 4.3.2 (CCC-software, Leipzig, Germany) and assigned to categories for the type and content of communication (Fig. 13.3). Four communication types were adopted from Blom et al. (2007): explaining, questioning, commanding and miscellaneous (Table 13.1). As this study specifically focuses on training, one category was added, feedback, which reflects the judgment of the teaching surgeon on the actions of the resident. Six categories for communication content were defined: operation method (with an accent on steps to be taken in the near future, e.g. start creating the second portal), anatomy and pathology, instrument handling and tissue interaction (e.g. open punch, reposition instrument, stress joint, increase portal size, push meniscus backwards), visualisation (e.g. move scope, irrigation, focus), miscellaneous (general or private) and indefinable (Table 13.1). The frequency of events as a percentage of total events in each of the categories was determined (Table 13.1). A multivariable linear regression analysis was performed to determine whether the teaching surgeon and the experience of the residents significantly influenced the frequency of communication events per minute (p < 0.05).
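
To illustrate the quantification step, the following Python sketch builds a type × content crosstab as a percentage of total events and computes the events per minute, mirroring the layout of Table 13.1. The tagged events shown are hypothetical examples; in practice the tags would be exported from the video analysis software.

```python
# Sketch: build a type x content crosstab (as % of total events) and events per
# minute from tagged communication events, mirroring the layout of Table 13.1.
import pandas as pd

# Hypothetical tagged events: (minute, type, content); real tags come from the
# video analysis software export.
events = pd.DataFrame([
    (0.5, "explaining", "anatomy and pathology"),
    (1.2, "commanding", "instrument handling and tissue interaction"),
    (1.8, "commanding", "visualisation"),
    (2.4, "questioning", "operation method"),
], columns=["minute", "type", "content"])

crosstab = pd.crosstab(events["content"], events["type"], normalize="all") * 100
events_per_minute = len(events) / events["minute"].max()  # total duration proxy
print(crosstab.round(1))
print(f"events per minute: {events_per_minute:.1f}")
```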

Fig. 13.3
figure 3

Screenshot of software used to analyse verbal communication (© GJM Tuijthof, 2014. Reprinted with permission)

Table 13.1 Crosstabs for type (upper row) and content (left column) categories as percentage of total events

On average, 6.0 (SD 1.8) communication events took place every minute. The communication types explaining and commanding showed a considerably higher frequency than questioning and feedback (Table 13.1). The explaining events primarily concerned anatomy and pathology, followed by instrument handling and tissue interaction. The commanding events primarily concerned instrument handling and tissue interaction and visualisation, which in general were the most frequent communication content categories (Table 13.1). A significant difference in mean events per minute was found between the two teaching surgeons (p < 0.05). No significant correlation was found between the frequency of events and the experience of the residents.

The results highlight distinctive communication patterns. The relatively high frequency of the types explaining and commanding as opposed to questioning and feedback is noticeable, as the latter two generally stimulate active learning. Additionally, a considerable share of the explaining events concerned anatomy and pathology and instrument handling and tissue interaction. These items are particularly suitable for training outside the operating room; if trained beforehand, more room is left to focus on other learning goals. As a clear difference was present in the frequency of events per minute between the surgeons, and no correlation was found with the experience of the residents, we cannot confirm that this method is suitable as an objective evaluation tool for new training methods. Additional research with a larger group of residents is recommended to minimise the effect of outliers.

3 Monitoring Complex Tasks and Assessing Learning Curves

To respect the holistic assessment model, expert surgeons are needed to assess the more complex tasks. This type of assessment is sensitive to the subjective opinion of the assessor, which might compromise fair judgment (Mabrey et al. 2002). To overcome this issue, education theories recommend the formulation of rubrics, which describe clear evaluation criteria and various levels of competence. In surgical training, such rubrics are called global rating scales (GRS). The GRS suggested for arthroscopic skills will be elucidated, as well as their validation and examples of their use to assess learning curves.

Within this section, we loosely follow Hodgins and Veillette, who reviewed assessment tools for arthroscopic competency (Hodgins and Veillette 2013). Recently, various GRS have been developed specifically for structured, objective feedback during the training of arthroscopies (Table 13.2):

Table 13.2 All GRS that are suggested for rating of arthroscopic skills based on Hodgins and Veillette (2013)
  1. Orthopaedic Competence Assessment Project (OCAP) (Howells et al. 2008)

  2. Basic Arthroscopic Knee Skill Scoring System (BAKSSS) (Insel et al. 2009)

  3. Arthroscopic Skills Assessment (ASA) (Elliott et al. 2012)

  4. Objective Assessment of Arthroscopic Skills (OAAS) (Slade Shantz et al. 2013)

  5. Arthroscopic Surgery Skill Evaluation Tool (ASSET) (Koehler et al. 2013)

The actual forms are available in Appendices 13.A, 13.B, 13.C, 13.D and 13.E. It is noticeable that all arthroscopic GRS except ASA have a similar structure, with 7–10 items that are scored on a 5-point Likert scale. At least three of the five points are explicitly described, which should promote uniform assessment. Many of the items are also similar, such as instrument handling, flow of operation, efficiency and autonomy. OCAP and BAKSSS are additionally recommended to be used with task-specific checklists, whereas ASA solely focuses on knee arthroscopy with such a checklist. Analysing these GRS, one can conclude that a certain level of consensus exists on the arthroscopic skills that a resident should be able to demonstrate in the operating theatre and the level required to qualify as competent.

OCAP has not been specifically tested, but its items are derived from the well-established OSATS GRS, which has been validated extensively (Martin et al. 1997; Reznick et al. 1997). The four other GRS have been validated for construct, content and concurrent validity, as well as internal consistency, interrater and test-retest reliability (Table 13.2). The results indicate that they meet the requirements and show a high correlation with year of residency. Note that none of the validation study designs are the same; thus, one-to-one comparison is not possible. The ASSET has also been evaluated for summative assessment in a pass-fail examination, which was supported by a high rater agreement (ICC = 0.83) (Koehler and Nicandri 2013).

For OCAP and BAKSSS, we determined whether they reflect the learning curve during arthroscopic training in the operating room and what their discriminative level is. Seventy-five arthroscopic procedures performed by 15 residents in the fourth, fifth and sixth year of their residency were assessed by their supervising surgeons.

Pearson correlation coefficients were calculated between year of residency and the normalised sum scores of both GRS questionnaires. The normalised sum score consisted of all points scored on the individual items, normalised to a 100-point scale. The Pearson correlation was significant for BAKSSS (R = 0.73) and for OCAP (R = 0.70). A linear regression analysis demonstrated a significant increase of the GRS sum score per year of residency of 9.2 points (95 % CI 6.2–12.1) for BAKSSS and 9.5 points (95 % CI 6.5–12.5) for OCAP. These results lead to our conclusion that both GRS are suitable for monitoring overall arthroscopic skills progression in the operating theatre.
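
As an illustration of this analysis, the following Python sketch computes the normalised 100-point sum score from Likert item scores and then the Pearson correlation and linear regression against year of residency. The item scores and years shown are illustrative, not the study data, and real GRS forms have 7–10 items rather than four.

```python
# Sketch: normalise GRS item scores to a 100-point sum score, then correlate and
# regress against residency year (illustrative data, not the study's records).
import numpy as np
from scipy import stats

def normalised_sum(item_scores, max_per_item=5):
    """item_scores: 1-5 Likert scores; returns a score on a 100-point scale."""
    return 100.0 * sum(item_scores) / (max_per_item * len(item_scores))

years = np.array([4, 4, 5, 5, 6, 6])  # year of residency per assessed procedure
scores = np.array([normalised_sum(s) for s in [
    [2, 3, 2, 3], [3, 3, 3, 2], [3, 4, 3, 4],
    [4, 3, 4, 3], [4, 5, 4, 4], [5, 4, 5, 4]]])

r, p = stats.pearsonr(years, scores)
slope, intercept, *_ = stats.linregress(years, scores)
print(f"Pearson R = {r:.2f} (p = {p:.3f}); +{slope:.1f} points per residency year")
```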

Now that the tools for monitoring surgical performance in the operating theatre have been summarised, this section focuses on the application of these tools to assess learning curves. As the number of studies is quite limited, all are briefly described. The learning curve of arthroscopic rotator cuff repair was determined using operation time as the metric (Guttmann et al. 2005). Using blocks of ten operations for comparison, a significant decrease in operation time was found between the first two blocks, but not between consecutive later blocks. This indicates that learning took place in the first ten procedures. The learning curve for hip arthroscopy was determined by measuring the operation time but also by determining the complication rate (Hoppe et al. 2014). Improvement was seen between early and late experience, with 30 patient cases being the most common cut-off. A similar study design was used to assess the learning curve for arthroscopic Latarjet procedures, which showed a significant decrease in operation time and complication rate between the first 15 patient cases and the consecutive 15 patient cases (Castricini et al. 2013). Van Oldenrijk and co-workers, who used time-action analysis to assess the learning curve for minimally invasive total hip arthroplasty, found that learning took place in the first five to ten patient cases (Van Oldenrijk et al. 2008). This was quantified by the number of repetitions, waiting and additional actions executed during the operation.
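
The block-comparison design can be illustrated with the following Python sketch, which compares consecutive blocks of ten operation times with a t-test to locate where the learning effect occurs. The operation times are illustrative placeholders and do not reproduce the results of the cited studies.

```python
# Sketch: compare consecutive blocks of ten operation times with a t-test to
# locate where the learning effect occurs (illustrative data, not study results).
import numpy as np
from scipy import stats

op_times = np.array([95, 90, 92, 88, 85, 84, 80, 78, 76, 75,   # block 1
                     70, 68, 69, 67, 66, 68, 65, 66, 64, 65,   # block 2
                     64, 65, 63, 64, 66, 63, 64, 65, 63, 64])  # block 3

blocks = op_times.reshape(-1, 10)  # one row per block of ten procedures
for i in range(len(blocks) - 1):
    t, p = stats.ttest_ind(blocks[i], blocks[i + 1])
    print(f"block {i+1} vs {i+2}: mean {blocks[i].mean():.0f} -> "
          f"{blocks[i+1].mean():.0f} min, p = {p:.3f}")
```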

4 Discussion

In this chapter, monitoring tools to measure surgical performance and training progression were presented. Operation time is easy to measure and, as shown, capable of reflecting learning curves. Still, operation time is less useful as a measure for training purposes, since it gives the trainee no clues on what to improve and reflects many more factors than surgical performance alone, such as the complexity of the patient case. This is also acknowledged in the global rating scales. The tracking systems that have been used in research studies are quite expensive and require preoperative installation and calibration, which could explain the absence of studies performed in the operating room to determine learning curves. However, motion-tracking developments in the entertainment and gaming industry are growing fast, and the surgical training field could benefit from them. For example, Wii controllers are affordable and their accuracy is continuously being improved. Measuring forces as presented by Chami and co-workers requires a specific measurement set-up and modification of the instruments (Chami et al. 2008). Furthermore, attention needs to be paid to the manner of feedback when using force parameters, as the feedback should make sense to the trainee. Overall, these metrics are used in simulated environments and are strong in monitoring confined, less complex tasks or actions.

Video monitoring, in contrast, seems to reflect the holistic judgment model needed to assess more complex cognitive tasks. The challenge is to cope with the huge amounts of data that video registration generates. In that perspective, automatic detection with image-based tracking algorithms would be a perfect alternative, as the arthroscopic view is available anyhow. Until now, however, these algorithms have lacked robustness due to continuously changing lighting conditions in the view. In view of this, video analysis software as applied in athlete training might be a good alternative in the short term, especially if supervising surgeons define critical phases of the procedure on which the learning experience will focus, since this would limit the video recordings to those events alone. A major advantage of video analysis is that it can provide highly comprehensive feedback to the trainee.

Another alternative is the use of global rating scales. These scales structure and objectify the feedback of the supervising surgeons, but cannot be as illustrative as video feedback. Furthermore, it is recommended that assessors using the scales are trained to attain uniform assessment. However, the scales are easy to implement in residency curricula, have been demonstrated to reflect the learning curve of residents and could also be used for self-assessment. In summary, quite a number of tools have been presented, and validation of GRS for arthroscopic skills has been performed. This offers feasible means to continue arthroscopic skills monitoring in an objective, structured and comprehensive manner as formative assessment. Still, more research is required to determine which of the tools could be used for summative assessment.