
Take-Home Messages
  • Thresholds for time-, motion- and force-based metrics are required to facilitate training and to set uniform standards for assessment.

  • Thresholds can be derived from theoretic calculations, tissue experiments or from measurements with experts.

  • Specific research is required to determine evidence-based sets of thresholds that can be used for training.

1 Definitions

As discussed in Chap. 11, assessing performance is key to guiding and monitoring training and progression. Apart from measuring objective metrics, thresholds need to be determined that represent proficiency. To avoid ambiguity, the following definitions are used:

  • Task or exercise is a combined set of necessary (arthroscopic) actions to achieve the goal as requested by the task.

  • Proficiency in terms of instrument handling is defined as the optimal combination of performance efficiency and safety (Chap. 11).

  • Threshold is the magnitude or intensity that must be exceeded for a certain condition to occur or be manifested; a threshold can also mean the maximum level of magnitude considered to be acceptable or safe (Oxford English Dictionary 2014).

  • Tissue damage is defined as macroscopically visible tearing or rupturing of tissue.

2 Introduction

In this section of the book, we still focus on simulator training, that is, training outside the operating room in any type of simulated environment using any of the presented metrics. This chapter is directly related to Chap. 11, since performance tracking is less useful if trainees cannot be told clearly whether and when they have achieved the proficiency needed to continue to the next phase of training. To feed this information back to the trainee without frequent supervision by teaching staff, thresholds need to be set for the objective performance metrics. With this, we leave the domain of simulator validation and enter the domain of task design and validation. Similar to determining metrics that best reflect a certain task performance, determining complementary thresholds is a tedious task for several reasons. First, tasks need to be precisely defined by decomposing them into smaller elements, whereas in actual performance of arthroscopy several approaches can usually be applied without affecting the surgical outcome. An example is the presence of various techniques to execute meniscus suturing (Cho 2014; Forkel et al. 2014; Ra et al. 2013) or approaches to access the shoulder joint (Meyer et al. 2007; Soubeyrand et al. 2008). Second, some thresholds, such as task time, depend on the task, which requires them to be determined per task. This was, for example, done by Schreuder and co-workers, who evaluated all five exercises available on a VR simulator for training of laparoscopic skills (Schreuder et al. 2011) with complementary metrics for each specific exercise. Third, it is sometimes difficult to determine the optimal performance efficiency, which is required to set thresholds. Finally, when using thresholds for direct feedback, care has to be taken in how the trainee is informed and how mental overload is prevented.

Nevertheless, determination of evidence-based thresholds highly supports the availability of validated simulator training curricula that offer exercises that truly discriminate between levels of experience. Eventually, this supports uniformity in performance tracking and objective definition of levels of proficiency. This could lead to summative testing of innate arthroscopic skills of future residents before being accepted into a residency programme (Alvand et al. 2011) and of basic arthroscopic skills to qualify for continued training in the operating room. Two methods are presented to determine thresholds for different types of metrics and illustrated with examples.

3 Theoretic Thresholds

The term theoretic indicates the possibility to calculate the ideal or at least the extreme magnitude or setting of a metric for a given task. This method is widely applied in robotic control, for example, when a robot arm needs to move via the shortest trajectory from location A to location B or within the fastest possible time. The terms shortest and fastest indicate the extreme of the magnitude. To calculate the shortest trajectory from location A to location B, the dimensions and degrees of freedom of the robotic arm as well as the positions in space of locations A and B are assumed to be known. In the remainder of this section, several examples are given of theoretic thresholds that can be derived both for performance efficiency and performance safety metrics. This illustrates how this approach can be applied for training of arthroscopic skills.

4 Idle Time, Out of View Time and Motion Smoothness

Idle time can be used as a metric if a threshold is set that defines ‘still’. Its theoretic threshold is easily derived by demanding that the instrument tip never remains frozen in one position during task execution or that the instrument tip speed is never zero for a certain time. Similarly, the theoretic threshold for out of view time can be derived to be zero as well. This implies that the instrument always remains within the view cone of the arthroscope (Fig. 11.4). Finally, another easily derived theoretic threshold is that of motion smoothness, which is zero, as this requires the instrument to show no changes in its acceleration. The theoretically determined thresholds for these three metrics are independent of the task.
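The three metrics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular simulator: the tip positions are assumed to be sampled at a fixed interval `dt`, the speed limit `still_speed` that defines ‘still’ is a free choice, the per-sample `in_view` flags are assumed to be provided by the simulator, and the mean jerk magnitude is used here as one possible smoothness measure. All function names and the sample recording are hypothetical.

```python
import math

def tip_speeds(positions, dt):
    """Tip speed between consecutive samples; positions are (x, y, z) tuples."""
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]

def idle_time(positions, dt, still_speed):
    """Total time the tip moves slower than `still_speed` (the chosen definition of 'still')."""
    return sum(dt for v in tip_speeds(positions, dt) if v < still_speed)

def out_of_view_time(in_view, dt):
    """Total time the tip is outside the view cone; `in_view` holds one boolean per sample."""
    return sum(dt for visible in in_view if not visible)

def mean_jerk(positions, dt):
    """Mean magnitude of the third finite difference of position (assumed smoothness measure)."""
    jerks = []
    for a, b, c, d in zip(positions, positions[1:], positions[2:], positions[3:]):
        comps = [(w - 3 * z + 3 * y - x) / dt ** 3 for x, y, z, w in zip(a, b, c, d)]
        jerks.append(math.hypot(*comps))
    return sum(jerks) / len(jerks) if jerks else 0.0

# Hypothetical recording: the tip moves at constant speed, then stops.
track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(idle_time(track, dt=0.1, still_speed=1.0))
print(out_of_view_time([True, True, False, True, True], dt=0.1))
print(mean_jerk(track, dt=0.1))
```

A performance that meets all three theoretic thresholds would return zero from each function, which shows how strict these thresholds are in practice.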

5 Path Length

To demonstrate how a threshold for the metric path length can be determined, we use a simplified navigation task in this example. Suppose that for this navigation task, it is required to navigate and probe five anatomic landmarks: medial tibia plateau (1), posterior horn of the medial meniscus (2), midsection of the anterior cruciate ligament (3), lateral tibia plateau (4) and posterior horn of the lateral meniscus (5) (Fig. 12.1). We assume that these five landmarks are located in a single plane. Subsequently, the shortest total path length (smin) to probe all landmarks in the predefined sequence can be calculated:

Fig. 12.1 Cross-sectional view of a knee joint showing the lateral and medial menisci, the anterior cruciate ligament zone (grey area) and the portals. The numbered bullets indicate the five landmarks that need to be probed for the set navigation task in the indicated sequence. The dotted line represents smin, which is the minimal path length of the trajectory to probe the landmarks

$$ s_{\min} = \sum_{i=1}^{4} \sqrt{{\left(x_{i+1}-x_i\right)}^2+{\left(y_{i+1}-y_i\right)}^2} $$

where x_i and y_i are the coordinates of the i-th of the five landmark positions in the plane. The value smin is the absolute minimal path length for the given trajectory, meaning that the landmarks cannot be visited in the predefined sequence along any shorter path. So, trainees can be requested to follow this trajectory exactly with the tip of their instrument to execute this particular task. This example illustrates the task dependence of the set threshold, since another navigation task can give another magnitude of smin.
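The calculation of smin can be illustrated with a short Python sketch. The landmark coordinates and the recorded tip path below are hypothetical values chosen only to make the arithmetic easy to follow; they are not measurements from a knee joint, and the function name `path_length` is an assumption of this sketch.

```python
import math

def path_length(points):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Hypothetical landmark coordinates in mm, listed in the required probing order.
landmarks = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (6.0, 4.0), (6.0, 0.0)]
s_min = path_length(landmarks)

# Hypothetical recorded tip path that visits the same landmarks with one detour.
recorded = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0), (3.0, 4.0), (6.0, 4.0), (6.0, 0.0)]
excess = path_length(recorded) - s_min
print(s_min, excess)
```

The same function serves both purposes: applied to the landmarks it gives the theoretic threshold, and applied to a recorded tip trajectory it gives the trainee's actual path length, so the difference shows how far a performance is from the theoretic minimum.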

6 Force Magnitude

Safe tissue manipulation was associated with force magnitudes used to load tissues (Chap. 9). It was stated that tissue damage occurs if the tissue is loaded beyond the tissue’s material strength. Material strength is a tissue material property that indicates the failure level. This failure property will be used to determine theoretic thresholds for two types of tissues: meniscal and ligamentous tissue. Setting a threshold for safe meniscus probing is relevant to stimulate safe manipulation of this delicate tissue, since it has little to no healing potential (Tuijthof et al. 2011). Setting a threshold for safe ligament loading is relevant for arthroscopic training because the lower leg is stressed during knee arthroscopies to increase the available joint space. Ligament failure can be prevented if maximum loading levels are not exceeded (Stunt et al. 2013). Calculation of the force magnitude is only possible if the tissue material properties, the tissue volume and the contact areas with instruments are known. If not, tissue properties should first be determined from experiments (e.g. Tuijthof et al. 2009). Additionally, tissue measurements and observation studies are required to determine the manipulated tissue’s cross-sectional areas and contact surfaces.

All tissues, thus meniscal and ligamentous tissue as well, present a viscoelastic behaviour with a nonlinear relation between force and displacement (Buchner 2009; Chmarra et al. 2006; Fithian et al. 1990; Hull et al. 1996; Kennedy et al. 1976; Robinson et al. 2005). When loading the tissue, the tissue starts to deform elastically, followed by plastic deformation. Finally, when the load exceeds the material’s failure property, either pure shearing or tearing causes the tissue to rupture (Tuijthof et al. 2011). To set the theoretic thresholds, the variation in tissue material properties amongst the human population needs to be taken into account. The aim is to set force magnitude thresholds that prevent damaging even the weakest tissue when performing tissue manipulation. Consequently, the failure property of these weakest tissues should be determined, which is derived by subtracting three times the standard deviation from the mean failure property (Tuijthof et al. 2009). This should cover 99 % of the normal human population. Subsequently, the minimum force needed to actually rupture the weakest tissues is determined using values from tension studies performed with human cadaver material (Kennedy et al. 1976; Robinson et al. 2005; Trent et al. 1976; Tuijthof et al. 2009). A threshold value of 8.5 N has been derived for probing of meniscus tissue (illustrated in Fig. 11.7) (Tuijthof et al. 2011), and a threshold value of 78 N has been derived for stressing the lower leg at the level of the ankle joint (Stunt et al. 2013). Thus, remaining below these theoretic threshold levels minimises the chance of damaging tissue unintentionally.
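The step of subtracting three times the standard deviation from the mean can be sketched as follows. This is an illustration only: the sample values are hypothetical and unitless, the function names are assumptions of this sketch, and the coverage calculation assumes that the failure property is normally distributed across the population, which the cited studies may or may not support.

```python
from statistics import NormalDist, mean, stdev

def weakest_tissue_value(failure_values, n_sd=3.0):
    """Mean failure property minus n_sd sample standard deviations."""
    mu = mean(failure_values)
    sd = stdev(failure_values)
    return mu - n_sd * sd

def fraction_above(n_sd=3.0):
    """Fraction of a normal population whose failure property exceeds mean - n_sd * sd."""
    return NormalDist().cdf(n_sd)

# Hypothetical failure-property measurements (arbitrary units).
samples = [10.0, 12.0, 14.0]
print(weakest_tissue_value(samples))
print(fraction_above(3.0))
```

Under the normality assumption, the fraction of the population that is stronger than the mean minus three standard deviations is slightly above 99 %, which is consistent with the coverage stated above.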

7 Expert Thresholds

Another approach to setting thresholds for performance metrics is to use values acquired from experts performing tasks in the simulated environment. The line of reasoning supporting this approach is that experts have reached the plateau in their learning curve and demonstrate proficiency in arthroscopic skills. Thus, their task performance reflects the optimal manner to execute that particular task. To document reliable data, experts should have had the opportunity to familiarise themselves with the simulated environment and the task, and their number should be sufficiently large to minimise the influence of outliers.

Even with these preconditions taken into account, there is room for subjective selection of the threshold levels, e.g. the mean value, the mean plus or minus n times the standard deviation, the median, or the minimum or maximum values of the expert data sets. In the remainder of this section, several examples are given of expert thresholds that can be derived both for performance efficiency and performance safety metrics. This illustrates how this approach can be applied for training of arthroscopic skills.

8 Performance Efficiency Metrics

Task time (t), path length (s) and economy of motion (em) were the performance efficiency metrics for which we found expert data sets (Tables 12.1 and 12.2). These expert data sets were not the goal of these studies but were acquired to assess construct validity of arthroscopic knee and shoulder simulators. Nevertheless, these are the only sets from which quantitative thresholds can be derived.

Table 12.1 Experts threshold levels determined for tasks and performance efficiency metrics of knee simulators
Table 12.2 Experts threshold levels determined for tasks and performance efficiency metrics of shoulder simulators

The process by which the expert data were used to form both tables is as follows. Only tasks that were explicitly described were included. If possible, only the last trial in a series of repeated trials was processed, to minimise possible bias due to familiarisation. Only expert data that differed significantly from the results of less experienced groups were included. If the same tasks were performed on different simulators or investigated in multiple studies, the results were pooled as follows. The mean value of each metric was calculated as the weighted mean, using the relative number of experts per study as weighting factor. These mean values (μ) are presented as a first possible threshold (Tables 12.1 and 12.2, all but last column). Subsequently, the largest standard deviation (or lowest 95 % confidence level) of each metric was selected to define a second possible threshold: the mean value minus the standard deviation (μ-σ) (Tables 12.1 and 12.2, last column). The standard deviation was subtracted because this results in lower threshold values, which implies that trainees need to demonstrate increased performance efficiency.
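The pooling procedure described above can be sketched in Python. The study tuples below are hypothetical and do not reproduce any values from Tables 12.1 and 12.2; the function name and data layout are assumptions of this sketch.

```python
def pooled_expert_thresholds(studies):
    """Pool expert results for one task and one efficiency metric.

    `studies` is a list of (n_experts, mean, sd) tuples, one per study.
    Returns (mu, mu_minus_sigma): the expert-weighted mean and the
    stricter threshold obtained by subtracting the largest reported sd.
    """
    total_experts = sum(n for n, _, _ in studies)
    mu = sum(n * m for n, m, _ in studies) / total_experts
    sigma = max(sd for _, _, sd in studies)
    return mu, mu - sigma

# Hypothetical task times in seconds from two studies with 10 and 30 experts.
studies = [(10, 100.0, 20.0), (30, 60.0, 10.0)]
mu, mu_minus_sigma = pooled_expert_thresholds(studies)
print(mu, mu_minus_sigma)
```

Weighting by the number of experts lets the larger study dominate the pooled mean, while taking the largest standard deviation makes the second threshold the most demanding one that the pooled data support.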

When analysing the tables, the following remarks can be made:

  • The number of experts is limited and inconsistently defined in the studies.

  • The number of tasks is limited to predominantly navigation and probe tasks.

  • The order of magnitude of the task times and path lengths is quite similar for the navigation tasks, which implies a certain level of consistency.

9 Performance Safety Metrics

9.1 Experimentally Defined Thresholds

As an alternative to calculating the force magnitude based on known tissue properties as described in the previous paragraph, force parameters that represent tissue damage can also be determined from tissue measurements. This is especially useful when too many unclear factors prevent reliable calculation of the force magnitude threshold. When the conditions during loading are relatively constant (e.g. the grasping surface of a grasper or the contact area between needle and tissue during suturing is known), it is possible to mimic the surgical action for multiple tissue samples in a test setup to measure the maximal loading force before tissue rupture (Heijnsdijk et al. 2004; Rodrigues et al. 2012). By taking enough tissue samples from multiple individuals, the combined factors of influence are treated as a ‘black box’, while statistics are applied to the measurement outcomes to find the maximal allowable force for force-critical surgical actions such as drilling, suturing or tissue handling. According to the available literature, this approach has not yet been used to determine the maximal allowable force magnitude for arthroscopic tissue structures.

9.2 Thresholds Derived from Literature

Following the same process as used to form the tables of the performance efficiency thresholds, a table with performance safety thresholds was made including the metrics: collisions (col), motion speed (v), force magnitude (F) and force area (mfa) (Table 12.3). Two aspects differ from Tables 12.1 and 12.2 and are explained here.

Table 12.3 Experts threshold levels determined for tasks and performance safety metrics of knee simulators

First, the last column contains values where the standard deviation was added to the mean value to define a second threshold. This results in threshold values that seem to be less strict in terms of defining safe tissue manipulation. However, as the values are from experts, we can argue that these levels should be safe. Second, two studies were performed with the goal of determining safe manipulation thresholds for meniscal and ligamentous tissue, which were determined in vitro and in vivo (Stunt et al. 2013; Tuijthof et al. 2011). When analysing Table 12.3, the same remarks can be made as presented for the performance efficiency metrics, except that the suggested levels for tissue probing and joint stressing are based on a higher level of evidence.

10 Discussion

Two methods were presented to derive evidence-based thresholds for training of tasks in simulated environments: the theoretic and the experimental expert approach. Examples were given of how to determine theoretic thresholds for both efficiency and safety metrics, of which the latter requires knowledge of the material properties of human tissue. Expert data from which thresholds could be derived are only sparsely available in the literature (Tables 12.1, 12.2 and 12.3). Both methods have pros and cons, with theoretic thresholds being too strict at times and expert-derived thresholds still requiring a subjective decision on which level to use. Therefore, it is suggested to combine both methods to set realistic and evidence-based thresholds. Two examples are given.

The application of the theoretic threshold for path length (smin) might be too strict, since no deviation from smin is allowed, which is almost impossible to achieve. This could evoke unrealistic or undesired performance behaviour to achieve task completion, such as extremely slow movement of the probe. Also, it could cause frustration as trainees find it impossible to achieve the required threshold and might become demotivated to continue training. So, rather than using such an ‘extreme’ theoretic threshold, its magnitude can be used as a starting value to set a threshold defined by faculty, or it can be used to decide which expert values (mean, or mean plus or minus the standard deviation) are to be used. Additionally, if expert and theoretic threshold values deviate too much, further analysis could highlight performance strategies which do not necessarily strive to minimise a certain metric such as path length (see, e.g. Chmarra et al. 2006). This could lead to the adjustment of a certain task and the choice to use other metrics or to use only expert data to set thresholds.

In contrast, the theoretic threshold for safe meniscal tissue probing could be used as the absolute maximum value that a trainee may apply. Ideally, this should be supported by force measurements executed during experiments with real instruments and tissue or by expert data that show probing levels which are all below the theoretic threshold. This is especially relevant since the tissue material properties and instrument contact areas used to calculate the forces are not always constant.

As shown in Chaps. 9 and 11, sufficient functional arthroscopic simulators are available as well as metrics to define trainees’ performance and to monitor progression. The next step is to design and validate sets of training tasks and support their applicability with evidence-based thresholds. The data in this chapter provide the first values that can be used.