Introduction

With the advent of robotic technology, many of the technical limitations inherent to laparoscopic surgery can now potentially be circumvented [1, 2]. Robotic technologies have thus been widely applied in surgery, playing a significant role in robot-assisted surgery, teleoperation [1, 3], and robotic surgical training [4]. However, it remains challenging for a robot to learn the complex manipulation of surgical instruments during surgery.

In past decades, researchers have devoted considerable effort to enabling robots to perform motions or manipulations similar to those of a human. The learning of human gestures by imitation for humanoid robots has attracted intensive research efforts [5, 6]. Various methods have been explored, such as Hidden Markov Models (HMM) [7], Gaussian Mixture Models (GMM) [8], and the node transition graph method [9]. Learning-by-demonstration techniques have also been explored in motion planning for surgical tasks [10]. A robust representation of surgical instrument motion will benefit the development of autonomous surgical robots [11], as well as the evaluation of surgical skills [12, 13].

Demonstration-based learning techniques [6] are methods that equip a robot with motion learning capability. This is achieved by characterizing the demonstrations and then reconstructing an optimal motion trajectory for the robot. Demonstration-based learning has been studied using neural networks [14–17] and statistical representations [8, 18]. In recent research, learning by demonstration has also been applied in robot-assisted surgery to recognize, learn, and evaluate the motion trajectories of surgical instruments. Mayer et al. [14] applied a recurrent neural network to learn the tying of surgical knots based on the trajectory of human manipulation of the surgical instruments. Reiley et al. [19] applied a statistical modeling method to learn and categorize motion in surgery. Lin et al. [13] applied linear discriminant analysis and Bayes classifier methods in motion modeling for the purpose of skill evaluation in robot-assisted surgery.

In order to model the motion acquired from demonstrations, it is essential to identify the motion primitives. In HMM- or GMM-based approaches, a left-to-right model or single-chain cyclic model with a predefined number of motion primitives was assumed, and arbitrary numbers of interconnected motion primitives were not considered [9]. Different methods have been proposed to identify the number of mixture components, such as cross validation, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The cross validation method requires independent trials of demonstration to form a complete test set. Calinon et al. [8] proposed a BIC score method to determine the optimal number of motion primitives. The BIC score is a tradeoff between the log-likelihood of the model and the number of parameters needed to model the motion. The number of mixture components is chosen as the one that produces the lowest BIC score [8]. However, the BIC score method requires multiple trials of the modeling process to determine the log-likelihood of the model and the number of free parameters for the mixture model. The adaptive mean shift method can instead be applied to cluster the motion trajectory while preserving the dexterous features in the trajectory. Nevertheless, the efficiency of the adaptive mean shift method relies on the choice of the bandwidth, and determining the optimal bandwidth is challenging. Various methods have been explored to identify the appropriate bandwidth for a given data set, such as the plug-in-rule method [20], the least square cross validation and contrast methods [21], and the Asymptotic Mean Integrated Squared Error (AMISE) method [22].

The objective of this paper is to model surgical skills based on the motion trajectories of laparoscopic instruments with minimal user intervention. A key technical contribution is the proposed motion learning method, which uses the adaptive mean shift method to identify the motion primitives. The rest of the paper is organized as follows: The “Methods” section introduces the proposed method for surgical motion trajectory learning and illustrates the various techniques in each module of the proposed method. The “Experiments and results” section describes the application of this method with the experimental results of a tissue division task and a clip deployment task in a robotic surgical training system. In the “Discussion” section, the generic motion model trained using primitives determined by the adaptive mean shift method is compared with those obtained with the \(k\)-means method and the fixed bandwidth mean shift method. Finally, this work is concluded in the “Conclusion” section.

Methods

Figure 1 describes the proposed method for the clustering, modeling, and reconstruction of the motion trajectory in the robotic learning of laparoscopic surgery. Suppose that the motion trajectory is

$$\begin{aligned} \mathbf{X}=\{X_{t,i} ,\mathbf{X}_{s,i} \},\quad \hbox {for } i=1\cdots N \end{aligned}$$
(1)

where \(X_t\) and \(\mathbf{X}_s\) are the time and spatial components of the trajectory, respectively, and \(N\) is the number of observations. In order to eliminate the effect of the non-homogeneity of motion speed among the trials of the same task, Dynamic Time Warping (DTW) is performed to align the trajectories according to their features. Principal Component Analysis (PCA) is used to reduce the dimensionality of the high-dimensional data while preserving their features. The aligned motion data sets are therefore transformed into latent space by the PCA. The motion data sets in latent space are then clustered by the adaptive mean shift method with optimal bandwidth to identify the motion primitives. The number of motion primitives is taken as the number of mixture components for statistical modeling. A Gaussian Mixture Model (GMM) is trained with the clustered motion data to estimate its parameters. Once the GMM is trained, the estimated parameters in latent space are projected back to the original space. With the motion data represented by the GMM, Gaussian Mixture Regression (GMR) can be applied to retrieve a smooth trajectory in the original space for given temporal information.

Fig. 1 Data processing procedure to model and reconstruct the motion trajectories

Data processing

Motion speed in the execution of a given task varies from one trial to another. Therefore, the features in the motion trajectories do not appear in the same region across the trials. Hence, DTW is required to align the features from different trials in the same time span. DTW measures the similarity between two trajectories that may vary in temporal information. It removes the temporal distortions between separate trajectories, which would otherwise reduce the capability of the statistical models. To avoid misalignment during DTW, the trajectory data of each trial are divided into several subtasks with landmarks, such as approaching tissue, holding tissue, and division of tissue. Each subtask is temporally aligned by the DTW. The trajectory candidate with the longest time span is chosen as the reference trajectory during the DTW. The results of the DTW of each subtask are joined together accordingly and expressed as \(\mathbf{T}=\{T_{t,i} ,\mathbf{T}_{s,i} \}\).
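The alignment step described above can be illustrated with a minimal sketch of classic DTW in NumPy. This is not the implementation used in the study: the function names `dtw_path` and `align_to_reference`, the Euclidean local cost, and the averaging of samples matched to the same reference index are all assumptions made for illustration. In the procedure above, the reference would be the longest trial of a subtask.

```python
import numpy as np

def dtw_path(ref, qry):
    """Return the DTW cost and warping path between two (T, D) trajectories."""
    ref = np.asarray(ref, dtype=float)
    qry = np.asarray(qry, dtype=float)
    n, m = len(ref), len(qry)
    # Local cost: Euclidean distance between every pair of samples.
    dist = np.linalg.norm(ref[:, None, :] - qry[None, :, :], axis=2)
    # Accumulated cost with a padded border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end to the start along the cheapest predecessors.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n, m], path

def align_to_reference(ref, qry):
    """Warp qry onto the time base of ref (hypothetical helper).

    Query samples matched to the same reference index are averaged.
    """
    ref = np.asarray(ref, dtype=float)
    qry = np.asarray(qry, dtype=float)
    _, path = dtw_path(ref, qry)
    aligned = np.zeros_like(ref)
    counts = np.zeros(len(ref))
    for i, j in path:
        aligned[i] += qry[j]
        counts[i] += 1
    return aligned / counts[:, None]
```

After this step every trial of a subtask has the same number of samples as the reference, so the trials can be stacked and passed on to the PCA stage.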

PCA is required to reduce the dimensionality of the high-dimensional data, reduce noise, and identify the principal axes of the temporally aligned trajectory data. With PCA, \(\{\mathbf{T}_{s,i} \}\) is expressed in latent space. The spatial component in the latent space is written as

$$\begin{aligned} \{\mathbf{x}_{s,i} \}=\mathbf{A}\cdot \{\mathbf{T}_{s,i} \},\quad i=1\cdots N, \end{aligned}$$
(2)

where \(\mathbf{A}=\{\upsilon _{1,D} ,\upsilon _{2,D} ,\cdots \upsilon _{i,D} \}\) is a transformation matrix, \(\upsilon _i\) are the eigenvectors of the covariance matrix of the centered motion data set \(\{\mathbf{T}_{s,i} \}\) [8], and the subscript \(D\) is the minimum dimensionality required in the latent space. Hence, the motion trajectory data after PCA can be expressed as

$$\begin{aligned} \mathbf{x}=\{x_{t,i} ,\mathbf{x}_{s,i} \},\quad i=1\cdots N, \end{aligned}$$
(3)

where \(\mathbf{x}_{s,i}\) is the spatial component expressed in the latent space.
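As an illustration of the projection in Eq. (2), the following sketch builds the transformation matrix from the eigenvectors of the covariance matrix of the centered spatial data and keeps the smallest number of components that retains a chosen fraction of the variance. The function names `pca_fit` and `pca_project` and the `var_keep` parameter are hypothetical; the 0.95 default mirrors the 95 percent figure reported later in the experiments.

```python
import numpy as np

def pca_fit(spatial, var_keep=0.95):
    """Return (mean, A), where the rows of A are the leading eigenvectors.

    spatial: (N, D_original) array of temporally aligned spatial samples.
    """
    spatial = np.asarray(spatial, dtype=float)
    mean = spatial.mean(axis=0)
    cov = np.cov(spatial - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)      # ascending eigenvalues
    order = np.argsort(eigval)[::-1]          # sort descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    ratio = np.cumsum(eigval) / eigval.sum()
    # Smallest number of components whose cumulative variance >= var_keep.
    d = min(int(np.searchsorted(ratio, var_keep)) + 1, len(eigval))
    return mean, eigvec[:, :d].T              # A has shape (d, D_original)

def pca_project(spatial, mean, A):
    """Apply Eq. (2) to centered data: one latent row per sample."""
    return (np.asarray(spatial, dtype=float) - mean) @ A.T
```

The time component is left untouched and is simply paired with the latent spatial component to form the data of Eq. (3).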

Adaptive mean shift clustering of motion trajectory

A mixture model is a mixture distribution that represents the probability distribution of the observations in the overall population. The number of mixture components \(K_p\) and the number of observations are two basic parameters of any mixture model. In this study, the number of motion primitives in a task is the number of mixture components used in modeling the task. Identifying the motion primitives is therefore required before a mixture model can be applied to model motion trajectories. However, the number of motion primitives is not known in advance for a demonstration of a real task. The adaptive mean shift method can be applied to cluster the motion trajectories and to identify the number of components based on the bandwidth of the data set.

The mean shift method first defines a window around each data point and computes the mean of the data points within it. The center of the window is then shifted to the mean according to the mean shift vector in Eq. (7), and the procedure is repeated until the magnitude of the mean shift vector is less than a specified threshold value. The data points in the feature space are treated as samples from a probability density function, and a kernel function is applied to estimate the density. Kernel density estimation is a nonparametric way to estimate the density function of a random variable. The kernel \(K(\mathbf{x})\) is a positive definite bounded function satisfying \(\int {K(\mathbf{x})\hbox {d}\mathbf{x}=1}\) and \(\int {\mathbf{x}K(\mathbf{x})\hbox {d}\mathbf{x}=0}\) [23]. Given a kernel \(K(\mathbf{x})=k(||\mathbf{x}||^{2})\) with bandwidth parameter \(h\), the kernel density estimator for a given set of \(D\)-dimensional data is expressed as

$$\begin{aligned} \widehat{f}(\mathbf{x})=\frac{1}{Nh^{D}}\sum \limits _{i=1}^N {k\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i}{h}\right\| ^{2}\right) }. \end{aligned}$$
(4)

There are several variants of the kernel function [23]. Research [21] has shown that the profile of the kernel is not crucial to kernel density estimation; the quality of the estimate depends on the value of the bandwidth \(h\) rather than on the profile of the kernel. Although kernel density estimation has been commonly applied in data analysis, the determination of the optimal bandwidth for the kernel is still an active research topic [20–22].

We applied the adaptive bandwidth introduced by Comaniciu et al. [24]. The adaptive bandwidth, a non-random sequence of positive numbers, is expressed as

$$\begin{aligned} h(\mathbf{x}_i )=h_o \left[ {\frac{\lambda }{\widehat{f}(\mathbf{x}_i )}} \right] ^{\frac{1}{2}},\quad i=1\cdots N, \end{aligned}$$
(5)

where \(\lambda \) is the proportionality constant and defined as \(\log \lambda =N^{-1}\sum _{i=1}^N {\log \widehat{f}(\mathbf{x}_i )}\), and \(h_o\) is the initial bandwidth. The plug-in-rule methods [25] were applied to determine an appropriate initial bandwidth in this study.
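To make Eq. (5) concrete, the sketch below first evaluates a fixed-bandwidth pilot density at every sample and then scales the initial bandwidth by the square root of \(\lambda /\widehat{f}(\mathbf{x}_i )\), with \(\lambda \) the geometric mean of the pilot densities. The Gaussian kernel and the function names `pilot_density` and `adaptive_bandwidth` are assumptions for illustration; the initial bandwidth `h0` is taken as given (in the study it comes from a plug-in rule, which is not reproduced here).

```python
import numpy as np

def pilot_density(points, h0):
    """Fixed-bandwidth Gaussian kernel density estimate at each sample."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    # Squared distances between all pairs of samples, scaled by h0.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2) / h0**2
    kern = np.exp(-0.5 * sq) / (2.0 * np.pi) ** (d / 2.0)
    return kern.sum(axis=1) / (n * h0**d)

def adaptive_bandwidth(points, h0):
    """Per-sample bandwidth h(x_i) = h0 * sqrt(lambda / f(x_i))."""
    f = pilot_density(points, h0)
    lam = np.exp(np.mean(np.log(f)))   # log(lambda) = mean of log f(x_i)
    return h0 * np.sqrt(lam / f)
```

With this definition the geometric mean of the adaptive bandwidths equals `h0`: samples in sparse regions receive a wider window and samples in dense regions a narrower one.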

With Eqs. (4) and (5), the density estimation function for the adaptive bandwidth is written as

$$\begin{aligned} \widehat{f}(\mathbf{x})=\frac{1}{N}\sum \limits _{i=1}^N {\frac{1}{h(\mathbf{x}_i )^{D}}k\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i }{h(\mathbf{x}_i )}\right\| ^{2}\right) }. \end{aligned}$$
(6)

Hence, the mean shift vector is expressed as

$$\begin{aligned} M_v (\mathbf{x})=\frac{\sum _{i=1}^N {\frac{\mathbf{x}_i }{h(\mathbf{x}_i )^{D+2}}g\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i }{h(\mathbf{x}_i )}\right\| ^{2}\right) } }{\sum _{i=1}^N {\frac{1}{h(\mathbf{x}_i )^{D+2}}g\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i }{h(\mathbf{x}_i )}\right\| ^{2}\right) } }-\mathbf{x}, \end{aligned}$$
(7)

where \(g(u)=-k^{\prime }(u)\) is the negative derivative of the kernel profile. The details of the derivation of Eq. (7) are available in [24].
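The iteration implied by the mean shift vector can be sketched as follows, assuming a Gaussian profile so that \(g(u)\) is proportional to \(e^{-u/2}\) and per-sample bandwidths are supplied as an array. The names `mean_shift_step` and `adaptive_mean_shift`, the stopping tolerance, and the rule for merging converged points into modes (`merge_radius`, defaulting to the median bandwidth) are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def mean_shift_step(x, points, h):
    """Return x + M_v(x): the weighted mean of the samples around x."""
    d = points.shape[1]
    u = np.sum((x - points) ** 2, axis=1) / h**2
    w = np.exp(-0.5 * u) / h ** (d + 2)   # g(u) / h(x_i)^(D+2)
    return w @ points / w.sum()

def adaptive_mean_shift(points, h, tol=1e-5, max_iter=500, merge_radius=None):
    """Cluster points by following each one to a density mode.

    points: (N, D) samples; h: (N,) per-sample bandwidths.
    Returns (modes, labels); the number of modes is the number of clusters.
    """
    points = np.asarray(points, dtype=float)
    h = np.asarray(h, dtype=float)
    if merge_radius is None:
        merge_radius = float(np.median(h))   # hypothetical merging rule
    modes = []
    labels = np.empty(len(points), dtype=int)
    for idx, start in enumerate(points):
        x = start.copy()
        for _ in range(max_iter):
            x_new = mean_shift_step(x, points, h)
            shift = np.linalg.norm(x_new - x)
            x = x_new
            if shift < tol:                  # mean shift vector is small enough
                break
        # Assign to an existing mode if one is close, else open a new mode.
        for k, mode in enumerate(modes):
            if np.linalg.norm(x - mode) < merge_radius:
                labels[idx] = k
                break
        else:
            modes.append(x)
            labels[idx] = len(modes) - 1
    return np.array(modes), labels
```

The number of returned modes plays the role of \(K_p\), and the labels give the grouping of samples into motion primitives that can initialise the mixture model.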

Statistical modeling and parameter estimation

Gaussian mixture model

Gaussian mixture model is a linear superposition of \(K_p\) Gaussian components, defined by probability density function

$$\begin{aligned} p(\mathbf{x}_i )=\sum \limits _{k_p =1}^{K_p } {p(k_p )p(\mathbf{x}_i |k_p )} ,\quad i=1\cdots N, \end{aligned}$$
(8)

where \(p(k_p )=\pi _{k_p }\) is the prior, \(p(\mathbf{x}_i |k_p )=\mathsf{N}(\mathbf{x}_i ;\mu _{k_p },\Sigma _{k_p } ) =\frac{1}{\sqrt{(2\pi )^{D}|\Sigma _{k_p } |}}e^{-\frac{1}{2}(\mathbf{x}_i -\mu _{k_p } )^{T}\Sigma _{k_p }^{-1} (\mathbf{x}_i -\mu _{k_p } )}\) is the conditional probability density function for component \(k_p\), and \(p(\mathbf{x}_i )\) is the probability density of the data point \(\mathbf{x}_i\) under the model.

The parameters of the Gaussian Mixture Model are expressed as \(\{\pi _{k_p } ,\mu _{k_p } ,\Sigma _{k_p } \}_{k_p =1}^{K_p }\), where \(\pi _{k_p }\) is the prior probability, \(\mu _{k_p }\) is the mean vector, and \(\Sigma _{k_p }\) is the covariance matrix. The cumulated posterior probability of the Gaussian mixture model is expressed as \(E_{k_p} =\sum _{i=1}^N {p(k_p |\mathbf{x}_i )}\). The number of components \(K_p\) is obtained by the adaptive mean shift clustering method described above. The trajectory data \(\mathbf{x}_i\) in our study contain both temporal and spatial information, as shown in Eq. (1); hence, the mean vector is expressed as \(\mu _{k_p } =\{\mu _{t,k_p } ,\mu _{s,k_p } \}\), and the covariance matrix can be expressed as \(\Sigma _{k_p } =\left( {{\begin{array}{ll} {\Sigma _{tt,k_p }}&{} {\Sigma _{ts,k_p } }\\ {\Sigma _{st,k_p }}&{} {\Sigma _{ss,k_p } }\\ \end{array} }} \right) \).

The GMM parameters \(\{\pi _{k_p } ,\mu _{k_p } ,\Sigma _{k_p }\}\) are estimated by Expectation Maximization algorithm (EM) [26] with the demonstration trajectory data in Eq. (3). As the estimated parameters are for the data in the latent space, they are projected back into the original space by

$$\begin{aligned} \begin{aligned} \mu _{k_p }&=\mathbf{A}\cdot \mu _{k_p } ^{\prime \prime }\\ \Sigma _{k_p }&=\mathbf{B}\cdot \Sigma _{k_p } ^{\prime \prime }\cdot \mathbf{B}^{\prime },\quad k_p =1\cdots K_p\\ \pi _{k_p }&=\pi _{k_p } ^{\prime \prime } \end{aligned} \end{aligned}$$
(9)

where \(\pi _{k_p } ^{\prime \prime },\,\mu _{k_p }^{\prime \prime }\) and \(\Sigma _{k_p }^{\prime \prime }\) are the prior probability, mean vector, and covariance matrix of the motion data set in the latent space, \(\mathbf{B}=\left[ {{\begin{array}{ll} \mathbf{1}&{} \mathbf{0}\\ \mathbf{0}&{} \mathbf{A} \end{array} }} \right] \), and \(\mathbf{A}\) is the transformation matrix described in Eq. (2).
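The EM estimation of the latent-space parameters mentioned above can be sketched as follows. This is a generic EM loop for a Gaussian mixture, initialised from cluster labels such as those produced by a clustering step; it is not the toolkit used in the study. The function names `gaussian_pdf` and `em_gmm`, the fixed iteration count, and the small regularisation term `reg` added to each covariance are assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density of each row of x (N, D)."""
    d = mu.shape[0]
    diff = x - mu
    sol = np.linalg.solve(sigma, diff.T).T
    expo = -0.5 * np.sum(diff * sol, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(sigma))
    return np.exp(expo) / norm

def em_gmm(x, labels, n_iter=50, reg=1e-6):
    """Estimate priors, means, and covariances by EM.

    x: (N, D) samples; labels: (N,) initial cluster index per sample.
    Each initial cluster needs at least two samples.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    k = int(labels.max()) + 1
    priors = np.array([(labels == c).mean() for c in range(k)])
    means = np.array([x[labels == c].mean(axis=0) for c in range(k)])
    covs = np.array([
        np.atleast_2d(np.cov(x[labels == c], rowvar=False)) + reg * np.eye(d)
        for c in range(k)
    ])
    for _ in range(n_iter):
        # E-step: posterior p(k | x_i) for every sample and component.
        comp = np.stack(
            [priors[c] * gaussian_pdf(x, means[c], covs[c]) for c in range(k)],
            axis=1,
        )
        resp = comp / comp.sum(axis=1, keepdims=True)
        # M-step: update parameters from the cumulated posteriors E_k.
        e_k = resp.sum(axis=0)
        priors = e_k / n
        means = (resp.T @ x) / e_k[:, None]
        for c in range(k):
            diff = x - means[c]
            covs[c] = (resp[:, c, None] * diff).T @ diff / e_k[c] + reg * np.eye(d)
    return priors, means, covs
```

In the pipeline described here, `x` would hold the time value together with the latent spatial coordinates of each sample, so that each covariance contains the time and space blocks used later for regression.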

Gaussian mixture regression

Gaussian mixture regression is applied to reconstruct a trajectory represented by the Gaussian Mixture Model. The regression method estimates the conditional expectation of \(\widehat{\mathbf{X}}_s\) with given \(X_t\), and hence, the entire trajectory can be reconstructed with its characteristics encoded by the Gaussian mixture models. For the \(k_p ^\mathrm{th}\) component at given time \(X_t \), the expected distribution of \(\mathbf{X}_{s,k_p }\) is

$$\begin{aligned} p(\mathbf{X}_{s,k_p } |X_{t,k_{_p } } )= \mathsf{N}(\mathbf{X}_{s,k_p } ;\widehat{\mathbf{X}}_{s,k_p } ,\widehat{\Sigma }_{ss,k_p } ), \end{aligned}$$
(10)

where \(\widehat{\mathbf{X}}_{s,k_p }\) and \(\widehat{\Sigma }_{ss,k_p } \) are the conditional expected value and conditional covariance of the mixture component \(k_p\), respectively. They are expressed as

$$\begin{aligned} \widehat{\mathbf{X}}_{s,k_p }&= \mu _{s,k_p } +\Sigma _{st,k_p } (\Sigma _{tt,k_p } )^{-1}(X_t -\mu _{t,k_p } ),\nonumber \\ \widehat{\Sigma }_{ss,k_p }&= \Sigma _{ss,k_p } -\Sigma _{st,k_p } (\Sigma _{tt,k_p } )^{-1}\Sigma _{ts,k_p } . \end{aligned}$$
(11)

The estimates \(\widehat{\mathbf{X}}_{s,k_p }\) and \(\widehat{\Sigma }_{ss,k_p }\) are combined according to the probability that component \(k_p\) is responsible for the given time \(X_t\), which is expressed as

$$\begin{aligned} p(\mathbf{X}_s |X_t )=\sum \limits _{k_p =1}^{K_p } {\beta _{k_p } } \mathsf{N}(\mathbf{X}_{s,k_p } ;\widehat{\mathbf{X}}_{s,k_p } ,\widehat{\Sigma }_{ss,k_p}), \end{aligned}$$
(12)

where \(\beta _{k_p } =\frac{p(k_p )p(X_t |k_p )}{\sum _{i=1}^{K_p} {p(i)p(X_t |i)} }=\frac{\pi _{k_p } \mathsf{N}(X_t ;\mu _{t,k_p} ,\Sigma _{tt,k_p } )}{\sum _{i=1}^{K_p} {\pi _i \mathsf{N}(X_t ;\mu _{t,i} ,\Sigma _{tt,i} )}}\).

The conditional expectation and covariance of \(\mathbf{X}_s\) at the given time \(X_t\) for the whole mixture model are estimated as

$$\begin{aligned} \widehat{\mathbf{X}}_s =\sum \limits _{k_p =1}^{K_p } {\beta _{k_p } } \widehat{\mathbf{X}}_{s,k_p } ,\quad \widehat{\Sigma }_{ss} =\sum \limits _{k_p =1}^{K_p } {\beta _{k_p }^2 } \widehat{\Sigma }_{ss,k_p } . \end{aligned}$$
(13)

The generalized form of the motion trajectory in its original space can be expressed as \(\widehat{\mathbf{X}}=\{X_t ,\widehat{\mathbf{X}}_s\}\).
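The regression step can be sketched in a few lines of NumPy. The sketch assumes that time is the first coordinate of every mean vector and covariance matrix, and it uses the standard Gaussian conditioning formula, in which the second term of the conditional covariance is subtracted. The function name `gmr` and the array layout are illustrative assumptions.

```python
import numpy as np

def gmr(t_query, priors, means, covs):
    """Gaussian mixture regression of the spatial part on time.

    priors: (K,); means: (K, 1 + D); covs: (K, 1 + D, 1 + D),
    with time stored in index 0.
    Returns x_hat (T, D) and cov_hat (T, D, D).
    """
    t_query = np.asarray(t_query, dtype=float)
    mu_t, mu_s = means[:, 0], means[:, 1:]
    s_tt = covs[:, 0, 0]            # (K,)
    s_st = covs[:, 1:, 0]           # (K, D)
    s_ss = covs[:, 1:, 1:]          # (K, D, D)
    # Weights beta_k(t): responsibility of component k for time t.
    dt = t_query[:, None] - mu_t    # (T, K)
    lik = priors * np.exp(-0.5 * dt**2 / s_tt) / np.sqrt(2.0 * np.pi * s_tt)
    beta = lik / lik.sum(axis=1, keepdims=True)
    # Conditional mean of each component at each query time: (T, K, D).
    cond_mean = mu_s[None] + dt[..., None] * (s_st / s_tt[:, None])[None]
    # Conditional covariance of each component (independent of t): (K, D, D).
    cond_cov = s_ss - s_st[:, :, None] * s_st[:, None, :] / s_tt[:, None, None]
    # Mixture-level estimates, weighted by beta and beta squared.
    x_hat = np.einsum("tk,tkd->td", beta, cond_mean)
    cov_hat = np.einsum("tk,kde->tde", beta**2, cond_cov)
    return x_hat, cov_hat
```

Evaluating `gmr` on a regular grid of time values yields the smooth generalized trajectory, and the diagonal of `cov_hat` gives the envelope usually drawn around it.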

Experiments and results

We conducted experiments to evaluate our proposed method. Three subjects (\(30\pm 3\) years old) participated in the experiments. Subject 1 performed a tissue division task, while Subjects 2 and 3 performed a clip deployment task. Tissue division and clip deployment are common tasks in surgical procedures, and they are found in laparoscopic cholecystectomy, sectionectomy of the liver, and colectomy. Based on our experience, more than 20 repetitions of a demonstration are sufficient for the purposes of modeling and analysis. Hence, we collected 22 trajectories each from Subjects 1 and 2, and 24 trajectories from Subject 3. The motion trajectories collected from Subject 1’s demonstrations were used to show in detail the feasibility of the method described in the “Methods” section. In this section, we introduce the surgical simulation system [27] used for the experiments and then explain the experimental method and modeling results.

Surgical simulation system for the experiments

The surgical simulation system was designed for image-guided robotic-assisted surgical (IRAS) training. It consists of two modules, the robotic laparoscopic surgical trainer and the surgical simulation platform, as shown in Fig. 2. The system allows a user to conduct a virtual laparoscopic procedure by operating on a virtual patient through the robotic laparoscopic trainer. The virtual laparoscopic procedure can be acquired and reproduced for training and analysis purposes.

Fig. 2 Overview of the IRAS surgical training system: a robotic laparoscopic surgical trainer and b virtual surgical simulation platform

The robotic laparoscopic surgical trainer serves as a human–machine interface both for acquiring the surgical procedure and for providing guidance to users during training. The robot was designed with 10 degrees of freedom. It is capable of mimicking the motion kinematics of laparoscopic instruments in real surgery. Users operate the robotic handles (Fig. 2a) to perform a virtual surgery. The motion information of the robotic handles is sent to the surgical simulation platform to drive the virtual instruments and operate on the virtual patient.

The surgical simulation platform comprises virtual patients, a tool library of laparoscopic instruments, and a physics simulation engine. Tool–tissue interactions, organ deformation, tissue division, deployment of clips, and other activities executed during surgery are simulated in the surgical simulation platform. The platform incorporates smoking, bleeding, perfusion, and audio effects for the operations involving hook electrodes and scissors. The simulated surgical procedure, including the motion of the robotic handle and the tool–tissue interaction, can be recorded and reproduced on the robotic trainer and surgical simulation platform simultaneously for training purposes. Further details of the surgical simulation system can be found in [27].

Experiment and analysis

The surgical simulation system is built with a laparoscopic cholecystectomy procedure. A tissue division procedure and a clip deployment procedure within the cholecystectomy surgery were modeled in the experiment. In the tissue division procedure, as shown in Fig. 3, a left-hand laparoscopic grasper was used to stretch and hold the cystic duct, while the right-hand laparoscopic scissors were used to divide the cystic duct. The motion trajectory data for this tissue division task were subsequently used in the modeling process. The motion trajectory of each trial was recorded in \(\mathbf{X}=\{X_{t,i} ,\mathbf{X}_{s,i} \}\) format, where the spatial data \(\mathbf{X}_{s,i}\) consisted of \(\{\mathbf{X}_p ,\mathbf{X}_y ,\mathbf{X}_t ,\mathbf{X}_h ,\mathbf{X}_r \}\) from 5 axes, i.e., pitch, yaw, translation, handle, and roll, respectively. The motion trajectories of the surgical instruments were sampled at 8.3 Hz.

Fig. 3 A tissue division simulation on a virtual patient in the surgical simulation system

Figure 4a, c depicts the trajectories of the tissue division task for the left-hand and right-hand instruments, respectively. The time taken to complete each trial of the same task was different. Features from different trials appeared to overlap each other, as shown in the plot of the handle angle in Fig. 4c. This reduced the capability of GMM/GMR to extract the key features of the motion. Figure 4b, d shows the motion data of the tissue division task after the multi-dimensional Dynamic Time Warping, with the motion features aligned.

Fig. 4 Comparison of the raw motion data of the tissue division task collected from the simulator and the motion data after multi-dimensional Dynamic Time Warping. a and c are the raw motion data for the left and right instruments, respectively; b and d are the motion data after DTW for the left and right instruments, respectively. The circled sections indicate the overlapped features in the raw motion data and the results after DTW

To obtain the principal axes of the motion data, the PCA described in the “Data processing” section was applied, maintaining 95 percent of the variance of the motion trajectories. The initial bandwidth \(h_o\), which was obtained using the plug-in-rule method based on the distances in the latent space data \(\{\mathbf{x}_{s,i} \}\), was 10.97 and 9.95 for the left and right instruments, respectively. The adaptive bandwidth was determined using Eqs. (5) and (6). Figure 5 shows the adaptive bandwidth for one of the trials. The spatial data in the latent space \(\{\mathbf{x}_{s,i} \}\) were then grouped into clusters using the adaptive mean shift method described in the “Adaptive mean shift clustering of motion trajectory” section.

Fig. 5 The adaptive bandwidth value for the left and right trajectories of the instrument in one demonstration

Fig. 6 The GMM modeling and the GMR regression results based on the proposed method. a and c are the GMM encoding for the tissue division task of the left and right instruments, respectively, based on the adaptive mean shift clustering results. The spot is the mean of each Gaussian component, and the patch is the square root of the covariance matrix of the corresponding Gaussian component. b and d are the GMR regression results. The solid line is the expected mean of each Gaussian model at the given time \(t\), and the patch is the expected square root of the covariance matrix at the given time \(t\)

The GMM method described in the “Gaussian mixture model” section was applied to model the spatial data \(\{\mathbf{x}_{s,i} \}\), and the parameters of the GMM were estimated by the Expectation Maximization algorithm [26]. Figure 6a, c shows the Gaussian Mixture Models trained with the motion primitives identified by the adaptive mean shift method. Eight and thirteen primitives were identified in the left and right instrument trajectories, respectively. The estimated parameters were for the data set in the latent space. For the GMR regression process, they were projected back into the original space by Eq. (9).

The GMR method described in “Gaussian mixture regression” section was then applied to reconstruct the trajectories in the original space. Figure 6b, d shows the GMR regression results of the GMM models which were trained to encode the surgical skills demonstrated. Figure 7 shows the 3D plot of the tissue division task with the demonstration trajectories and the reconstructed trajectories. The implementation of the GMM and the GMR is based on a Gaussian mixture tool kit [8] available in the public domain.

Fig. 7 Raw motion trajectories and mean reconstructed model of Subject 1: a 22 motion trajectories (positional only) of the surgical tool tip in the tissue division task, b reconstructed mean trajectory by GMM and GMR. The orientation of instruments and open angle of the handles are not reflected in this plot. The plot in red represents the positional information of the left instrument, and the plot in blue represents that of the right instrument. The arrows indicate the direction of motion

In order to further evaluate the robustness of the proposed method, the method was applied to model a surgical task of deploying a clip with laparoscopic instruments in laparoscopic cholecystectomy using the system described in the “Surgical simulation system for the experiments” section. In the experiment, the left instrument was used to grab and hold the gallbladder, while the right instrument approached the cystic duct and deployed a clip. The surgical task was carried out by Subjects 2 and 3 with the same virtual patient setup. Each subject repeated the task a number of times. Twenty-two and 24 trajectories were recorded from Subjects 2 and 3, respectively. Figures 8 and 9 show the raw motion trajectory data and the mean reconstructed model of Subjects 2 and 3, respectively. Comparing the mean reconstructed models of each subject’s left instrument, we can see that the subjects manipulated the instruments differently; Subject 2 tended to control the span of the instrument swing more tightly than Subject 3.

Fig. 8 Raw motion trajectories and mean reconstructed model of Subject 2: a 22 motion trajectories (positional only) of the surgical tool tip in the clip deployment task, b reconstructed mean trajectory by GMM and GMR. The orientation of instruments and open angle of the handles are not reflected in this plot. The plot in red represents the positional information of the left instrument, and the plot in blue represents that of the right instrument. The arrows indicate the direction of motion

Fig. 9 Raw motion trajectories and mean reconstructed model of Subject 3: a 24 motion trajectories (positional only) of the surgical tool tip in the clip deployment task, b reconstructed mean trajectory by GMM and GMR. The orientation of instruments and open angle of the handles are not reflected in this plot. The plot in red represents the positional information of the left instrument, and the plot in blue represents that of the right instrument. The arrows indicate the direction of motion

Discussion

We have applied the adaptive mean shift method to identify the motion primitives. The adaptive mean shift method provides an intuitive way of determining the number of motion primitives based on the initial bandwidth, which was obtained by the plug-in-rule method [20]. However, the performance of the adaptive mean shift method relies on its initial bandwidth and the adaptive bandwidth function [24].

Root Mean Square (RMS) error was applied to evaluate the quality of the motion model through the reconstructed motion model [28]. RMS error of the reconstructed trajectory with respect to the demonstrated trajectory after DTW was calculated as follows:

$$\begin{aligned} \hbox {RMS}=\sqrt{\frac{1}{MN}\sum \limits _{j=1}^M {\sum \limits _{i=1}^N {\left( \widehat{\mathbf{X}}_{s,i} -\mathbf{X}_{s,i}\right) ^{2}}}}, \end{aligned}$$
(14)

where \(M\) is the number of trials and \(N\) is the number of observations in each trial. \(\widehat{\mathbf{X}}_s\) and \(\mathbf{X}_s\) are the expected spatial components and the spatial components from the demonstrations (taken from trial \(j\) in the outer sum), respectively.
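A minimal sketch of this error measure is given below. It assumes that the aligned demonstrations are stacked into a single array of shape (M, N, D) and that the error is reported separately for each of the D axes; the function name `rms_error` and the per-axis reporting are assumptions.

```python
import numpy as np

def rms_error(reconstructed, demos):
    """Per-axis RMS error of a reconstructed trajectory.

    reconstructed: (N, D) trajectory from the regression step.
    demos: (M, N, D) demonstrations after temporal alignment.
    """
    demos = np.asarray(demos, dtype=float)
    recon = np.asarray(reconstructed, dtype=float)
    m, n = demos.shape[:2]
    diff = demos - recon[None]          # broadcast over the M trials
    return np.sqrt(np.sum(diff**2, axis=(0, 1)) / (m * n))
```

For example, a reconstruction that is identically zero compared against demonstrations that are identically one on an axis gives an RMS error of exactly one on that axis.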

We compared the quality of the motion models obtained with different methods of identifying the motion primitives, i.e., the adaptive bandwidth mean shift method, the fixed bandwidth mean shift method, and the k-means method. The number of primitives required for the k-means method was taken from the results of the adaptive mean shift method. Figure 10a–d shows the GMM modeling results based on the k-means method and the fixed bandwidth mean shift clustering method. Comparison with Fig. 6 reveals that our method captured the motion primitives with more focused Gaussian components than the k-means and fixed bandwidth-based methods. The fixed bandwidth mean shift identified 6 and 14 primitives from the motion trajectories of the left and right instruments, respectively. Although the three methods employed similar numbers of motion primitives, we observed from Table 1 that the Gaussian Mixture Models with the adaptive mean shift method produced a smaller RMS error compared with the k-means and fixed bandwidth methods.

Fig. 10 GMM modeling based on the \(k\)-means method and the fixed bandwidth mean shift method. a, b are the GMM modeling results with the \(k\)-means clustering method for the left and right instrument trajectories, respectively. c, d are the GMM modeling results with the fixed bandwidth clustering method for the left and right instrument trajectories, respectively

Table 1 The RMS error (rotational joints) of the reconstructed trajectory to the demonstrations after DTW

The adaptive mean shift method also showed advantages in preserving the dexterous features of the motion. For example, the handle motion (Fig. 4d) showed several open and close actions. These features were encoded and reconstructed by the GMM/GMR with the adaptive mean shift method, as shown in Fig. 6c, d. However, these features were not captured by the GMM with the k-means and fixed bandwidth methods (Fig. 10b, d), even though the number of primitives used for the k-means method was the same as the number obtained by the adaptive mean shift method, and the fixed bandwidth method obtained a similar number of primitives to the adaptive mean shift method.

Another advantage of the Gaussian mixture modeling method based on the adaptive mean shift method is that the number of Gaussian components does not need to be specified. While it is possible to obtain a better fit of the trajectory with a high number of Gaussian components, this comes at the expense of poor generalization capability and a risk of overfitting.

PCA is necessary in the analysis of the motion trajectory data. PCA reduces the dimensionality and the noise and also rotates the data onto the axes that allow the clustering algorithm to identify the motion primitives effectively. Even when dimensionality reduction is not required for the data set, PCA is still needed to rotate the data set according to the eigenvectors of its covariance matrix and to align the data with its principal axes. To verify this, the adaptive mean shift method was applied to the tissue division trajectories directly without PCA, and the identified motion primitives were used to train the GMM models. Figure 11 shows the resulting GMM modeling and GMR regression results. Data across large time spans were grouped into the same motion primitive, which significantly reduced the capability of the Gaussian Mixture Regression. Table 2 shows that the RMS error of the reconstructed trajectory with respect to the demonstrations after DTW is larger without PCA than with PCA. Therefore, PCA is an important component of the solution.

Fig. 11 The GMM modeling results of the tissue division trajectory without PCA. The data across large time spans were grouped into the same motion primitive. a, b are the GMM modeling of the trajectories for the left and right instruments, respectively

Table 2 Effect of PCA on RMS error (rotational joints) of the reconstructed trajectory to the demonstrations after DTW

Our approach is suitable for modeling surgical skills with a specific sequence of motion primitives, such as the division and clipping tasks modeled in this study. Both tasks require grabbing and holding onto the object first, before performing the task at certain locations. While performing the task, the pattern of opening and closing of the instrument handle is consistent among the user’s executions, and clear motion sequences can be identified from the user’s demonstrations. Surgical suturing could also be modeled by the proposed method, as it requires both hands to conduct the motion in sequence. Surgical operations in which the sequence of motion is not critical may not be represented effectively by a Gaussian Mixture Model. The method focuses on the extraction and reconstruction of a generic model from demonstrations conducted by the user. It does not include the collaboration between two instruments or the tool–tissue interaction. In order to consider these factors, the velocity of each instrument and the deformation of the organ or tissue have to be modeled. Although the robustness of our method was evaluated with different surgical tasks, the study was limited by the sample size and by the complexity and range of the surgical procedures and devices. More surgical procedures should be studied in the future to demonstrate the generalizability of the proposed method. Future studies could be conducted in collaboration with other teaching hospitals in Singapore.

Conclusion

Learning from experienced surgeons is an efficient way of transferring surgical skills from senior surgeons to surgical trainees. Learning by demonstration is an approach to model surgical skills and to make them available for surgical training from the perspective of motion trajectory. The motion model trained in the learning-by-demonstration approach can serve as a generic model representing surgical skills. The motion model can then be used by robots to provide guidance to the trainee. Experimentation with our robotic surgical training system and the underlying technology with trainee doctors is currently ongoing.

The method proposed in this study demonstrates the feasibility of modeling skills without specifying the number of motion primitives. This has contributed to the robustness of our robotic surgical training system. The adaptive mean shift method has been applied to identify the motion primitives, and the Gaussian Mixture Model is trained on demonstrations to represent a surgical skill. However, collaboration among multiple instruments is essential in the execution of many surgical tasks. We are developing collaborative models to represent the cooperation of multiple surgical instruments, which is beyond the scope of this paper. The various spatial and temporal constraints in surgery also have to be taken into consideration for a complete simulation of a surgical operation. For example, in situations where a certain location or obstruction has to be avoided, or where a specific location must be passed through in order to reach the targeted site, constraints dependent on individual patient anatomy have to be considered.