Robotic learning of motion using demonstrations and statistical models for surgical simulation

Yang, Tao; Chui, Chee Kong; Liu, Jiang; Huang, Weimin; Su, Yi; Chang, Stephen K. Y.

doi:10.1007/s11548-013-0967-7


Original Article
Published: 14 December 2013

Volume 9, pages 813–823, (2014)

International Journal of Computer Assisted Radiology and Surgery




Abstract

Purpose

In robotic-assisted surgical training, the expertise of surgeons in maneuvering surgical instruments may be utilized to provide the motion trajectories for teaching. However, the motion primitives for trajectory planning are not known until the motion trajectory is generalized. We hypothesize that a generic model that encodes surgical skills using demonstrations and statistical models can be used by the surgical training robot to determine the motion primitives based on the motion trajectory.

Methods

The generic model was developed from twenty-two sets of motion trajectories of soft tissue division with laparoscopic scissors collected from a robotic laparoscopic surgical training system. Adaptive mean shift method with initial bandwidth determined by the plug-in-rule method was used to identify the primitives in the motion trajectories. Gaussian Mixture Model was applied to model the underlying motion structure. Gaussian Mixture Regression was then applied to reconstruct a generic motion trajectory for the task.

Results

The generic model and proposed method were investigated in experiments. The motion trajectory of tissue division was modeled and reconstructed. The motion model trained on primitives determined by the adaptive mean shift method produced RMS errors of $3.05^{\circ }$ and $3.08^{\circ }$ with respect to the demonstrated trajectories of the left and right instruments, respectively. These RMS errors were smaller than those of the k-means method and the fixed bandwidth mean shift method. The dexterous features in the demonstrations were also preserved.

Conclusions

Surgical tasks can be modeled using Gaussian Mixture Model and motion primitives identified by adaptive mean shift method with minimum user intervention. Generic motion trajectory has been successfully reconstructed based on the motion model. Investigation on the effectiveness of this method and generic model for surgical training is ongoing.


Introduction

With the advent of robotic technology, many of the technical limitations inherent to laparoscopic surgery can now potentially be circumvented [1, 2]. Robotic technologies have thus been widely applied in surgery, playing a significant role in robot-assisted surgery, teleoperation [1, 3], and robotic surgical training [4]. However, it is challenging for the robot to learn the complex manipulation of surgical instruments during surgery.

In past decades, researchers have devoted their efforts to enabling robots to perform motions or manipulations similar to those of a human. The learning of human gestures by imitation for a humanoid robot has attracted intensive research efforts [5, 6]. Various methods have been explored, such as Hidden Markov Models (HMM) [7], Gaussian Mixture Models (GMM) [8], and the node transition graph method [9]. Learning-by-demonstration techniques have also been explored in motion planning for surgical tasks [10]. A robust representation of surgical instrument motion will benefit the development of autonomous surgical robots [11], as well as the evaluation of surgical skills [12, 13].

Demonstration-based learning techniques [6] are methods that equip a robot with motion learning capability. This is achieved by characterizing the demonstrations and then reconstructing an optimal motion trajectory for a robot. Demonstration-based learning has been studied using neural networks [14–17] and statistical representations [8, 18]. In recent research, learning by demonstration has also been applied in robot-assisted surgery to recognize, learn, and evaluate the motion trajectory of surgical instruments. Mayer et al. [14] applied a recurrent neural network to learn the tying of surgical knots based on the trajectory of human manipulation of the surgical instruments. Reiley et al. [19] applied a statistical modeling method to learn and categorize motion in surgery. Lin et al. [13] applied linear discriminant analysis and Bayes classifier methods in motion modeling for the purpose of skill evaluation in robot-assisted surgery.

In order to model the motion acquired from demonstrations, it is essential to identify the motion primitives. In an HMM- or GMM-based approach, a left-to-right model or single-chain cyclic model with a predefined number of motion primitives was assumed, and arbitrary numbers of interconnected motion primitives were not considered [9]. Different methods have been proposed to identify the number of mixture components, such as cross validation, the Akaike information criterion, and the Bayesian information criterion (BIC). The cross validation method requires independent trials of demonstration to form a complete test set. Calinon et al. [8] proposed a BIC score method to determine the optimal number of motion primitives. The BIC score is a tradeoff between the log-likelihood and the number of parameters needed to model the motion. The number of mixture components is chosen as the one for which the BIC function [8] produces the lowest score. However, the BIC score method requires multiple trials of the modeling process to determine the log-likelihood of the model and the number of free parameters for the mixture model. The adaptive mean shift method can be applied to cluster the motion trajectory and to preserve the dexterous features in a motion trajectory. Nevertheless, the efficiency of the adaptive mean shift method relies on the choice of the bandwidth, and determining the optimal bandwidth is challenging. Various methods have been explored to identify the appropriate bandwidth for a given data set, such as the plug-in-rule method [20], the least squares cross validation and contrast methods [21], and the Asymptotic Mean Integrated Squared Error (AMISE) method [22].

The objective of this paper is to model surgical skills based on motion trajectories of laparoscopic instruments with minimal user intervention. A key technical contribution is the proposed motion learning method, which uses the adaptive mean shift method to identify the motion primitives. The rest of the paper is organized as follows: The “Methods” section introduces the proposed method for surgical motion trajectory learning and illustrates the techniques used in each module of the method. The “Experiments and results” section describes the application of this method, with the experimental results of a tissue division task and a clip deployment task in a robotic surgical training system. In the “Discussion” section, the generic motion model trained using primitives determined by the adaptive mean shift method is compared with those of the $k$-means method and the fixed bandwidth mean shift method. Finally, this work is concluded in the “Conclusion” section.

Methods

Figure 1 describes the proposed method for the clustering, modeling, and reconstruction of the motion trajectory in the robotic learning of laparoscopic surgery. Suppose that the motion trajectory is

$$\begin{aligned} \mathbf{X}=\{X_{t,i} ,\mathbf{X}_{s,i} \},\quad \hbox {for } i=1\cdots N \end{aligned}$$

(1)

where $X_t$ and $\mathbf{X}_s$ are the time and spatial components of the trajectory, respectively, and $N$ is the number of observations. To eliminate the effect of non-homogeneous motion speed among trials of the same task, Dynamic Time Warping (DTW) is performed to align the trajectories according to their features. Principal Component Analysis (PCA) is used to reduce the dimensionality of the high-dimensional data while preserving their features; the aligned motion data sets are therefore transformed into latent space by the PCA. The motion data sets in latent space are then clustered by the adaptive mean shift method with optimal bandwidth to identify the motion primitives. The number of motion primitives is used as the number of mixture components for statistical modeling. A Gaussian Mixture Model (GMM) is trained with the clustered motion data to estimate its parameters. Once the GMM is trained, the estimated parameters in latent space are projected back to the original space. With the motion data represented by the GMM, Gaussian Mixture Regression (GMR) can be applied to retrieve a smooth trajectory in the original space for given temporal information.

Data processing

Motion speed in the execution of a given task varies from one trial to another. Therefore, the features in the motion trajectories do not appear in the same region across the trials. Hence, DTW is required to align the features from different trials in the same time span. DTW measures the similarity between two trajectories that may vary in their temporal information. It removes the temporal distortions between separate trajectories, which would otherwise reduce the capability of the statistical models. To avoid misalignment during DTW, the trajectory data of each trial are divided into several subtasks with landmarks, such as approaching tissue, holding tissue, and division of tissue. Each subtask is temporally aligned by the DTW. The trajectory candidate with the longest time span is chosen as the reference trajectory during the DTW. The results of the DTW of each subtask are joined together accordingly and expressed as $\mathbf{T}=\{T_{t,i} ,\mathbf{T}_{s,i} \}$.
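The alignment step can be illustrated with a minimal sketch of classic DTW, assuming Euclidean point distances and the usual three-step recursion. The function names `dtw_path` and `warp_to_reference` are hypothetical and are not taken from the paper; the averaging of query samples that map to the same reference index is also an assumption made for illustration.

```python
import math

def dtw_path(ref, query):
    """Return (cost, path) aligning query to ref with classic DTW.

    ref and query are lists of equal-dimension points (tuples of floats).
    """
    n, m = len(ref), len(query)
    inf = float("inf")

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # acc[i][j] = minimal accumulated cost aligning ref[:i] with query[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], query[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(acc[i - 1][j - 1], i - 1, j - 1),
                 (acc[i - 1][j], i - 1, j),
                 (acc[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path

def warp_to_reference(ref, query):
    """Resample query onto the reference time axis (one sample per ref index)."""
    _, path = dtw_path(ref, query)
    matched = {}
    for i, j in path:
        matched.setdefault(i, []).append(query[j])
    # Average the query samples matched to each reference index.
    return [tuple(sum(c) / len(pts) for c in zip(*pts))
            for pts in (matched[i] for i in range(len(ref)))]
```

In the setting described above, `ref` would be the subtask trajectory with the longest time span, and each remaining trial of the same subtask would be passed as `query`.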

PCA is required to reduce the dimensionality of the high-dimensional data, reduce noise, and identify the principal axes of the temporally aligned trajectory data. With PCA, $\{\mathbf{T}_{s,i} \}$ is expressed in latent space. The spatial component in the latent space is written as

$$\begin{aligned} \{\mathbf{x}_{s,i} \}=\mathbf{A}\cdot \{\mathbf{T}_{s,i} \},\quad i=1\cdots N, \end{aligned}$$

(2)

where $\mathbf{A}=\{\upsilon _{1,D} ,\upsilon _{2,D} ,\cdots \upsilon _{i,D} \}$ is a transformation matrix, $\upsilon _i$ are the eigenvectors of the covariance matrix of the centered motion data set $\{\mathbf{T}_{s,i} \}$ [8], and the subscript $D$ is the minimum dimensionality required in the latent space. Hence, the motion trajectory data after PCA can be expressed as

$$\begin{aligned} \mathbf{x}=\{x_{t,i} ,\mathbf{x}_{s,i} \},\quad i=1\cdots N, \end{aligned}$$

(3)

where $\mathbf{x}_{s,i}$ is the spatial component expressed in the latent space.
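As an illustration of the projection in Eq. (2), the following sketch builds a transformation matrix from the leading eigenvectors of the covariance matrix and keeps the smallest number of components that reaches a target fraction of the variance. It assumes NumPy is available; the function name `pca_project` and the parameter `var_keep` are hypothetical, and the 0.95 default is chosen only as an example value.

```python
import numpy as np

def pca_project(T_s, var_keep=0.95):
    """Project the rows of T_s (N x d) onto the leading principal axes.

    Returns the latent data (N x D), the matrix A (D x d), and the data mean.
    """
    mean = T_s.mean(axis=0)
    centered = T_s - mean
    cov = np.cov(centered, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]              # re-sort to descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    ratio = np.cumsum(eigval) / eigval.sum()
    # Smallest D whose cumulative variance ratio reaches var_keep.
    D = min(int(np.searchsorted(ratio, var_keep)) + 1, T_s.shape[1])
    A = eigvec[:, :D].T                           # rows are principal axes
    return centered @ A.T, A, mean
```

Because the rows of `A` are orthonormal, `latent @ A + mean` gives an approximate reconstruction in the original space, which is one simple way to check how much information the retained components preserve.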

Adaptive mean shift clustering of motion trajectory

A mixture model is a mixture distribution that represents the probability distribution of the observations in the overall population. The number of mixture components $K_p$ and the number of observations are two basic parameters of any mixture model. In this study, the number of motion primitives in a task is the number of mixture components used in modeling the task. Identification of motion primitives is therefore required before a mixture model can be applied to motion trajectories. However, the number of motion primitives is not known for a demonstration of a real task. The adaptive mean shift method can be applied to cluster the motion trajectories and to identify the number of components based on the bandwidth of the data set.

The mean shift method first defines a window around each data point and computes the mean of the data points within the window. The center of the window is then shifted to the mean according to the mean shift vector (7), and the procedure is repeated until the mean shift vector is smaller than a specified threshold value. The data points in the feature space are treated as samples from a probability density function, and a kernel function is applied to estimate the density. Kernel density estimation is a nonparametric way to estimate the density function of a random variable. The kernel $K(\mathbf{x})$ is a positive definite bounded function satisfying $\int {K(\mathbf{x})\hbox {d}\mathbf{x}=1}$ and $\int {\mathbf{x}K(\mathbf{x})\hbox {d}\mathbf{x}=0}$ [23]. Given a kernel $K(\mathbf{x})=k(||\mathbf{x}||^{2})$ with bandwidth parameter $h$, the kernel density estimator for a given set of $D$-dimensional data is expressed as

$$\begin{aligned} \widehat{f}(\mathbf{x})=\frac{1}{Nh^{D}}\sum \limits _{i=1}^N {K\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i}{h}\right\| ^{2}\right) }. \end{aligned}$$

(4)

There are several variants of exact kernel function [23]. Research [21] had shown that the profile of the kernel is not crucial to the kernel density estimation. The quality of the kernel estimation depends on the value of the bandwidth $h$ instead of the profile of the kernel. Although the kernel density estimation has been commonly applied in data analysis, the determination of the optimal choice of the bandwidth for the kernel is still an active research topic [20–22].

We applied the adaptive bandwidth introduced by Comaniciu et al. [24]. The adaptive bandwidth, a non-random sequence of positive numbers, is expressed as

$$\begin{aligned} h(\mathbf{x}_i )=h_o \left[ {\frac{\lambda }{\widehat{f}(\mathbf{x}_i )}} \right] ^{\frac{1}{2}},\quad i=1\cdots N, \end{aligned}$$

(5)

where $\lambda $ is the proportionality constant, defined by $\log \lambda =N^{-1}\sum _{i=1}^N {\log \widehat{f}(\mathbf{x}_i )}$, and $h_o$ is the initial bandwidth. The plug-in-rule method [25] was applied to determine an appropriate initial bandwidth in this study.

With Eqs. (4) and (5), the density estimation function for the adaptive bandwidth is written as

$$\begin{aligned} \widehat{f}(\mathbf{x})=\frac{1}{N}\sum \limits _{i=1}^N {\frac{1}{h(\mathbf{x}_i )^{D}}K\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i }{h(\mathbf{x}_i )}\right\| ^{2}\right) }. \end{aligned}$$

(6)

Hence, the mean shift vector is expressed as

$$\begin{aligned} M_v (\mathbf{x})=\frac{\sum _{i=1}^N {\frac{\mathbf{x}_i }{h(\mathbf{x}_i )^{D+2}}g\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i }{h(\mathbf{x}_i )}\right\| ^{2}\right) } }{\sum _{i=1}^N {\frac{1}{h(\mathbf{x}_i )^{D+2}}g\left( \left\| \frac{\mathbf{x}-\mathbf{x}_i }{h(\mathbf{x}_i )}\right\| ^{2}\right) } }-\mathbf{x}, \end{aligned}$$

(7)

where $g(\mathbf{x})=-K^{{\prime }}(\mathbf{x})$. The details of the derivation of Eq. (7) are available in [24].
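The clustering procedure of Eqs. (4), (5), and (7) can be sketched for one-dimensional data as follows. This is only an illustration under stated assumptions: a Gaussian kernel is used (for which $g$ is proportional to the kernel itself, so constant factors cancel), the initial bandwidth `h0` is passed in by hand rather than computed by a plug-in rule, and modes closer than the hypothetical tolerance `merge_tol` are treated as the same primitive. The function names are likewise hypothetical.

```python
import math

def adaptive_bandwidths(xs, h0):
    """Eq. (5): per-sample bandwidths from a fixed-bandwidth pilot density (Eq. 4)."""
    n = len(xs)
    pilot = [sum(math.exp(-0.5 * ((x - xi) / h0) ** 2) for xi in xs) / (n * h0)
             for x in xs]
    log_lam = sum(math.log(f) for f in pilot) / n   # log of the geometric mean
    return [h0 * math.sqrt(math.exp(log_lam) / f) for f in pilot]

def mean_shift_mode(x, xs, hs, dim=1, tol=1e-6, max_iter=500):
    """Iterate Eq. (7) from x until the shift is below tol; return the mode."""
    for _ in range(max_iter):
        w = [math.exp(-0.5 * ((x - xi) / h) ** 2) / h ** (dim + 2)
             for xi, h in zip(xs, hs)]
        new_x = sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

def find_modes(xs, h0, merge_tol=1e-2):
    """Run mean shift from every sample and return the distinct modes, sorted."""
    hs = adaptive_bandwidths(xs, h0)
    modes = []
    for x in xs:
        m = mean_shift_mode(x, xs, hs)
        if all(abs(m - c) > merge_tol for c in modes):
            modes.append(m)
    return sorted(modes)
```

The number of modes returned by `find_modes` plays the role of the number of motion primitives $K_p$. Samples in sparse regions receive larger bandwidths than samples in dense regions, which is the behavior Eq. (5) is designed to produce.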

Statistical modeling and parameter estimation

Gaussian mixture model

Gaussian mixture model is a linear superposition of $K_p$ Gaussian components, defined by probability density function

$$\begin{aligned} p(\mathbf{x}_i )=\sum \limits _{k_p =1}^{K_p } {p(k_p )p(\mathbf{x}_i |k_p )} ,\quad i=1\cdots N, \end{aligned}$$

(8)

where $p(k_p )=\pi _{k_p }$ is the prior, $p(\mathbf{x}_i |k_p )=\mathsf{N}(\mathbf{x}_i ;\mu _{k_p },\Sigma _{k_p } ) =\frac{1}{\sqrt{(2\pi )^{D}|\Sigma _{k_p } |}}e^{-\frac{1}{2}(\mathbf{x}_i -\mu _{k_p } )^{T}\Sigma _{k_p }^{-1} (\mathbf{x}_i -\mu _{k_p } )}$ is the conditional probability density function for component $k_p$, and $p(\mathbf{x}_i )$ is the probability density of the data point $\mathbf{x}_i$ under the model.

The parameters of the Gaussian Mixture Model are expressed as $\{\pi _{k_p } ,\mu _{k_p } ,\Sigma _{k_p } \}_{k_p =1}^{K_p }$, where $\pi _{k_p }$ is the prior probability, $\mu _{k_p }$ is the mean vector, and $\Sigma _{k_p }$ is the covariance matrix. The cumulated posterior probability of the Gaussian mixture model is expressed as $E_{k_p} =\sum _{i=1}^N {p(k_p |\mathbf{x}_i )}$. The number of components $K_p$ is obtained by the adaptive mean shift clustering method described above. The trajectory data $\mathbf{x}_i$ in our study contain temporal and spatial information, as shown in Eq. (1), and hence, the mean vector is expressed as $\mu _{k_p } =\{\mu _{t,k_p } ,\mu _{s,k_p } \}$, and the covariance matrix can be expressed as $\Sigma _{k_p } =\left( {{\begin{array}{ll} {\Sigma _{tt,k_p }}&{} {\Sigma _{ts,k_p } }\\ {\Sigma _{st,k_p }}&{} {\Sigma _{ss,k_p } }\\ \end{array} }} \right) $.

The GMM parameters $\{\pi _{k_p } ,\mu _{k_p } ,\Sigma _{k_p }\}$ are estimated by Expectation Maximization algorithm (EM) [26] with the demonstration trajectory data in Eq. (3). As the estimated parameters are for the data in the latent space, they are projected back into the original space by

$$\begin{aligned} \begin{aligned} \mu _{k_p }&=\mathbf{A}\cdot \mu _{k_p } ^{\prime \prime }\\ \Sigma _{k_p }&=\mathbf{B}\cdot \Sigma _{k_p } ^{\prime \prime }\cdot \mathbf{B}^{\prime },\quad k_p =1\cdots K_p\\ \pi _{k_p }&=\pi _{k_p } ^{\prime \prime } \end{aligned} \end{aligned}$$

(9)

where $\pi _{k_p } ^{\prime \prime },\,\mu _{k_p }^{\prime \prime }$ and $\Sigma _{k_p }^{\prime \prime }$ are the prior probability, mean vector, and covariance matrix of the motion data set in the latent space, $\mathbf{B}=\left[ {{\begin{array}{ll} \mathbf{1}&{} \mathbf{0}\\ \mathbf{0}&{} \mathbf{A} \end{array} }} \right] $, and $\mathbf{A}$ is the transformation matrix described in Eq. (2).
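The EM estimation of the mixture parameters can be illustrated with a minimal sketch for one-dimensional data. It is not the toolkit used in this study: the function names `normal_pdf`, `e_step`, and `m_step` are hypothetical, scalar variances stand in for the covariance matrices, and the initial parameters are assumed to come from a prior clustering step.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def e_step(xs, priors, mus, variances):
    """Return the posteriors p(k | x_i) as a list of rows, one row per sample."""
    resp = []
    for x in xs:
        joint = [p * normal_pdf(x, m, v)
                 for p, m, v in zip(priors, mus, variances)]
        total = sum(joint)                  # mixture density p(x_i), Eq. (8)
        resp.append([j / total for j in joint])
    return resp

def m_step(xs, resp):
    """Re-estimate priors, means, and variances from the posteriors."""
    n, k = len(xs), len(resp[0])
    # Cumulated posterior probability E_k of each component.
    e_k = [sum(r[c] for r in resp) for c in range(k)]
    priors = [e / n for e in e_k]
    mus = [sum(r[c] * x for r, x in zip(resp, xs)) / e_k[c] for c in range(k)]
    variances = [sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, xs)) / e_k[c]
                 for c in range(k)]
    return priors, mus, variances
```

Alternating `e_step` and `m_step` for a fixed number of iterations, or until the parameters stop changing, gives the estimates; a practical implementation would also guard against variances collapsing toward zero.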

Gaussian mixture regression

Gaussian mixture regression is applied to reconstruct a trajectory represented by the Gaussian Mixture Model. The regression method estimates the conditional expectation of $\widehat{\mathbf{X}}_s$ with given $X_t$, and hence, the entire trajectory can be reconstructed with its characteristics encoded by the Gaussian mixture models. For the $k_p ^\mathrm{th}$ component at given time $X_t $, the expected distribution of $\mathbf{X}_{s,k_p }$ is

$$\begin{aligned} p(\mathbf{X}_{s,k_p } |X_{t,k_{_p } } )= \mathsf{N}(\mathbf{X}_{s,k_p } ;\widehat{\mathbf{X}}_{s,k_p } ,\widehat{\Sigma }_{ss,k_p } ), \end{aligned}$$

(10)

where $\widehat{\mathbf{X}}_{s,k_p }$ and $\widehat{\Sigma }_{ss,k_p } $ are the conditional expected value and conditional covariance of the mixture component $k_p$, respectively. They are expressed as

$$\begin{aligned} \widehat{\mathbf{X}}_{s,k_p }&= \mu _{s,k_p } +\Sigma _{st,k_p } (\Sigma _{tt,k_p } )^{-1}(X_t -\mu _{t,k_p } ),\\ \widehat{\Sigma }_{ss,k_p }&= \Sigma _{ss,k_p } -\Sigma _{st,k_p } (\Sigma _{tt,k_p } )^{-1}\Sigma _{ts,k_p }. \end{aligned}$$

(11)

$\widehat{\mathbf{X}}_{s,k_p }$ and $\widehat{\Sigma }_{ss,k_p }$ are combined according to the probability that component $k_p$ is responsible for the given time $X_t$, which is expressed as

$$\begin{aligned} p(\mathbf{X}_s |X_t )=\sum \limits _{k_p =1}^{K_p } {\beta _{k_p } } \mathsf{N}(\mathbf{X}_{s,k_p } ;\widehat{\mathbf{X}}_{s,k_p } ,\widehat{\Sigma }_{ss,k_p}), \end{aligned}$$

(12)

where $\beta _{k_p } =\frac{p(k_p )p(X_t |k_p )}{\sum _{i=1}^{K_p} {p(i)p(X_t |i)} }=\frac{\pi _{k_p } \mathsf{N}(X_t ;\mu _{t,k_p} ,\Sigma _{tt,k_p } )}{\sum _{i=1}^{K_p} {\pi _i \mathsf{N}(X_t ;\mu _{t,i} ,\Sigma _{tt,i} )}}$.

An estimate of the conditional expectation of $\mathbf{X}_s$ at the given time $X_t$ under the full mixture model, together with its covariance, is

$$\begin{aligned} \widehat{\mathbf{X}}_s =\sum \limits _{k_p =1}^{K_p } {\beta _{k_p } } \widehat{\mathbf{X}}_{s,k_p } ,\quad \widehat{\Sigma }_{ss} =\sum \limits _{k_p =1}^{K_p } {\beta _{k_p }^2 } \widehat{\Sigma }_{ss,k_p }. \end{aligned}$$

(13)

The generalized form of the motion trajectory in its original space can be expressed as $\widehat{\mathbf{X}}=\{X_t ,\widehat{\mathbf{X}}_s\}$.
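The regression step can be sketched for a scalar time input and a single spatial axis. The per-component conditioning below uses the standard Gaussian conditioning result, in which the conditional variance is $\Sigma _{ss}-\Sigma _{st}\Sigma _{tt}^{-1}\Sigma _{ts}$, and the components are mixed with the weights $\beta _{k_p}$ and $\beta _{k_p}^2$ as in Eq. (13). The function name `gmr` and its argument layout (one list entry per component) are assumptions made for illustration.

```python
import math

def gmr(t, priors, mu_t, mu_s, s_tt, s_ts, s_ss):
    """Return (mean, variance) of the spatial value at query time t.

    Each argument after t is a list with one scalar entry per mixture component.
    """
    # beta_k: probability that component k is responsible for time t.
    lik = [p * math.exp(-0.5 * (t - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
           for p, m, v in zip(priors, mu_t, s_tt)]
    total = sum(lik)
    beta = [l / total for l in lik]
    # Per-component conditional mean and variance given t.
    cond_mean = [ms + c / v * (t - mt)
                 for ms, mt, v, c in zip(mu_s, mu_t, s_tt, s_ts)]
    cond_var = [ss - c * c / v for ss, v, c in zip(s_ss, s_tt, s_ts)]
    # Mix the components with weights beta (mean) and beta squared (variance).
    mean = sum(b * m for b, m in zip(beta, cond_mean))
    var = sum(b * b * cv for b, cv in zip(beta, cond_var))
    return mean, var
```

Evaluating `gmr` over a grid of time values yields the generalized trajectory for one axis; for several spatial axes the scalar products become matrix products, but the structure of the computation is the same.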

Experiments and results

We conducted experiments to evaluate our proposed method. Three subjects ($30\pm 3$ years old) participated in the experiments. Subject 1 performed a tissue division task, while Subjects 2 and 3 performed a clip deployment task. Tissue division and clip deployment are common surgical tasks, found in procedures such as laparoscopic cholecystectomy, sectionectomy of the liver, and colectomy. Based on our experience, more than 20 repeats of a demonstration are sufficient for the purposes of modeling and analysis. Hence, we collected 22 trajectories each from Subjects 1 and 2, and 24 trajectories from Subject 3. The motion trajectories collected from Subject 1’s demonstration were used to show in detail the feasibility of the method described in the “Methods” section. In this section, we introduce the surgical simulation system [27] used for the experiment and then explain the experimental method and modeling results.

Surgical simulation system for the experiments

The surgical simulation system was designed for image-guided robotic-assisted surgical (IRAS) training. It consists of two modules, the robotic laparoscopic surgical trainer and the surgical simulation platform, as shown in Fig. 2. The system allows a user to conduct a virtual laparoscopic procedure by operating on a virtual patient through the robotic laparoscopic trainer. The virtual laparoscopic procedure can be acquired and reproduced for training and analysis purposes.

The robotic laparoscopic surgical trainer serves as a human–machine interface both for acquiring a surgical procedure and for providing guidance to users during training. The robot was designed with 10 degrees of freedom. It is capable of mimicking the motion kinematics of laparoscopic instruments in real surgery. Users can operate the robotic handles (Fig. 2a) and use them to perform a virtual surgery. The motion information of the robotic handles is sent to the surgical simulation platform to drive the virtual instruments and operate on the virtual patient.

The surgical simulation platform comprises virtual patients, a tool library of laparoscopic instruments, and a physics simulation engine. Tool–tissue interactions, organ deformation, tissue division, deployment of clips, and other activities executed during surgery are simulated in the surgical simulation platform. The platform incorporates smoking, bleeding, perfusion, and audio effects for operations involving hook electrodes and scissors. The simulated surgical procedure, including the motion of the robotic handle and the tool–tissue interaction, can be recorded and reproduced on the robotic trainer and surgical simulation platform simultaneously for training purposes. Further details of the surgical simulation system can be found in [27].

Experiment and analysis

The surgical simulation system is built with a laparoscopic cholecystectomy procedure. A tissue division procedure and a clip deployment procedure within the cholecystectomy surgery were modeled in the experiment. In the tissue division procedure, as shown in Fig. 3, a left-hand laparoscopic grasper was used to stretch and hold the cystic duct, while right-hand laparoscopic scissors were used to divide it. The motion trajectory data for this tissue division task were subsequently used in the modeling process. The motion trajectory of each trial was recorded in the format $\mathbf{X}=\{X_{t,i} ,\mathbf{X}_{s,i} \}$, where the spatial data $\mathbf{X}_{s,i}$ consisted of $\{\mathbf{X}_p ,\mathbf{X}_y ,\mathbf{X}_t ,\mathbf{X}_h ,\mathbf{X}_r \}$ from 5 axes, i.e., pitch, yaw, translation, handle, and roll, respectively. The motion trajectories of the surgical instruments were sampled at 8.3 Hz.

Figure 4a, c depicts the trajectories of the tissue division task for the left-hand and right-hand instruments, respectively. The time taken to complete each trial of the same task was different. Features from different trials overlapped each other, as shown in the plot of the handle angle in Fig. 4c. This reduced the capability of GMM/GMR to extract the key features of the motion. Figure 4b, d shows the motion data of the tissue division task after multi-dimensional Dynamic Time Warping, with the motion features aligned.

To obtain the principal axes of the motion data, the PCA described in the “Data processing” section was applied, retaining 95 percent of the variance of the motion trajectories. The initial bandwidth $h_o$, obtained using the plug-in-rule method based on the distances in the latent space data $\{\mathbf{x}_{s,i} \}$, was 10.97 and 9.95 for the left and right instruments, respectively. The adaptive bandwidth was determined using Eqs. (5) and (6). Figure 5 shows the adaptive bandwidth for one of the trials. The spatial data in the latent space $\{\mathbf{x}_{s,i} \}$ were then grouped into clusters using the adaptive mean shift method described in the “Adaptive mean shift clustering of motion trajectory” section.

The GMM method described in the “Gaussian mixture model” section was applied to model the spatial data $\{\mathbf{x}_{s,i} \}$, and the parameters of the GMM were estimated by the Expectation Maximization algorithm [26]. Figure 6a, c shows the Gaussian Mixture Models trained with the motion primitives identified by the adaptive mean shift method. Eight and thirteen primitives were identified in the left and right instrument trajectories, respectively. The estimated parameters were for the data set in the latent space. For the GMR process, they were projected back into the original space by Eq. (9).

The GMR method described in “Gaussian mixture regression” section was then applied to reconstruct the trajectories in the original space. Figure 6b, d shows the GMR regression results of the GMM models which were trained to encode the surgical skills demonstrated. Figure 7 shows the 3D plot of the tissue division task with the demonstration trajectories and the reconstructed trajectories. The implementation of the GMM and the GMR is based on a Gaussian mixture tool kit [8] available in the public domain.

In order to further evaluate the robustness of the proposed method, the method was applied to model a surgical task of deploying a clip with laparoscopic instruments in laparoscopic cholecystectomy using the system described in “Surgical simulation system for the experiments” section. In the experiment, the left instrument was used to grab and hold the gallbladder, while the right instrument approached the cystic duct and deployed a clip. The surgical task was carried out by Subjects 2 and 3 with the same virtual patient setup. Each subject repeated the task a number of times. Twenty-two and 24 trajectories were recorded from Subjects 2 and 3, respectively. Figures 8 and 9 show the raw motion trajectory data and the mean reconstructed model of Subjects 2 and 3, respectively. Comparing the mean reconstructed model of each subject’s left instrument, we can notice that each subject manipulated the instruments differently; Subject 2 tends to focus on controlling the span of instrument swing more closely than that of Subject 3.

Discussion

We have applied the adaptive mean shift method to identify the motion primitives. The adaptive mean shift method provides an intuitive way in determining the number of motion primitives based on the initial bandwidth, which was obtained by the plug-in-rule method [20]. However, the performance of the adaptive mean shift method relies on its initial bandwidth and the adaptive bandwidth function [24].

Root Mean Square (RMS) error was applied to evaluate the quality of the motion model through the reconstructed motion model [28]. RMS error of the reconstructed trajectory with respect to the demonstrated trajectory after DTW was calculated as follows:

$$\begin{aligned} \hbox {RMS}=\sqrt{\frac{1}{MN}\sum \limits _{j=1}^M {\sum \limits _{i=1}^N {(\widehat{\mathbf{X}}_{s,i} -\mathbf{X}_{s,i})^{2}}}}, \end{aligned}$$

(14)

where $M$ is the number of trials and $N$ is the number of observations in each trial. $\widehat{\mathbf{X}}_s$ and $\mathbf{X}_s$ are the expected spatial components and the spatial components from the demonstrations, respectively.
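The RMS error of Eq. (14) can be computed with a few lines for a single joint. The sketch assumes that every demonstration has already been time-aligned to the same $N$ samples as the reconstructed trajectory; the function name `rms_error` is hypothetical.

```python
import math

def rms_error(reconstructed, demonstrations):
    """RMS of one reconstruction against M aligned trials of N samples each."""
    m, n = len(demonstrations), len(reconstructed)
    sq = sum((xh - x) ** 2
             for trial in demonstrations
             for xh, x in zip(reconstructed, trial))
    return math.sqrt(sq / (m * n))
```

For example, a reconstruction `[1, 2, 3]` compared with the two trials `[1, 2, 3]` and `[2, 3, 4]` has a summed squared error of 3 over 6 samples, so the RMS error is the square root of 0.5.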

We compared the quality of the motion models obtained with different methods of identifying the motion primitives, i.e., the adaptive bandwidth mean shift method, the fixed bandwidth mean shift method, and the k-means method. The number of primitives required by the k-means method was taken from the tests using the adaptive mean shift method. Figure 10a–d shows the GMM modeling results based on the k-means method and the fixed bandwidth mean shift clustering method. The comparison with Fig. 6 reveals that our method captured motion primitives with more focused Gaussian components than the k-means and fixed bandwidth-based methods. The fixed bandwidth mean shift identified 6 and 14 primitives from the motion trajectories of the left and right instruments, respectively. Although the three methods employed a similar number of motion primitives, we observed from Table 1 that the Gaussian Mixture Models with the adaptive mean shift method produced smaller RMS errors than those with the k-means and fixed bandwidth methods.

Table 1 The RMS error (rotational joints) of the reconstructed trajectory to the demonstrations after DTW


The adaptive mean shift method also showed advantages in preserving the dexterous features in motion. For example, the handle motion (Fig. 4d) showed several open and close actions. These features were encoded and reconstructed by the GMM/GMR with the adaptive mean shift method, as shown in Fig. 6c, d. However, these features were not captured by the GMM with the k-means and fixed bandwidth methods (Fig. 10b, d), even though the number of primitives used for the k-means method was the same as the number obtained by the adaptive mean shift method, and the fixed bandwidth method obtained a similar number of primitives to the adaptive mean shift method.

Another advantage of the Gaussian mixture modeling method based on the adaptive mean shift method is that the user does not need to specify the number of Gaussian components. While it is possible to obtain a better fit of the trajectory with a high number of Gaussian components, this comes at the expense of poor generalization capability and a risk of overfitting.

PCA is necessary in the analysis of the motion trajectory data. PCA can be applied to reduce the dimensionality and the noise, and also to rotate the data to the axes that allow the clustering algorithm to identify the motion primitives effectively. Even when dimensionality reduction is not required for the data set, PCA is still necessary to rotate the data set according to the eigenvectors of its covariance matrix and to align the data with its principal axes. To test this, the adaptive mean shift method was applied directly to the tissue division trajectories without PCA, and the identified motion primitives were used to train the GMM models. Figure 11 shows the resulting GMM modeling and GMR regression results. Data across large time spans were grouped into the same motion primitive, which significantly reduced the capability of the Gaussian Mixture Regression. Table 2 shows that the RMS error of the reconstructed trajectory with respect to the demonstrations after DTW is larger than that obtained with PCA. Therefore, PCA is an important component of the solution.

Table 2 Effect of PCA on RMS error (rotational joints) of the reconstructed trajectory to the demonstrations after DTW


Our approach is suitable for modeling surgical skills with a specific sequence of motion primitives, such as the division and clipping tasks modeled in this study. Both tasks require grabbing and holding onto the object first, before performing the task at certain locations. While performing the task, the pattern of opening and closing of the instrument handle is consistent among the user’s executions. Clear motion sequences can be identified from the user’s demonstration. Surgical suturing could be modeled by the proposed method, as it requires both hands to conduct the motion in sequence. Surgical operations in which the sequence of motion is not critical may not be represented effectively by a Gaussian Mixture Model. The method focuses on extraction and reconstruction of a generic model from demonstrations conducted by the user. It does not include the collaboration between two instruments or the tool–tissue interaction. In order to consider these factors, the velocity of each instrument and the deformation of the organ or tissue have to be modeled. Although the robustness of our method was evaluated with different surgical tasks, the study was limited by the sample size and by the complexity and range of the surgical procedures and devices. More surgical procedures should be studied in the future to demonstrate the generalizability of the proposed method. Future studies could be conducted in collaboration with other teaching hospitals in Singapore.

Conclusion

Learning from experienced surgeons is an efficient way of transferring surgical skills from senior surgeons to surgical trainees. Learning by demonstration is an approach that models surgical skills from the perspective of motion trajectory and makes them available for surgical training. The motion model trained in the learning by demonstration approach can serve as a generic model representing surgical skills. The motion model can then be used by robots to provide guidance to the trainee. Experimentation with our robotic surgical training system and the underlying technology with trainee doctors is currently ongoing.

The method proposed in this study demonstrates the feasibility of modeling skills without specifying the number of motion primitives. This has contributed to the robustness of our robotic surgical training system. The adaptive mean shift method has been applied to identify the motion primitives, and a Gaussian Mixture Model is trained on demonstrations to represent a surgical skill. However, collaboration between multiple instruments is essential in the execution of many surgical tasks. We are developing collaborative models to represent the cooperation of multiple surgical instruments, which is beyond the scope of this paper. The various spatial and temporal constraints in surgery also have to be taken into consideration for a complete simulation of a surgical operation. For example, where a certain location or obstruction has to be avoided, or a specific location must be passed through in order to reach the targeted site, constraints that depend on individual patient anatomy have to be considered.

References

Winer J, Can MF, Bartlett DL, Zeh HJ, Zureikat AH (2012) The current state of robotic-assisted pancreatic surgery. Nat Rev Gastroenterol Hepatol. doi:10.1038/nrgastro.2012.120
Kwartowitz D, Herrell SD, Galloway R (2006) Toward image-guided robotic surgery: determining intrinsic accuracy of the da Vinci robot. Int J Comput Assist Radiol Surg 1(3):157–165. doi:10.1007/s11548-006-0047-3
Murphy DG, Hall R, Tong R, Goel R, Costello AJ (2008) Robotic technology in surgery: current status in 2008. ANZ J Surg 78(12):1076–1081. doi:10.1111/j.1445-2197.2008.04754.x
Liu J, Cramer SC, Reinkensmeyer DJ (2006) Learning to perform a new movement with robotic assistance: comparison of haptic guidance and visual demonstration. J Neuroeng Rehabil 3:20. doi:10.1186/1743-0003-3-20
Schaal S (1999) Is imitation learning the route to humanoid robots? Trends Cogn Sci 3(6):233–242
Argall BD, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Rob Auton Syst 57(5):469–483. doi:10.1016/j.robot.2008.10.024
Inamura T, Kojo N, Sonoda T, Sakamoto K, Okada K, Inaba M (2005) Intent imitation using wearable motion capturing system with on-line teaching of task attention. In: 5th IEEE-RAS international conference on humanoid robots, December 5–5 2005, pp 469–474. doi:10.1109/ichr.2005.1573611
Calinon S, Guenter F, Billard A (2007) On learning, representing, and generalizing a task in a humanoid robot. IEEE Trans Syst Man Cybern Part B Cybern 37(2):286–298
Yamane K, Yamaguchi Y, Nakamura Y (2011) Human motion database with a binary tree and node transition graphs. Auton Robots 30(1):87–98
Reiley CE, Plaku E, Hager GD (2010) Motion generation of robotic surgical tasks: learning from expert demonstrations. In: 2010 annual international conference of the IEEE engineering in medicine and biology society (EMBC), August 31 2010–September 4 2010, pp 967–970
Niessen W, Viergever M, Thakral A, Wallace J, Tomlin D, Seth N, Thakor N (2001) Surgical motion adaptive robotic technology (S.M.A.R.T): taking the motion out of physiological motion. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2001, vol 2208. Lecture Notes in Computer Science. Springer, Berlin Heidelberg, pp 317–325. doi:10.1007/3-540-45468-3_38
Pagador JB, Sanchez-Margallo FM, Sanchez-Peralta LF, Sanchez-Margallo JA, Moyano-Cuevas JL, Enciso-Sanz S, Uson-Gargallo J, Moreno J (2011) Decomposition and analysis of laparoscopic suturing task using tool-motion analysis (TMA): improving the objective assessment. Int J Comput Assist Radiol Surg 7(2):305–313. doi:10.1007/s11548-011-0650-9
Lin HC, Shafran I, Yuh D, Hager GD (2006) Towards automatic skill evaluation: detection and segmentation of robot-assisted surgical motions. Comput Aided Surg 11(5):220–230. doi:10.3109/10929080600989189
Hermann M, Faustino G, Daan W, Istvan N, Alois K, Jurgen S (2006) A system for robotic heart surgery that learns to tie knots using recurrent neural networks. In: IEEE/RSJ international conference on intelligent robots and systems, October 2006, pp 543–548. doi:10.1109/iros.2006.282190
Peters J, Schaal S (2008) Reinforcement learning of motor skills with policy gradients. Neural Netw 21(4):682–697. doi:10.1016/j.neunet.2008.02.003
Kober J, Peters J (2010) Imitation and reinforcement learning. IEEE Robot Autom Mag 17(2):55–62. doi:10.1109/mra.2010.936952
Kormushev P, Calinon S, Caldwell DG (2010) Robot motor skill coordination with EM-based reinforcement learning. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), 18–22 October 2010. pp 3232–3237. doi:10.1109/iros.2010.5649089
Thobbi A, Weihua S (2010) Imitation learning of hand gestures and its evaluation for humanoid robots. In: IEEE international conference on information and automation (ICIA), 20–23 June 2010. pp 60–65. doi:10.1109/icinfa.2010.5512333
Reiley CE, Lin HC, Varadarajan B, Vagvolgyi B, Khudanpur S, Yuh DD, Hager GD (2008) Automatic recognition of surgical motions using statistical modeling for capturing variability. Stud Health Technol Inform 132:396–401
Sheather SJ, Jones MC (1991) A reliable data-based bandwidth selection method for kernel density estimation. J R Stat Soc 53(3):683–690. doi:10.2307/2345597
Mugdadi AR, Ahmad IA (2004) A bandwidth selection for kernel density estimation of functions of random variables. Comput Stat Data Anal 47(1):49–62
Horová I, Kolácek J, Zelinka J, Vopatová K (2008) Bandwidth choice for kernel density estimates. In: 6th conference of the asian regional section of the IASC, Yokohama Japan
Wand MP, Jones MC (1995) Kernel smoothing. Monographs on statistics and applied probability 60. Chapman & Hall, London
Comaniciu D, Ramesh V, Meer P (2001) The variable bandwidth mean shift and data-driven scale selection. In: 8th IEEE international conference on computer vision, vol 431, pp 438–445
Horová I, Kolácek J, Vopatová K, Full bandwidth matrix selectors for gradient kernel density estimate. Comput Stat Data Anal 57 (1):364–376
Rabiner L, Juang B (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4–16. doi:10.1109/massp.1986.1165342
Yang T, Liu J, Huang W, Su Y, Yang L, Chui C, Ang M, Jr., Chang SY (2012) Mechanism of a learning robot manipulator for laparoscopic surgical training. In: Intelligent autonomous systems 12, vol 194. Advances in Intelligent Systems and Computing. Springer, Berlin Heidelberg, pp 17–26. doi:10.1007/978-3-642-33932-5_3
Calinon S, D’Halluin F, Sauser EL, Caldwell DG, Billard AG Learning and reproduction of gestures by imitation. Rob Autom Mag IEEE 17(2):44–54. doi:10.1109/mra.2010.936947

Acknowledgments

This work is partially supported by research Grant BEP 102 148 0009, Image-guided Robotic Assisted Surgical Training from the Agency for Science, Technology and Research, Singapore.

Conflict of interest

All authors declare that they have no conflicts of interest.

Author information

Authors and Affiliations

Neural and Biomedical Technology Department, Institute for Infocomm Research, Singapore, Singapore
Tao Yang & Weimin Huang
Department of Mechanical Engineering, National University of Singapore, Singapore, Singapore
Tao Yang & Chee Kong Chui
Ocular Imaging Programme, Institute for Infocomm Research, Singapore, Singapore
Jiang Liu
Department of Computing Science, Institute of High Performance Computing, Singapore, Singapore
Yi Su
Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Stephen K. Y. Chang

Corresponding author

Correspondence to Tao Yang.

About this article

Cite this article

Yang, T., Chui, C.K., Liu, J. et al. Robotic learning of motion using demonstrations and statistical models for surgical simulation. Int J CARS 9, 813–823 (2014). https://doi.org/10.1007/s11548-013-0967-7

Received: 21 August 2013
Accepted: 19 November 2013
Published: 14 December 2013
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11548-013-0967-7
