Abstract
Task-parameterized models of movements aim at automatically adapting movements to new situations encountered by a robot. The task parameters can, for example, take the form of positions of objects in the environment or landmark points that the robot should pass through. This tutorial aims at reviewing existing approaches for task-adaptive motion encoding. It then narrows down the scope to the special case of task parameters that take the form of frames of reference, coordinate systems or basis functions, which are most commonly encountered in service robotics. Each section of the paper is accompanied by source codes designed as simple didactic examples implemented in Matlab, with full compatibility with GNU Octave, closely following the notation and equations of the article. The paper also presents ongoing work and further challenges that remain to be addressed, with examples provided in simulation and on a real robot (transfer of manipulation behaviors to the Baxter bimanual robot). The repository for the accompanying source codes is available at http://www.idiap.ch/software/pbdlib/.
1 Introduction
In contrast to industrial robots in large factories, a wide range of service robots are designed to move in unconstrained environments in which they should fulfill a series of tasks while swiftly reacting to perturbations. The expectations and promises of service robotics applications are very challenging and cannot be achieved without joint efforts from different fields of robotics. This exploitation of various methods can hardly be done serially and instead requires closer interactions between learning, planning and control. A prerequisite for facing such challenges is to design a versatile representation of what the robot should do (how it should move, which behavior it should follow) that is compatible with the above techniques and that can be shared bilaterally. In particular, in continuously changing environments, the movements of service robots need to be generated and adapted to the ongoing situation very quickly.
This tutorial takes the perspective that the challenges of recognizing, predicting and generating movements can be achieved within the same encoding strategy. It will show that simple probabilistic mixture models can be exploited to model the natural variations in human and robot motions, as well as to make links between learning, online planning and optimal control. Gaussian mixture models provide a structure that is compatible with many robot learning approaches. It is flexible to the requirements of service robotics, because the representation can be easily adapted to the application requirements while preserving the core probabilistic mixture modeling strategy (addition of transition information in the form of an HMM, subspace clustering with MFA or MPPCA, etc.). Finally, the model is not tied to a specific parameter estimation technique, which allows the movements to be acquired by different interaction modalities and learning strategies.
The tutorial will focus on the task-parameterized Gaussian mixture model (TP-GMM), presenting a number of extensions and ongoing challenges targeting applications in unconstrained environments.
1.1 Organization of the paper
Section 2 discusses the importance of considering adaptive models of movements in robotics. It introduces the proposed approach from a high-level perspective and motivates it using a toy example with a single Gaussian.
Section 3 presents the core of the approach by taking the example of a standard Gaussian mixture model (GMM) modified as a task-parameterized GMM (TP-GMM). It also discusses the practical use of regularization terms.
The next four sections present techniques that rely on the core TP-GMM encoding strategy but that tackle different challenges, by moving progressively from encoding issues to kinematic and dynamic retrieval of data. Section 4 addresses the challenge of handling high-dimensional data with subspace clustering extensions of the model. Section 5 discusses the challenge of generating continuous movements from task-parameterized models. It presents two distinct approaches based on Gaussian mixture regression (Sect. 5.1) or trajectory models encoding of dynamic features (Sect. 5.2).
Section 6 then extends the motion synthesis challenge to the important problem of learning a controller for the robot, by presenting a minimal intervention control strategy that can exploit the proposed task parameterized model.
Section 7 gives an overview of more advanced forms of task parameterization that can be considered, such as constraints in different data spaces (e.g., to handle constraints at joint and end-effector levels simultaneously), as well as priority and projection constraints.
Finally, Sect. 8 presents comparisons with other task-adaptive approaches, and Sect. 9 discusses further work.
Each section of the paper is accompanied by source codes designed as simple didactic examples implemented in Matlab/GNU Octave. In order to facilitate reading, implementation details such as estimation update rules have been gathered at the end of the paper in the form of Appendices.
2 Adaptive models of movements
Task-parameterized models of movements/behaviors refer to representations that can automatically adapt to a set of task parameters that can, for example, describe the current context, situation, state of the environment or state of the robot configuration. The task parameters refer to the variables that can be collected by the system and that describe a situation, such as positions of objects in the environment. The task parameters can be fixed during an execution trial or they can vary while the motion is executed. The model parameters refer to the variables learned by the system, namely that are stored in memory (the internal representation of the movement). During reproduction, a new set of task parameters (description of the present situation) is used to produce new movements (e.g., adaptation to new position of objects after having observed the skill in a different situation).
Several denominations have been introduced in the literature to describe these models, such as task-parameterized [20, 64, 91] (the denomination used here), parametric [52, 60, 102], stylistic [13] or object-centric warping [57]. In these models, the encoding of skills usually serves several purposes, including classification, prediction, synthesis and online adaptation. A taxonomy of task-parameterized models is presented in [14], with three broad categories, namely
1. approaches employing M models for the M demonstrations, performed in M different situations, see, e.g., [21, 29, 44, 48, 51, 60, 97];
2. approaches employing P models for the P frames of reference that are possibly relevant for the task, see, e.g., [3, 25, 66]; and
3. approaches employing a single model whose parameters are modulated by task parameters, see, e.g., [43, 52, 71, 72, 80, 102].
In the majority of these approaches, the retrieval of movements from the model parameters and the task parameters is viewed as a regression problem. This generality might look appealing at first sight, but it also limits the generalization scope of these models, see Fig. 1. Task-parameterized Gaussian mixture models (TP-GMM) aim at increasing this generalization capability by exploiting the functional nature of task parameters. Indeed, in robotics applications, task parameters can most of the time be related to frames of reference, coordinate systems, basis functions or local projections, whose structure can be exploited to speed up learning and provide the system with better extrapolation capability.
2.1 Proposed approach
The proposed approach uses a generative model to encode the movement, where the variability and correlation information is used to infer the impedance parameters of a virtual spring–damper system. These parameters figuratively correspond to the stiffness of a spring and to the damping coefficient of a viscous damper, with the difference that they can also be full stiffness and damping matrices.
In its task-parameterized version, the model uses several frames of reference to describe the robot behavior in multiple coordinate systems. The variations and correlations observed from the perspective of these different frames are exploited to determine the impedance of the system with a linear quadratic regulator. Figure 2 illustrates the overall approach, which can be decomposed into multiple steps, involving statistical modeling, dynamical systems and optimal control. This illustration will be used as a guiding thread to describe throughout the article the different model components and algorithms enabling learning, adaptation, synthesis and control of movement skills.
The proposed task-parameterized model is not new: preliminary versions were investigated in [14, 16, 20] for the special case of frames of reference representing rotations and translations in Cartesian space. The current paper discusses the potential of the approach and introduces several routes for further investigation, aiming at applying the proposed technique to a wider range of affine transformations (directly exploiting the robotics application domain), including constraints in both configuration and operational spaces, as well as priority constraints. It also shows that the proposed method can be applied to different probabilistic encoding strategies, including subspace clustering approaches that enable the model to handle feature spaces of high dimensions.
Table 1 will be used as a reference to the notation, dimensions and names of variables employed in the paper. As a general rule, lowercase and uppercase bold fonts, respectively, indicate vectors and matrices, while normal fonts indicate scalars. Table 2 lists all examples available as Matlab/GNU Octave source codes.
2.2 Example with a single Gaussian
Before presenting the details of the task-parameterized model, the approach is motivated by an introductory example with a single Gaussian. Two frames are considered, described respectively at each time step t by \(\{{\varvec{b}}_{t,1},{\varvec{A}}_{t,1}\}\) and \(\{{\varvec{b}}_{t,2},{\varvec{A}}_{t,2}\}\), representing the origin of the observer \({\varvec{b}}\) and a set of basis vectors \(\{{\varvec{e}}_1,{\varvec{e}}_2,\ldots \}\) forming a transformation matrix \({\varvec{A}}=[{\varvec{e}}_1 {\varvec{e}}_2 \ldots ]\).
A set of demonstrations is observed from the perspective of the two frames. During reproduction, each frame expects the new datapoints to lie within the same range. If \(\mathcal {N}\Big ({\varvec{\mu }}^{(1)},{\varvec{{\varSigma }}}^{(1)}\Big )\) and \(\mathcal {N}\Big ({\varvec{\mu }}^{(2)},{\varvec{{\varSigma }}}^{(2)}\Big )\) describe the observations in the first and second frames, the two observers respectively expect the reproduction attempts to lie within the distributions \(\mathcal {N}\Big ({\varvec{\hat{\xi }}}{}_{t}^{(1)},\, {\varvec{\hat{{\varSigma }}}}{}_{t}^{(1)} \Big )\) and \(\mathcal {N}\Big ({\varvec{\hat{\xi }}}{}_{t}^{(2)},\, {\varvec{\hat{{\varSigma }}}}{}_{t}^{(2)} \Big )\) with
computed with the linear transformation property of normal distributions.
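The article's accompanying codes are in Matlab/GNU Octave; as an illustrative NumPy aside (the helper name is hypothetical), the linear transformation property maps a local Gaussian \(\mathcal{N}({\varvec{\mu}},{\varvec{{\varSigma}}})\) observed in a frame \(\{{\varvec{b}},{\varvec{A}}\}\) to \(\mathcal{N}({\varvec{A}}{\varvec{\mu}}+{\varvec{b}},\,{\varvec{A}}{\varvec{{\varSigma}}}{\varvec{A}}^{\scriptscriptstyle\top})\):

```python
import numpy as np

def gaussian_in_global_frame(mu, sigma, A, b):
    """Map N(mu, sigma), expressed in a local frame {b, A},
    to the global frame via the linear transformation property."""
    mu_hat = A @ mu + b
    sigma_hat = A @ sigma @ A.T
    return mu_hat, sigma_hat

# Frame rotated by 90 degrees and translated by b
A = np.array([[0., -1.], [1., 0.]])
b = np.array([1., 2.])
mu = np.array([1., 0.])
sigma = np.diag([0.04, 1.0])

mu_hat, sigma_hat = gaussian_in_global_frame(mu, sigma, A, b)
```

Here the 90-degree rotation swaps the two variances of the (axis-aligned) covariance, as expected.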
During reproduction, a trade-off needs to be determined to concord with the distributions expected by each frame. The underlying objective function is defined as the weighted sum of the quadratic error terms
This objective can be solved easily by differentiating the above equation and equating it to zero, yielding a point \({\varvec{\hat{\xi }}}_{t}\), with an estimation error defined by a covariance \({\varvec{\hat{{\varSigma }}}}_{t}\). It is easy to show that the resulting \(\mathcal {N}\Big ( {\varvec{\hat{\xi }}}_{t},{\varvec{\hat{{\varSigma }}}}_{t} \Big )\) corresponds to the product of the two Gaussians \(\mathcal {N}\Big ({\varvec{\hat{\xi }}}{}_{t}^{(1)},\; {\varvec{\hat{{\varSigma }}}}{}_{t}^{(1)} \Big )\) and \(\mathcal {N}\Big ({\varvec{\hat{\xi }}}{}_{t}^{(2)},\; {\varvec{\hat{{\varSigma }}}}{}_{t}^{(2)} \Big )\), see [18] for details.
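This product of Gaussians has a closed form in terms of the precision matrices \({\varvec{\hat{{\varSigma}}}}{}_t^{(j)-1}\). A minimal NumPy sketch (not the article's Matlab code; the helper name is hypothetical):

```python
import numpy as np

def gaussian_product(mus, sigmas):
    # N(mu_hat, Sigma_hat) proportional to the product of Gaussians:
    #   Sigma_hat = (sum_j Sigma_j^-1)^-1
    #   mu_hat    = Sigma_hat @ sum_j Sigma_j^-1 @ mu_j
    precisions = [np.linalg.inv(s) for s in sigmas]
    sigma_hat = np.linalg.inv(sum(precisions))
    mu_hat = sigma_hat @ sum(p @ m for p, m in zip(precisions, mus))
    return mu_hat, sigma_hat

# Two frames expecting the datapoint around [0,0] and [2,2],
# with equal confidence: the trade-off lands halfway between them.
mu_hat, sigma_hat = gaussian_product(
    [np.array([0., 0.]), np.array([2., 2.])],
    [np.eye(2), np.eye(2)])
```

With unequal covariances, the product is pulled toward the frame with the smaller variance, which is exactly the trade-off described above.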
Figure 3 illustrates this process for one of the Gaussians in Fig. 2.
3 Task-parameterized Gaussian mixture model (TP-GMM)
The task-parameterized Gaussian mixture model (TP-GMM) is a direct extension of the objective presented above, considering multiple frames and multiple clusters of datapoints (soft clustering via mixture modeling). It probabilistically encodes the relevance of candidate frames, which can change during the task. In contrast to approaches such as [71] that aim at extracting a single (most prominent) coordinate system located at the end of a motion segment, the proposed approach allows the superposition and transition of different coordinate systems that are relevant for the task (parallel organization of behavior primitives, adaptation to multiple viapoints in the middle of the movement, or modulation based on positions, orientations or geometries of objects).
Each demonstration \(m\in \{1,\ldots ,M\}\) contains \(T_m\) datapoints, forming a dataset of N datapoints \(\{{\varvec{\xi }}_{t}\}_{t=1}^N\) with \(N=\sum _{m=1}^{M} T_m\).
The task parameters are represented by P coordinate systems, defined at time step t by \(\{{\varvec{b}}_{t,j},{\varvec{A}}_{t,j}\}_{j=1}^P\), representing, respectively, the origin of the observer and a transformation matrix.
The demonstrations \({\varvec{\xi }}\in \mathbb {R}^{D\times N}\) are observed from these different viewpoints, forming P trajectory samples \({\varvec{X}}^{(j)}\in \mathbb {R}^{D\times N}\). These samples can be collected from sensors located at the frames or computed with
The parameters of a TP-GMM with K components are defined by \(\big \{\pi _i,\{{\varvec{\mu }}^{(j)}_i,{\varvec{{\varSigma }}}^{(j)}_i\}_{j=1}^P\big \}_{i=1}^K\) (\(\pi _i\) are the mixing coefficients, \({\varvec{\mu }}^{(j)}_i\) and \({\varvec{{\varSigma }}}^{(j)}_i\) are the center and covariance matrix of the ith Gaussian component in frame j).
Learning of the parameters is achieved by log-likelihood maximization subject to the constraint that the data in the different frames arose from the same source, resulting in an expectation-maximization (EM) algorithm [23] to iteratively update the model parameters until convergence, see “Appendix 1” for details. Other forms of learning for mixture models are possible, including spectral clustering [53, 69, 84], online learning [27, 34, 68, 83, 99] or self-refinement [19].
For a movement in Cartesian space with 10 demonstrations and 3 candidate frames, the overall learning process typically takes 1–3 s on a standard laptop. The reproduction is much faster and can be computed online (usually below 1 ms).
The learned model is then used to reproduce movements in other situations (for new position and orientation of candidate frames). A new GMM with parameters \(\{\pi _i,{\varvec{\hat{\xi }}}_{t,i},{\varvec{\hat{{\varSigma }}}}_{t,i}\}_{i=1}^K\) can automatically be generated with
where the result of the Gaussian product is given by
For computational efficiency, the above equations can be computed with precision matrices instead of covariances. Figure 4 depicts the different steps of the above computation.
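The per-component computation can be sketched as follows (a NumPy aside with hypothetical names, rather than the article's Matlab code): each frame's Gaussian is first mapped to the global frame with the current task parameters, and the product is then accumulated directly in precision form:

```python
import numpy as np

def tpgmm_component_product(mus, sigmas, As, bs):
    """For one mixture component: transform the P per-frame Gaussians
    {mu^(j), Sigma^(j)} with the task parameters {A_j, b_j}, then
    multiply them using precision (inverse covariance) accumulation."""
    Lam_sum, eta_sum = 0., 0.
    for mu, sigma, A, b in zip(mus, sigmas, As, bs):
        mu_hat = A @ mu + b             # frame Gaussian in global coordinates
        sigma_hat = A @ sigma @ A.T
        Lam = np.linalg.inv(sigma_hat)  # precision matrix
        Lam_sum = Lam_sum + Lam
        eta_sum = eta_sum + Lam @ mu_hat
    Sigma_t = np.linalg.inv(Lam_sum)
    xi_t = Sigma_t @ eta_sum
    return xi_t, Sigma_t
```

The same loop is repeated for each of the K components to form the new GMM adapted to the current situation.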
The proposed task-parameterized approach requires each frame to evaluate the local variability of the demonstrations. This section showed that a mixture model could be employed to extract this variability. However, other encoding strategies can be used as long as the local variations take the form of full covariances aligned with the different frames. In particular, data-driven encoding strategies can alternatively be employed. This will be shown later in Sect. 8 using two examples with task-parameterized Gaussian process (TP-GP) and task-parameterized locally weighted regression (TP-LWR).
3.1 Regularization of the TP-GMM parameters
In applications that are prone to overfitting, it is relevant to introduce regularization terms. Regularization has the effect of avoiding singularities and smoothing the solution space. An option is to define a minimal admissible eigenvalue \(\lambda _{\min }\) and adjust each covariance matrix \({\varvec{{\varSigma }}}^{(j)}_i\) so that
where \({\varvec{V}}^{(j)}_i\) is a matrix containing the stacked eigenvectors of \({\varvec{{\varSigma }}}^{(j)}_i\), with \(\lambda ^{(j)}_{i,k}\) the corresponding eigenvalues.
Another approach is to set a priori uncertainties on the covariance parameters in the form of a diagonal isotropic covariance \({\varvec{I}}\rho \) (Tikhonov regularization), so that
with \({\varvec{I}}\) an identity matrix and \(\rho \) a small scalar factor that can be either set empirically or estimated from the data. The difference with Eq. (7) is that a value \(\rho \) is added to each \(\lambda ^{(j)}_{i,k}\) instead of truncating the eigenvalues. The same development can be done with singular value decomposition, emphasizing the effect of the regularization on the condition number, either by bounding it to remain below a threshold as in Eq. (7) or by increasing the singular values as in Eq. (8). In some applications, it is convenient to apply small regularization terms at different steps of the procedure (e.g., at each iteration in the EM process and after convergence, before computing Gaussian products).
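Both regularization schemes admit a compact implementation. The sketch below (NumPy, hypothetical helper names, not the article's codes) clamps eigenvalues to a floor \(\lambda_{\min}\) as in Eq. (7), and alternatively adds the Tikhonov term \(\rho\) to the diagonal as in Eq. (8):

```python
import numpy as np

def regularize_eigfloor(sigma, lmin):
    # Eigenvalue-floor regularization: clamp eigenvalues below lmin,
    # then reconstruct the covariance from its eigendecomposition.
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ np.diag(np.maximum(vals, lmin)) @ vecs.T

def regularize_tikhonov(sigma, rho):
    # Tikhonov regularization: add rho to every eigenvalue,
    # equivalent to adding rho*I to the covariance.
    return sigma + rho * np.eye(sigma.shape[0])
```

The eigenvalue floor truncates only the degenerate directions, whereas the Tikhonov term shifts all eigenvalues uniformly.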
4 Extension to task-parameterized subspace clustering
Classical Gaussian mixture models tend to perform poorly in high-dimensional spaces if too few datapoints are available. This is also true for robotics problems aiming at encoding multivariate and multimodal signals from only few demonstrations. Namely, if the training set is \(\{{\varvec{\xi }}_t\}_{t=1}^N\) with \({\varvec{\xi }}_t\in \mathbb {R}^D\), the curse of dimensionality occurs if the dimension of the data D is too large compared to the size of the training set N. In particular, the problem can affect the full covariances \({\varvec{{\varSigma }}}^{(j)}_i\in \mathbb {R}^{D\times D}\) in (52), because the number of parameters to be estimated grows quadratically with D.
Bouveyron and Brunet reviewed various ways of viewing the problem and coping with high-dimensional data in clustering problems [11]. In practice, three viewpoints can be considered:
1. Since D is too large compared to N, a global dimensionality reduction should be applied as a pre-processing step to reduce D.
2. Since D is too large compared to N, the solution space contains many poor local optima; the solution space should be smoothed by introducing ridge or lasso regularization in the estimation of the covariances (avoiding numerical problems and singular solutions when inverting the covariances). As discussed in Sect. 3.1, a simple form of regularization can be achieved after the maximization step of each EM loop.
3. Since D is too large compared to N, the model is probably over-parametrized, and a more parsimonious model should be used (thus estimating fewer parameters).
One example falling in the last category would be to consider spherical or diagonal covariances instead of full matrices, corresponding to a separate treatment of each variable. Although commonly employed in robotics, such decoupling is a limiting factor for encoding gestures and sensorimotor streams, because it does not fully exploit principles underlying coordination, motor skill acquisition and action-perception couplings [41, 46, 54, 67, 81, 86, 94, 103].
Our rationale is that diagonal constraints are too strong for motor skill encoding, because they discard important synergistic information among the variables. There is, however, a wide range of alternatives in mixture modeling that lie in-between the encoding of diagonal and full covariances and that can readily be exploited in the context of robot skill acquisition. These alternatives can be studied as a subspace clustering problem that aims at grouping the data such that they can be locally projected into a subspace of reduced dimensionality, thus helping the analysis of the local trend of the movement, while reducing the number of parameters to be estimated and "locking" the most important synergies to cope with perturbations.
Many possible constraints can be considered, grouped in families such as parsimonious GMM [7, 11, 62], mixtures of factor analyzers (MFA) [61] or mixtures of probabilistic principal component analyzers (MPPCA) [93]. These techniques will next be described in the context of task-parameterized models.
4.1 Parsimonious TP-GMM
By following the perspective of Sect. 3.1, a parsimonious TP-GMM can be defined by considering the spectral decomposition of the covariances
with \({\varvec{V}}^{(j)}_i\) a matrix of ordered eigenvectors (determining the orientation of the cluster) and \({\varvec{D}}^{(j)}_i\) a diagonal matrix with ordered eigenvalues \(\lambda ^{(j)}_{i,k}\) (determining the shape of the cluster), where constraints are set by sharing some of these elements among the clusters, and/or by keeping only the first d eigenvectors and eigenvalues in the parameterization.
The high-dimensional data clustering (HDDC) approach from [12] lies in this category of models, addressing both subspace clustering and regularization. An example of implementation is to consider that the subspace of each cluster i is generated by the first \(d^{(j)}_i\) eigenvectors associated with the first \(d^{(j)}_i\) eigenvalues \(\lambda ^{(j)}_{i,k}\), and that outside of this subspace, the variance is spherical, modeled by a single parameter
which is used to reconstruct a full covariance matrix by replacing the last \(D-d^{(j)}_i\) eigenvalues with \(\bar{\lambda }^{(j)}_i\).
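A sketch of this reconstruction (a NumPy aside with a hypothetical helper name; the article's own codes are in Matlab/GNU Octave), where the spherical residual is taken as the mean of the discarded eigenvalues, a common choice in HDDC-style models:

```python
import numpy as np

def hddc_covariance(sigma, d):
    """Keep the first d eigen-directions of sigma; replace the remaining
    D-d eigenvalues by their mean (spherical residual variance)."""
    vals, vecs = np.linalg.eigh(sigma)
    vals, vecs = vals[::-1], vecs[:, ::-1]  # eigh is ascending; reorder descending
    lam_bar = vals[d:].mean()               # spherical variance outside the subspace
    new_vals = np.concatenate([vals[:d], np.full(len(vals) - d, lam_bar)])
    return vecs @ np.diag(new_vals) @ vecs.T
```

The dominant direction is preserved exactly, while the remaining directions are modeled by the single parameter \(\bar{\lambda}\).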
4.2 Task-parameterized mixture of factor analyzers (TP-MFA)
Factor analysis (FA) is an approach as old as principal component analysis (PCA) for coping with dimensionality reduction, often overshadowed by PCA although it has an equally important literature on the topic [12]. The basic idea of factor analysis is to reduce the dimensionality of the data while keeping the observed covariance structure, see [26] for an example of application in robotics.
The TP-GMM presented in Sect. 3 is fully compatible with subspace clustering approaches based on factor analysis. A task-parameterized mixture of factor analyzers (TP-MFA) assumes for each component i and frame j a covariance structure of the form
where \({\varvec{{\varLambda }}}^{(j)}_i\in \mathbb {R}^{D\times d}\), known as the factor loadings matrix, typically has \(d<D\) (providing a parsimonious representation of the data) and a diagonal noise matrix \({\varvec{{\varPsi }}}^{(j)}_i\).
The factor loading and noise terms of the covariance matrix can be constrained in different ways (e.g., such as being shared across Gaussian components), yielding a collection of eight parsimonious covariance structures [62]. For example, the task-parameterized mixture of probabilistic principal component analyzers (TP-MPPCA) [93] is a special case of TP-MFA with the distribution of the errors assumed to be isotropic with \({\varvec{{\varPsi }}}^{(j)}_i={\varvec{I}}{\sigma ^{(j)}_i}^2\).
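The two covariance structures can be written down directly. The following NumPy aside (illustrative only, with assumed dimensions \(D=5\), \(d=2\)) builds an MFA covariance \({\varvec{{\varLambda}}}{\varvec{{\varLambda}}}^{\scriptscriptstyle\top}+{\varvec{{\varPsi}}}\) and its MPPCA special case with isotropic noise:

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 5, 2
Lambda = rng.standard_normal((D, d))      # factor loadings matrix (D x d, d < D)
Psi = np.diag(rng.uniform(0.1, 0.2, D))   # diagonal noise matrix (MFA)

# MFA covariance: low-rank structure plus diagonal noise
Sigma_mfa = Lambda @ Lambda.T + Psi

# MPPCA special case: isotropic noise Psi = sigma^2 * I
sigma2 = 0.1
Sigma_mppca = Lambda @ Lambda.T + sigma2 * np.eye(D)
```

Only \(Dd + D\) parameters are estimated instead of \(D(D+1)/2\), while the reconstructed matrix remains a full, positive-definite covariance.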
Figure 5 shows that the covariance structure in MFA can span a wide range of covariances.
“Appendix 2” details the structure of TP-MFA and provides an EM algorithm to estimate the model parameters.
The hypothesis underlying TP-MFA models can be viewed as less restrictive than that of TP-HDDC models based on eigendecomposition (see Sect. 4.1), because the subspace of each class does not need to be spanned by orthogonal vectors, whereas this is a necessary condition in models based on eigendecomposition [12].
Similarly to parsimonious GMM based on eigendecomposition, the covariances in TP-MFA can be constrained by fixing d or by sharing elements among the mixture components. This encoding strategy can then be extended to variants of MFA aiming at optimizing the sharing and re-use of subspaces among the Gaussian components, such as in semi-tied covariance [31]. These techniques can be exploited to extend the concept of synergies to a wider range of rich motor skills, with a simultaneous segmentation and re-use of previously discovered synergies.
For each approach, a dedicated EM update can be derived corresponding to the type of constraints considered [62]. They all reconstruct estimates of the full covariances, which is an important characteristic that will be exploited in the next sections of this article.
The TP-MFA extension of TP-GMM opens several roads for further investigation. Bayesian nonparametric approaches such as [101] can be used to simultaneously select the number of clusters and the dimension of the subspace in each cluster. Another extension is to use tied structures in the covariances to enable the organization and reuse of previously acquired synergies [31].
Another possible extension is to enable deep learning strategies in task-parameterized models. As discussed in [92], the prior of each FA can be replaced by a separate second-level MFA that learns to model the aggregated posterior of that FA (instead of the isotropic Gaussian), providing a hierarchical structure organization where one layer of latent variables can be learned at a time. This can be exploited as a link with deep learning strategies [9, 40] for real-valued high-dimensional data within directed graphical models.
Figure 6 shows a kinematic example with TP-MFA used for encoding and synthesis purposes.
5 Extension to motion synthesis
While the previous sections focused only on TP-GMM as an encoding strategy, this section addresses the problem of generating movements from the model.
Several approaches can be used to retrieve continuous movements from a TP-GMM. The next two subsections provide examples for two different synthesis techniques. The first technique is to encode a decay term or a time variable as an additional feature in the mixture and use Gaussian mixture regression (GMR) [33] to retrieve movements adapted to the current situations. The second technique is to encode both static and dynamic features in the mixture model as in trajectory-HMM [30, 89, 95, 106]. These two techniques are described next.
5.1 Gaussian mixture regression (GMR)
With a GMM representation, the reproduction of a movement can be formalized as a regression problem [33]. We showed in [17, 18] that in robot learning, Gaussian mixture regression (GMR) offers a simple solution to generate continuous movements from a GMM. GMR relies on basic properties of normal distributions (linear transformation and conditioning). It provides a probabilistic retrieval of movements or policies, in which the model can compute the next actions on-the-fly, with a computation time that is independent of the number of datapoints used to train the model.
In contrast to other regression methods such as locally weighted regression (LWR) [79], locally weighted projection regression (LWPR) [100] or Gaussian process regression (GPR) [35, 70, 75], GMR does not model the regression function directly. It models the joint probability density function of the data and then derives the regression function from the joint density model, see [88] for an excellent review of regression approaches. The estimation of the model parameters is thus achieved in an offline phase that depends linearly on the number of datapoints. Regression is then independent of this number and can be computed very rapidly, which makes the approach an interesting alternative to regression methods whose processing grows with the size of the dataset. The other benefit is that both input and output variables can be multidimensional without modification of the model.
GMR can, for example, be employed in robot applications requiring input and output dimensions to be specified at run time (e.g., to handle missing sensory inputs or to react swiftly by retrieving partial outputs).
The superscripts \(\mathcal {I}\) and \(\mathcal {O}\) will further be used to denote the input and output dimensions (used as exponents for vectors and matrices). The general case of a GMM encoding a dataset \({\varvec{\xi }}\) with the joint distribution \(\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}},{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}})\sim \sum _{i=1}^K\pi _i\;\mathcal {N}({\varvec{\mu }}_i,{\varvec{{\varSigma }}}_i)\) will first be described, and later extended to its task-parameterized version.
At each iteration step t, the datapoint \({\varvec{\xi }}_t\) can be decomposed as two subvectors \({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t\) and \({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t\) spanning the input and output dimensions. For trajectory encoding in task space, \(\mathcal {I}\) corresponds to the time input dimension (e.g., value of a decay term), and \(\mathcal {O}\) corresponds to the output dimensions describing a path (e.g., end-effector position in task space).
With this notation, a block decomposition of the datapoint \({\varvec{\xi }}_t\), vectors \({\varvec{\mu }}_i\) and matrices \({\varvec{{\varSigma }}}_i\) can be written as
At each time step t during reproduction, \(\mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t|{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t)\) is computed as the conditional distribution
Note that Eq. (13) represents a multimodal distribution. For problems in which a single peaked output distribution is preferred, Eq. (13) can be approximated by a normal distribution (see “Appendix 3” for details of computation)
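To make the conditioning step concrete, here is a mean-only NumPy sketch of GMR (a hypothetical helper, not the article's Matlab code; the full computation including the output covariance is detailed in "Appendix 3"): each component contributes a locally linear prediction, weighted by its responsibility for the input.

```python
import numpy as np

def gmr(x_in, priors, mus, sigmas, I, O):
    """Condition a joint GMM over (I, O) on the input x_in and
    collapse the conditional mixture to its expected output."""
    K = len(priors)
    h = np.zeros(K)
    mu_out = []
    for k in range(K):
        mu_i, mu_o = mus[k][I], mus[k][O]
        s_ii = sigmas[k][np.ix_(I, I)]
        s_oi = sigmas[k][np.ix_(O, I)]
        s_ii_inv = np.linalg.inv(s_ii)
        diff = x_in - mu_i
        # responsibility of component k given the input
        h[k] = priors[k] * np.exp(-0.5 * diff @ s_ii_inv @ diff) \
               / np.sqrt(np.linalg.det(2 * np.pi * s_ii))
        # locally linear conditional mean of component k
        mu_out.append(mu_o + s_oi @ s_ii_inv @ diff)
    h /= h.sum()
    return sum(hk * m for hk, m in zip(h, mu_out))
```

The prediction is a convex sum of linear models with weights varying non-linearly with the input, which is the structure exploited throughout this section.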
The retrieved signal in Eq. (17) encapsulates variation and correlation information in the form of full covariance matrices. GMR has so far mostly been used in three manners:
1. as an autonomous system with \({{\varvec{\xi }}=[{\varvec{x}}^{\scriptscriptstyle \top },\dot{{\varvec{x}}}^{\scriptscriptstyle \top }]}^{\scriptscriptstyle \top }\), by learning \(\mathcal {P}({\varvec{x}},\dot{{\varvec{x}}})\) with a GMM, with \({\varvec{x}}\) and \(\dot{{\varvec{x}}}\) representing position and velocity of the system (either in task space or joint space), and by iteratively retrieving during reproduction a series of velocity commands by estimating \(\mathcal {P}(\dot{{\varvec{x}}}|{\varvec{x}})\) with GMR [17, 38, 39, 47];
2. as time-indexed trajectories with \({{\varvec{\xi }}=[t,{\varvec{x}}^{\scriptscriptstyle \top }]}^{\scriptscriptstyle \top }\), by learning \(\mathcal {P}(t,{\varvec{x}})\) with a GMM and retrieving \(\mathcal {P}({\varvec{x}}|t)\) with GMR at each time step to reproduce smooth (infinitely differentiable) trajectories [18];
3. as a probabilistic formulation of dynamic movement primitives (DMP) [20].
Alternatively, any subset of input–output dimensions can be selected, which can change, if required, at each iteration during reproduction. It can, for example, handle different sources of missing data, as the system is able to consider any combination of multidimensional input/output mappings during the retrieval phase. Expectations on the remaining dimensions can be computed within the control loop of the robot, corresponding to a convex sum of linear approximations (with weights varying non-linearly).
Figure 7 depicts the use of GMR in the case of time-indexed trajectories.
GMR can be viewed as a trade-off between a global and local approach in the sense that the placement and spread of the basis functions are learned, together with their responses, as a soft partitioning problem through expectation-maximization (EM), while the prediction is a weighted superposition of locally linear systems. It provides variation and correlation information for the retrieved multidimensional output, enabling the extraction of local coordination patterns in the movement.
Note that if the application requires the encoding of high-dimension data from few observations, subspace learning techniques such as MFA (see Sect. 4) can be used jointly with GMR to locally reduce the dimensionality without modifying the regression process.
The combination of TP-GMM and GMR is simply achieved by augmenting the dataset in each frame with an input dimension and defining all task parameters \({\varvec{A}}_{t,j}\) and \({\varvec{b}}_{t,j}\) so that the input is not modulated by the task parameterization. Compared to an initial TP-GMM encoding \({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}\) with task parameters \({\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\) and \({\varvec{b}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\), the combination of TP-GMM and GMR instead encodes \({\varvec{\xi }}=\big [{{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}}^{\scriptscriptstyle \top }, {{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}}^{\scriptscriptstyle \top }\big ]^{\scriptscriptstyle \top }\) with task parameters
\({\varvec{A}}_{t,j}=\begin{bmatrix}{\varvec{I}} & {\varvec{0}}\\ {\varvec{0}} & {\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\end{bmatrix}, \quad {\varvec{b}}_{t,j}=\begin{bmatrix}{\varvec{0}}\\ {\varvec{b}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\end{bmatrix},\)
where in the case of a decay term (or an explicit time variable driving the system), the identity matrix \({\varvec{I}}\) collapses to 1.Footnote 2
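As a concrete illustration of this augmentation (a NumPy sketch with a hypothetical 2-D frame; the rotation and offset values are arbitrary, and the paper's own codes are in Matlab/GNU Octave):

```python
import numpy as np

# Hypothetical frame: orientation A_o (a 90 degree rotation) and origin b_o
A_o = np.array([[0., -1.],
                [1.,  0.]])
b_o = np.array([0.5, 1.0])

# Augmented task parameters: the time input is left unmodulated
A = np.block([[np.eye(1),        np.zeros((1, 2))],
              [np.zeros((2, 1)), A_o]])
b = np.concatenate(([0.0], b_o))

xi = np.array([0.7, 1.0, 0.0])   # datapoint [t, x1, x2]
xi_new = A @ xi + b              # the time component stays 0.7
```

The identity block and zero offset on the first dimension guarantee that the input (time or decay term) passes through the affine transformation unchanged.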
5.2 GMM with dynamic features (trajectory-GMM)
In the field of speech processing, the extraction of statistics from both static and dynamic features within a hidden Markov model (HMM) has a long history [30, 95, 106]. In particular, it can be used in speech synthesis to avoid discontinuities in the generated speech spectra. The synthesized speech then becomes natural and smooth even when a small number of Gaussians is used. This is achieved by coordinating the distributions of both static and dynamic features (the dynamic features are often called delta and delta-delta parameters). In speech processing, these parameters usually correspond to the evolution of mel-frequency cepstral coefficients characterizing the power spectrum of a sound, but the same trajectory-HMM approach can be used with any form of continuous signals. In robotics, this approach has rarely been exploited, with the exception of the work of Sugiura et al., who employed it to represent object manipulation movements [89].
For the encoding of movements, velocity and acceleration can alternatively be used as dynamic features. By considering an Euler approximation, the velocity is computed as
\(\dot{{\varvec{x}}}_t = \frac{{\varvec{x}}_{t+1}-{\varvec{x}}_t}{\Delta t},\) (21)
where \({\varvec{x}}_t\) is a multivariate position vector. The acceleration is similarly computed as
\(\ddot{{\varvec{x}}}_t = \frac{\dot{{\varvec{x}}}_{t+1}-\dot{{\varvec{x}}}_t}{\Delta t} = \frac{{\varvec{x}}_{t+2}-2{\varvec{x}}_{t+1}+{\varvec{x}}_t}{\Delta t^2}.\) (22)
By using (21) and (22), a vector \({\varvec{\zeta }}_t\) will be used to represent the concatenated position, velocity and acceleration vectors at time step t, namelyFootnote 3
\({\varvec{\zeta }}_t = \begin{bmatrix}{\varvec{x}}_t\\ \dot{{\varvec{x}}}_t\\ \ddot{{\varvec{x}}}_t\end{bmatrix} = \begin{bmatrix}{\varvec{I}} & {\varvec{0}} & {\varvec{0}}\\ -\frac{1}{\Delta t}{\varvec{I}} & \frac{1}{\Delta t}{\varvec{I}} & {\varvec{0}}\\ \frac{1}{\Delta t^2}{\varvec{I}} & -\frac{2}{\Delta t^2}{\varvec{I}} & \frac{1}{\Delta t^2}{\varvec{I}}\end{bmatrix} \begin{bmatrix}{\varvec{x}}_t\\ {\varvec{x}}_{t+1}\\ {\varvec{x}}_{t+2}\end{bmatrix}.\) (23)
\({\varvec{\zeta }}\) and \({\varvec{x}}\) are then defined as large vectors concatenating \({\varvec{\zeta }}_t\) and \({\varvec{x}}_t\) for all time steps, namely
\({\varvec{\zeta }} = \big[{{\varvec{\zeta }}_1}^{\scriptscriptstyle \top },{{\varvec{\zeta }}_2}^{\scriptscriptstyle \top },\ldots ,{{\varvec{\zeta }}_T}^{\scriptscriptstyle \top }\big]^{\scriptscriptstyle \top }, \quad {\varvec{x}} = \big[{{\varvec{x}}_1}^{\scriptscriptstyle \top },{{\varvec{x}}_2}^{\scriptscriptstyle \top },\ldots ,{{\varvec{x}}_T}^{\scriptscriptstyle \top }\big]^{\scriptscriptstyle \top }.\) (24)
Similarly to the matrix operator (23) defined for a single time step, a large sparse matrix \({\varvec{{\varPhi }}}\) can be defined so that \({\varvec{\zeta }}={\varvec{{\varPhi }}}{\varvec{x}}\), namelyFootnote 4
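In place of writing out the full sparse matrix, its construction (cf. footnote 4 on Kronecker products) can be sketched in NumPy; this is an illustrative transcription of what the Matlab codes compute, and its border handling (zeroing the last difference rows) is a simplification made here:

```python
import numpy as np

def build_phi(T, D, dt=1.0):
    """Build Phi so that zeta = Phi @ x stacks [x_t; dx_t; ddx_t] per time
    step, with dx and ddx the Euler finite differences of the positions x."""
    I = np.eye(T)
    Dv = (np.diag(np.ones(T - 1), 1) - I) / dt   # dx_t = (x_{t+1} - x_t)/dt
    Dv[-1, :] = 0                                 # simplified border handling
    Da = Dv @ Dv                                  # second finite difference
    Phi = np.zeros((3 * T * D, T * D))
    for t in range(T):
        # Kronecker products extend the 1-D operators to D dimensions
        Phi[(3 * t    ) * D:(3 * t + 1) * D, :] = np.kron(I[t],  np.eye(D))
        Phi[(3 * t + 1) * D:(3 * t + 2) * D, :] = np.kron(Dv[t], np.eye(D))
        Phi[(3 * t + 2) * D:(3 * t + 3) * D, :] = np.kron(Da[t], np.eye(D))
    return Phi

# With x_t = t^2 (D=1, dt=1), the interior accelerations should equal 2
zeta = build_phi(5, 1) @ np.array([0., 1., 4., 9., 16.])
```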
The dataset \(\{{\varvec{\zeta }}_t\}_{t=1}^N\) with \(N=\sum _{m=1}^M T_m\) is composed of M trajectory samples, where the mth trajectory sample has \(T_m\) datapoints. It can be encoded in a Gaussian mixture model (GMM), hidden Markov model (HMM) or hidden semi-Markov model (HSMM) [73]. The example of a GMM encoding \(\mathcal {P}({\varvec{\zeta }})\sim \sum _{i=1}^K\pi _i\;\mathcal {N}({\varvec{\mu }}_i,{\varvec{{\varSigma }}}_i)\) will be described here, which will later be extended to its task-parameterized version.
After training, for a given sequence of states \({\varvec{s}}=\{s_1,s_2,\ldots ,s_T\}\) of T time steps, with discrete states \(s_t\in \{1,\ldots ,K\}\),Footnote 5 the likelihood of a movement \({\varvec{\zeta }}\) is given by
\(\mathcal {P}({\varvec{\zeta }}\,|\,{\varvec{s}}) = \prod _{t=1}^{T} \mathcal {N}\big({\varvec{\zeta }}_t \,\big|\, {\varvec{\mu }}_{s_t},{\varvec{{\varSigma }}}_{s_t}\big),\) (26)
where \({\varvec{\mu }}_{s_t}\) and \({\varvec{{\varSigma }}}_{s_t}\) are the center and covariance of state \(s_t\) at time step t. This product can be rewritten as the conditional distribution
\(\mathcal {P}({\varvec{\zeta }}\,|\,{\varvec{s}}) = \mathcal {N}\big({\varvec{\zeta }}\,\big|\,{\varvec{\mu }}_{{\varvec{s}}},{\varvec{{\varSigma }}}_{{\varvec{s}}}\big),\) (27)
with \({\varvec{\mu }}_{{\varvec{s}}}=\big[{{\varvec{\mu }}_{s_1}}^{\scriptscriptstyle \top },{{\varvec{\mu }}_{s_2}}^{\scriptscriptstyle \top },\ldots ,{{\varvec{\mu }}_{s_T}}^{\scriptscriptstyle \top }\big]^{\scriptscriptstyle \top }\) and \({\varvec{{\varSigma }}}_{{\varvec{s}}}={{\mathrm {blockdiag}}}\big({\varvec{{\varSigma }}}_{s_1},{\varvec{{\varSigma }}}_{s_2},\ldots ,{\varvec{{\varSigma }}}_{s_T}\big)\).
By using the relation \({\varvec{\zeta }}={\varvec{{\varPhi }}}{\varvec{x}}\), we then seek during reproduction a trajectory \({\varvec{x}}\) maximizing (27), namely
\(\hat{{\varvec{x}}} = \arg \max _{{\varvec{x}}}\; \log \;\mathcal {P}({\varvec{{\varPhi }}}{\varvec{x}}\;|\;{\varvec{s}}).\) (28)
The part of \(\log \;\mathcal {P}({\varvec{{\varPhi }}}{\varvec{x}} \;|\;{\varvec{s}})\) dependent on \({\varvec{x}}\) takes the quadratic error form
\(c({\varvec{x}}) = ({\varvec{\mu }}_{{\varvec{s}}}-{\varvec{{\varPhi }}}{\varvec{x}})^{\scriptscriptstyle \top }\, {\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}\, ({\varvec{\mu }}_{{\varvec{s}}}-{\varvec{{\varPhi }}}{\varvec{x}}).\) (29)
A solution can be found by differentiating the above objective function with respect to \({\varvec{x}}\) and equating to 0, providing the trajectory (in vector form)
\(\hat{{\varvec{x}}} = \big({\varvec{{\varPhi }}}^{\scriptscriptstyle \top }{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}{\varvec{{\varPhi }}}\big)^{-1} {\varvec{{\varPhi }}}^{\scriptscriptstyle \top }{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}{\varvec{\mu }}_{{\varvec{s}}},\) (30)
with the covariance error of the weighted least squares estimate given by
\(\hat{{\varvec{{\varSigma }}}}^{{\varvec{x}}} = \sigma \,\big({\varvec{{\varPhi }}}^{\scriptscriptstyle \top }{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}{\varvec{{\varPhi }}}\big)^{-1},\) (31)
where \(\sigma \) is a scale factor.Footnote 6
The resulting Gaussian \(\mathcal {N}({\varvec{\hat{x}}},{\varvec{\hat{{\varSigma }}}}^{{\varvec{x}}})\) forms a trajectory distribution. Other forms of trajectory distributions can be employed, where the main differences lie in the structure given to \({\varvec{\hat{{\varSigma }}}}^{{\varvec{x}}}\) and in the way the basis functions are defined. A relevant example is the probabilistic movement primitives approach proposed by Paraschos et al. [72]. The structure of the trajectory distribution defined in [72] requires multiple trajectory demonstrations to avoid overfitting, but the problem can be circumvented by employing factorization and variational inference techniques [77].
In trajectory-GMM, the problem of setting the shape and spread of the basis functions, as well as the problem of determining the sparse structure of \({\varvec{\hat{{\varSigma }}}}^{{\varvec{x}}}\), is directly framed within the GMM likelihood maximization problem. This allows the use of an EM algorithm to automatically organize the basis functions and find an appropriate structure for \({\varvec{\hat{{\varSigma }}}}^{{\varvec{x}}}\), which will, for example, result in a sparse band-diagonal structure when the components are sufficiently decoupled, and which will account for the correlations within \({\varvec{\zeta }}_t\).
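The weighted least squares step at the core of this retrieval can be sketched as follows (an illustrative NumPy version of what the accompanying Matlab codes compute with the lscov function mentioned in footnote 6; the toy data at the end are chosen here):

```python
import numpy as np

def retrieve_trajectory(Phi, mu_s, sigma_s_inv, sigma=1.0):
    """Solve x_hat = argmin (mu_s - Phi x)' Sigma_s^{-1} (mu_s - Phi x),
    returning the mean trajectory and its (scaled) covariance."""
    A = Phi.T @ sigma_s_inv @ Phi
    b = Phi.T @ sigma_s_inv @ mu_s
    x_hat = np.linalg.solve(A, b)      # weighted least squares estimate
    cov_x = sigma * np.linalg.inv(A)   # covariance up to the scale factor
    return x_hat, cov_x

# Toy check: two noisy "observations" of a single scalar, precisions 1 and 3,
# should be fused into the precision-weighted average (0*1 + 4*3)/(1+3) = 3
x_hat, _ = retrieve_trajectory(np.ones((2, 1)),
                               np.array([0., 4.]),
                               np.diag([1., 3.]))
```

In practice the positive definite symmetric band structure of the matrices should be exploited with Cholesky or QR decompositions rather than the dense solve used in this sketch.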
An illustration of the trajectory-GMM properties that are of interest in robotics is shown in Fig. 8.
The combination of TP-GMM and trajectory-GMM is achieved by augmenting the position data with its derivatives (e.g., with velocity and acceleration) and defining all task parameters \({\varvec{A}}_{t,j}\) and \({\varvec{b}}_{t,j}\) so that they also apply to the derivatives. Compared to an initial TP-GMM encoding \({\varvec{x}}\) with task parameters \({\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\) and \({\varvec{b}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\), the combination of TP-GMM and trajectory-GMM instead encodes \({\varvec{\zeta }}=\big [{\varvec{x}}^{\scriptscriptstyle \top },{\varvec{\dot{x}}}^{\scriptscriptstyle \top },{\varvec{\ddot{x}}}^{\scriptscriptstyle \top }\big ]^{\scriptscriptstyle \top }\) with task parameters
\({\varvec{A}}_{t,j}=\begin{bmatrix}{\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j} & {\varvec{0}} & {\varvec{0}}\\ {\varvec{0}} & {\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j} & {\varvec{0}}\\ {\varvec{0}} & {\varvec{0}} & {\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\end{bmatrix}, \quad {\varvec{b}}_{t,j}=\begin{bmatrix}{\varvec{b}}^{\scriptscriptstyle {{\mathcal {O}}}}_{t,j}\\ {\varvec{0}}\\ {\varvec{0}}\end{bmatrix}.\) (32)
5.3 Dynamic-based versus time-based features in GMM
GMR with time-indexed trajectories (Sect. 5.1) offers a simple solution for the generation of trajectories from a GMM, with the disadvantage that time-based GMR often requires in practice a preprocessing step such as dynamic time warping (DTW) to re-align multiple demonstrations in time.
For the same reason, time-based GMR may also reduce the applicability of motion synthesis to more complex forms of teaching interactions, such as learning recurring patterns (combination of discrete and periodic motions) or demonstrating different options in a movement. This is not the case for trajectory-GMM that can handle these two issues without further modification of the model (see Sect. 5.2). This is achieved at the expense of increasing the GMM dimensionality (by encoding position, velocity and acceleration instead of position and time as in time-based GMR).
Another advantage of encoding dynamic features over time features is that partial demonstrations can easily be used in the training set, see Fig. 8. In service robotics applications, it can sometimes be difficult and inefficient to demonstrate a complex manipulation task in a single shot. A more user-friendly way would be to provide incremental corrections or piecewise demonstrations of whole body movements, where the models can be trained with partial chunks of the whole movement, see, e.g., [55]. This is also required in kinesthetic teaching with robots endowed with a high number of articulations (since the user cannot control all degrees of freedom at the same time with two hands). Trajectory-GMM provides here a way to handle partial demonstrations without further modification of the model.
6 Extension to minimal intervention control
Previous sections discussed the problem of generating a reference trajectory that can adapt to the current situation, by assuming that a controller is available to track the retrieved reference trajectory. In this section, the problem is extended to that of directly finding a controller to reproduce the movement.
Section 2.2 showed that the objective function (3) underlying TP-GMM aims at finding points \({\varvec{\xi }}_t\) minimizing a weighted sum of quadratic error terms, whose result corresponds to a product of Gaussians. A similar function can be defined for the search of a controller, whose objective is to find a feedforward and feedback policy (instead of finding a reference trajectory).
This section depicts the canonical problem of searching a controller \({\varvec{u}}_t\) for the discrete linear dynamical system (double integrator)
\({\varvec{\xi }}_{t+1} = {\varvec{A}}{\varvec{\xi }}_t + {\varvec{B}}{\varvec{u}}_t, \quad \text {with}\quad {\varvec{A}}=\begin{bmatrix}{\varvec{I}} & \Delta t\,{\varvec{I}}\\ {\varvec{0}} & {\varvec{I}}\end{bmatrix},\quad {\varvec{B}}=\begin{bmatrix}{\varvec{0}}\\ \Delta t\,{\varvec{I}}\end{bmatrix},\) (33)
minimizing the cost
\(c = ({\varvec{\mu }}_{{\varvec{s}}}-{\varvec{\zeta }})^{\scriptscriptstyle \top }\,{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}\,({\varvec{\mu }}_{{\varvec{s}}}-{\varvec{\zeta }}) + {\varvec{U}}^{\scriptscriptstyle \top }{\varvec{\tilde{R}}}\,{\varvec{U}},\) (34)
with \({\varvec{\mu }}_{{\varvec{s}}}\in \mathbb {R}^{TCD}\) and \({\varvec{{\varSigma }}}_{{\varvec{s}}}={{\mathrm {blockdiag}}}({\varvec{{\varSigma }}}_{s_1},{\varvec{{\varSigma }}}_{s_2},\ldots ,{\varvec{{\varSigma }}}_{s_T})\) with \({\varvec{{\varSigma }}}_{{\varvec{s}}}\in \mathbb {R}^{TCD\times TCD}\) defined as in Eq. (27), and \({\varvec{\tilde{R}}}={{\mathrm {blockdiag}}}({\varvec{R}},{\varvec{R}},\ldots ,{\varvec{R}})\) with \({\varvec{\tilde{R}}}\in \mathbb {R}^{(T-1)D\times (T-1)D}\) an additional cost on the control inputs. The problem corresponds to an unconstrained linear model predictive control (MPC) problem.
It is worth noting that the objective function (29) used in the context of GMM with dynamic features (see Sect. 5.2) is a special case of (34) with \({\varvec{\tilde{R}}}={\varvec{0}}\).
We showed in [16] that TP-GMM can be used to find a controller autonomously regulating the stiffness and damping behavior of the robot, see also Fig. 2d. The model shares links with optimal feedback control strategies in which deviations from an average trajectory are corrected only when they interfere with task performance, resulting in a controller satisfying minimal intervention principle [94, 103]. The approach also shares similarities with the solution proposed by Medina et al. in the context of risk-sensitive control for haptic assistance [63], by exploiting the predicted variability to form a minimal intervention controller (in task space or in joint space). The retrieved variability and correlation information is exploited to generate safe and natural movements within an optimal control strategy, in accordance with the predicted range of motion that could correctly reproduce the task in the current situation. Indeed, we demonstrated in [16] that TP-GMM is fully compatible with linear quadratic regulation (LQR) strategies, providing a controller adapted to the current situation with both impedance gains and reference trajectories varying with respect to external task parameters.
The tracking problem can be solved by different techniques, either exploiting tools from physics, dynamic programming or linear algebra [6, 10]. It can, for example, be solved with a batch approach by expressing all future states \({\varvec{\xi }}_t\) as an explicit function of the state \({\varvec{\xi }}_1\). By writing
\({\varvec{\xi }}_2 = {\varvec{A}}{\varvec{\xi }}_1 + {\varvec{B}}{\varvec{u}}_1,\quad {\varvec{\xi }}_3 = {\varvec{A}}^2{\varvec{\xi }}_1 + {\varvec{A}}{\varvec{B}}{\varvec{u}}_1 + {\varvec{B}}{\varvec{u}}_2,\quad \ldots\)
in a matrix form, we get
\({\varvec{\zeta }} = {\varvec{S}}^{{\varvec{\xi }}}{\varvec{\xi }}_1 + {\varvec{S}}^{{\varvec{u}}}{\varvec{U}},\) (35)
with \({\varvec{\zeta }}\in \mathbb {R}^{TCD}\), \({\varvec{S}}^{{\varvec{\xi }}}\in \mathbb {R}^{TCD\times CD}\), \({\varvec{\xi }}_1\in \mathbb {R}^{CD}\), \({\varvec{S}}^{{\varvec{u}}}\in \mathbb {R}^{TCD\times (T-1)D}\) and \({\varvec{U}}\in \mathbb {R}^{(T-1)D}\).
Substituting (35) into (34), we get the cost function
\(c = \big({\varvec{\mu }}_{{\varvec{s}}}-{\varvec{S}}^{{\varvec{\xi }}}{\varvec{\xi }}_1-{\varvec{S}}^{{\varvec{u}}}{\varvec{U}}\big)^{\scriptscriptstyle \top }\,{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}\,\big({\varvec{\mu }}_{{\varvec{s}}}-{\varvec{S}}^{{\varvec{\xi }}}{\varvec{\xi }}_1-{\varvec{S}}^{{\varvec{u}}}{\varvec{U}}\big) + {\varvec{U}}^{\scriptscriptstyle \top }{\varvec{\tilde{R}}}\,{\varvec{U}}.\) (36)
Differentiating with respect to \({\varvec{U}}\) and equating to zero yields the sequence of control inputs
\(\hat{{\varvec{U}}} = \big({{\varvec{S}}^{{\varvec{u}}}}^{\scriptscriptstyle \top }{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}{\varvec{S}}^{{\varvec{u}}} + {\varvec{\tilde{R}}}\big)^{-1} {{\varvec{S}}^{{\varvec{u}}}}^{\scriptscriptstyle \top }{\varvec{{\varSigma }}}_{{\varvec{s}}}^{-1}\big({\varvec{\mu }}_{{\varvec{s}}} - {\varvec{S}}^{{\varvec{\xi }}}{\varvec{\xi }}_1\big),\) (37)
corresponding to a damped weighted least squares estimate (ridge regression).
The sequence of acceleration commands (37) can either be used as a planning technique to reconstruct a trajectory with (35) or as a control technique to estimate feedforward and feedback terms for the current iteration.
The same controller can also be found iteratively with a Riccati equation, see [16] for details.
It is worth noting that the constraint (33) defines the same set of relations as (21) and (22) used in trajectory-GMM (GMM with dynamic features). The main difference between the two problems is that trajectory-GMM seeks a reference trajectory, while the above problem seeks a controller. Both problems result in a weighted least squares solution, with the difference that (37) uses a Tikhonov regularization term corresponding to the cost \({\varvec{R}}\) that we set on the control inputs.
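The batch computation described above can be sketched as follows (a hypothetical NumPy version; a 1-D double integrator with unit time step is assumed in the toy check, and all names are chosen here):

```python
import numpy as np

def batch_lqt(A, B, xi1, mu_s, sigma_s_inv, R, T):
    """Batch unconstrained linear MPC: build Sx and Su so that the stacked
    state reads zeta = Sx @ xi1 + Su @ U, then solve for the control
    sequence U_hat = (Su' W Su + R_tilde)^{-1} Su' W (mu_s - Sx xi1)."""
    n, m = B.shape
    Sx = np.zeros((T * n, n))
    Su = np.zeros((T * n, (T - 1) * m))
    Sx[:n] = np.eye(n)
    for t in range(1, T):
        Sx[t * n:(t + 1) * n] = np.linalg.matrix_power(A, t)
        for k in range(t):
            Su[t * n:(t + 1) * n, k * m:(k + 1) * m] = \
                np.linalg.matrix_power(A, t - 1 - k) @ B
    R_tilde = np.kron(np.eye(T - 1), R)   # block-diagonal control cost
    W = sigma_s_inv
    U = np.linalg.solve(Su.T @ W @ Su + R_tilde,
                        Su.T @ W @ (mu_s - Sx @ xi1))
    return U, Sx @ xi1 + Su @ U
```

With a small control cost R the retrieved controller tracks the reference closely; increasing R trades tracking precision for lower control effort, which is the mechanism exploited by the minimal intervention controller.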
Extension to multiple coordinate systems
The above optimal control problem can be extended to reference trajectories expressed in P coordinate systems. By extending the cost in (34) to
\(c = \sum _{j=1}^{P} \big({\varvec{\mu }}_{{\varvec{s}}}^{(j)}-{\varvec{\zeta }}\big)^{\scriptscriptstyle \top }\,{{\varvec{{\varSigma }}}_{{\varvec{s}}}^{(j)}}^{-1}\,\big({\varvec{\mu }}_{{\varvec{s}}}^{(j)}-{\varvec{\zeta }}\big) + {\varvec{U}}^{\scriptscriptstyle \top }{\varvec{\tilde{R}}}\,{\varvec{U}},\) (38)
the sequence of control inputs becomes
whose intermediary variables \({\varvec{\mu }}_{{\varvec{s}}}\) and \({\varvec{{\varSigma }}}_{{\varvec{s}}}\) correspond to the Gaussian resulting from the product of Gaussians with centers \({\varvec{\mu }}_{{\varvec{s}}}^{(j)}\) and covariances \({\varvec{{\varSigma }}}_{{\varvec{s}}}^{(j)}\). Namely, solving
is equivalent to the two-step optimization process
showing that the combination of TP-GMM with LQR corresponds to an optimal controller acting in multiple coordinate systems.
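The product of Gaussians appearing in this two-step process can be computed in closed form; a generic sketch (illustrative NumPy code, not the paper's Matlab implementation):

```python
import numpy as np

def gaussian_product(mus, sigmas):
    """Renormalized product of P Gaussians N(mu_j, Sigma_j): the result has
    precision equal to the sum of precisions, and a precision-weighted mean."""
    Lam = sum(np.linalg.inv(S) for S in sigmas)   # summed precision matrices
    Sigma = np.linalg.inv(Lam)
    mu = Sigma @ sum(np.linalg.inv(S) @ m for S, m in zip(sigmas, mus))
    return mu, Sigma
```

A frame that is very confident (small covariance) in some direction dominates the fused center in that direction, which is exactly how the controller arbitrates between the P coordinate systems.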
The problem can be viewed as a form of inverse optimal control (IOC) [1, 2, 24], or more precisely, as a rudimentary form of IOC which can be solved analytically. Namely, it can provide a controller without exploratory search, at the expense of being restricted to simple forms of objectives (weighted sums of quadratic errors whose weights are learned from the demonstrations). This dual view can be exploited in further research to bridge action-level and goal-level imitation or to provide better initial estimates in IOC problems.
Figure 9 shows that a TP-GMM with a single Gaussian, combined with a minimal intervention controller based on LQR, can be used to encode and retrieve various behaviors.
7 Extension to task parameters in the form of projection constraints
Thus far, this tutorial considered problems in which the task parameters were related to the position, orientation or shape of objects in a Cartesian space. However, the use of TP-GMM is not limited to this case and can be extended to other forms of locally linear transformations or projections. The consideration of non-square \({\varvec{A}}_{t,j}\) matrices is, for example, relevant for handling soft constraints in both configuration and operational spaces (through Jacobian operators), see [15] for preliminary work in this direction.
It can also provide a principled way to learn nullspace constraints in a probabilistic form. The different frames correspond in this case to various candidate subspace projections of the movement, with statistical information extracted from the different projections.
An important and challenging category of applications includes the problems requiring priority constraints [37, 58, 78, 96, 104]. Such constraints can be learned and encoded within a TP-GMM from an initial set of task hierarchies given as potential candidates to describe the observed skill. The probabilistic encoding is exploited here to discover in which manner the subtasks are prioritized.
For a controller handling constraints both in configuration and operational spaces, some of the most common candidate projection operators are presented in Table 3, covering a very wide range of robotics applications.
Note here that the Gaussian product is computed in configuration space (\({\varvec{q}}\) and \({\varvec{x}}\) represent, respectively, poses in joint space and task space). Equation (39) describes joint space constraints in a fixed frame. It corresponds to the canonical frame defined by \({\varvec{A}}_{t,j}={\varvec{I}}\) (identity matrix) and \({\varvec{b}}_{t,j}={\varvec{0}}\). Equation (40) describes absolute position constraints (in operational space), where \({\varvec{J}}^{\dagger }\) is the Jacobian pseudoinverse used as least-norm inverse kinematics solution. Note that Eq. (40) describes a moving frame, where the task parameters change at each iteration (observation of a changing pose in configuration space). Equation (41) describes relative position constraints, where the constraint in task space is related to an object described at each time step t by a position \({\varvec{b}}^{\scriptscriptstyle {{\mathcal {O}}}}_t\) and an orientation matrix \({\varvec{A}}^{\scriptscriptstyle {{\mathcal {O}}}}_t\) in task space. Equation (42) describes nullspace/priority constraints in joint space, with \({\varvec{N}}={\varvec{I}}-{\varvec{J}}^{\dagger }{\varvec{J}}\) a nullspace projection operator. Equation (43) describes absolute position nullspace/priority constraints, where the secondary objective is described in task space (for a point in the kinematic chain with corresponding Jacobian \({\varvec{\tilde{J}}}\)). Finally, Eq. (44) describes relative position nullspace/priority constraints.
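The nullspace operator of Eq. (42) can be illustrated with a small kinematic example (a hypothetical 3-joint planar arm sketched in NumPy; the link lengths and joint angles are arbitrary choices made here):

```python
import numpy as np

def planar_jacobian(q, lengths):
    """Position Jacobian (2 x n) of a planar arm end-effector."""
    n = len(q)
    J = np.zeros((2, n))
    for j in range(n):
        for k in range(j, n):
            a = np.sum(q[:k + 1])                 # cumulative joint angle
            J[0, j] += -lengths[k] * np.sin(a)
            J[1, j] +=  lengths[k] * np.cos(a)
    return J

q = np.array([0.3, 0.4, 0.5])
J = planar_jacobian(q, [1.0, 1.0, 0.5])
J_pinv = np.linalg.pinv(J)          # least-norm inverse kinematics solution
N = np.eye(3) - J_pinv @ J          # nullspace projection operator
# Any joint velocity projected through N leaves the end-effector at rest,
# since J (I - J_pinv J) = 0 by the pseudoinverse property
```

A secondary objective (e.g., maintaining a preferred posture) projected through N can thus be pursued without perturbing the primary task in operational space.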
The above equations can be retrieved without much effort by discretizing (with an Euler approximation) the standard inverse kinematics and nullspace control relations that can be found in most robotics textbooks, see, e.g., [5].
Figure 10 shows an example of constraints in configuration and operational spaces. The task requires the robot to consider, in parallel and in series, constraints in task space (pointing to objects) and in joint space (maintaining a preferred posture). The frame in joint space is defined by Eq. (42), and the frames in task space are defined by Eq. (41) for the two objects and by Eq. (40) for the motion in the robot frame, with \({\varvec{x}}\) encoding Euler angles and \({\varvec{J}}\) describing the orientation part of the end-effector Jacobian.
Figure 11 presents a TP-GMM example with task parameters taking the form of nullspace bases. The frames are defined by Eqs. (41) and (44) with two different combinations of nullspaces and Jacobians corresponding to the left and right arm.
8 Comparisons with other task-adaptive approaches
This section aims at comparing different techniques to adapt movements to new situations, using the task adaptation problem depicted in Figs. 1 and 2.
Qualitative comparisons for the task-parameterized approaches presented throughout the article are presented in Figs. 12 and 13.
Figure 12 shows that GMM/GMR, MFA/GMR and trajectory-GMM could all retrieve suitable trajectories for each of the new situations.
Figure 13 shows that the proposed task-parameterized approach can also be employed with data-driven encoding strategies, at the expense of a slower reproduction process that depends on the number of demonstrations. For this simple dataset, a kernel-based approach (first row) generated new trajectories that are similar to the trajectories generated by the weighted least-squares approach (second row).
The same task adaptation problem was also tested with two baseline approaches that are described next.
8.1 Gaussian process regression (GPR) with trajectory models
An alternative way of handling task-parameterized movements is to fit a model to each demonstration and associate it with a task-specific feature, goal, style variable or perceptual feedback, see, for example, [21, 29, 44, 48, 51, 60, 97].
Such an approach is typically better suited for task parameters that do not change during the demonstration. It can be achieved in a non-parametric or parametric way, by either encoding the relationships between the raw trajectory variables and the task parameters (first row in Fig. 14) or by encoding the relationships between the trajectory model parameters and the task parameters (second row in Fig. 14, see also Fig. 1).
In this second approach, the output variables \({\varvec{{\varTheta }}}\) are the model parameters and the query points \({\varvec{Q}}\) are the task parameters. The reproduction step consists of retrieving new model parameters \({\varvec{{\varTheta }}}^d\) from new task parameters \({\varvec{Q}}^d\), which can, for example, be achieved with Gaussian process regression (GPR) [75]. After centering the training data, the joint distribution of the demonstrated and new outputs can be estimated as
\(\begin{bmatrix}{\varvec{{\varTheta }}}\\ {\varvec{{\varTheta }}}^d\end{bmatrix} \sim \mathcal {N}\left({\varvec{0}},\; \begin{bmatrix}{\varvec{K}}({\varvec{Q}},{\varvec{Q}})+\sigma ^2{\varvec{I}} & {\varvec{K}}({\varvec{Q}},{\varvec{Q}}^d)\\ {\varvec{K}}({\varvec{Q}}^d,{\varvec{Q}}) & {\varvec{K}}({\varvec{Q}}^d,{\varvec{Q}}^d)\end{bmatrix}\right),\) (45)
where \({\varvec{Q}}\) is the concatenation of query points \({\varvec{q}}_m\), and \({\varvec{{\varTheta }}}\) the concatenation of outputs \({\varvec{\theta }}_m\), with \(m\in \{1,\ldots ,M\}\) (M is the number of demonstrations). Squared exponential covariance functions \({\varvec{K}}\) are considered here.
By using the conditional probability property of normal distributions, the expected outputs \({\varvec{\hat{{\varTheta }}}}\) associated with the new query points \({\varvec{Q}}^d\) are given by
\({\varvec{\hat{{\varTheta }}}} = {\varvec{K}}({\varvec{Q}}^d,{\varvec{Q}})\, {\big [{\varvec{K}}({\varvec{Q}},{\varvec{Q}})+\sigma ^2{\varvec{I}}\big ]}^{-1}\, {\varvec{{\varTheta }}},\) (46)
with the covariance of the prediction given by
\({\varvec{\hat{{\varSigma }}}}^{{\varvec{{\varTheta }}}} = {\varvec{K}}({\varvec{Q}}^d,{\varvec{Q}}^d) - {\varvec{K}}({\varvec{Q}}^d,{\varvec{Q}})\, {\big [{\varvec{K}}({\varvec{Q}},{\varvec{Q}})+\sigma ^2{\varvec{I}}\big ]}^{-1}\, {\varvec{K}}({\varvec{Q}},{\varvec{Q}}^d).\) (47)
The above formulation can be used with various trajectory models. For a fair comparison, GMM encoding of trajectories was considered, with GMR used to regenerate the trajectories. Thus, a GMM \({\varvec{\theta }}_m=\{\pi _{i,m},{\varvec{\mu }}_{i,m},{\varvec{{\varSigma }}}_{i,m}\}_{i=1}^K\) is fit to each demonstration \({\varvec{\xi }}_m\), with associated query point \({\varvec{q}}_m = \{{\varvec{A}}_{m,j}, {\varvec{b}}_{m,j}\}_{j=1}^P\) describing the demonstration context. After all demonstrations are collected, \([ {\varvec{K}}({\varvec{Q}},{\varvec{Q}}) + \sigma ^2 {\varvec{I}} ]^{-1} {\varvec{{\varTheta }}}\) in Eqs. (45), (46) can be precomputed.
During reproduction, a new query point \({\varvec{Q}}^d\) with \({\varvec{Q}}^d = \{{\varvec{A}}_{j}, {\varvec{b}}_{j}\}_{j=1}^P\) is used to retrieve a new GMM using Eq. (46), which is then used to retrieve a new trajectory through GMR.
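The regression steps above can be sketched as follows (a minimal NumPy version with a squared exponential kernel; the lengthscale and noise values are arbitrary choices made here, and the paper's experiments use Matlab):

```python
import numpy as np

def gpr_fit_predict(Q, Theta, Qd, lengthscale=1.0, noise=1e-6):
    """GPR from query points Q (M x d) to outputs Theta (M x p):
    returns the expected outputs at the new query points Qd (M' x d)."""
    def k(X, Y):  # squared exponential covariance function
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)
    mean = Theta.mean(0)                        # center the training data
    K = k(Q, Q) + noise * np.eye(len(Q))
    alpha = np.linalg.solve(K, Theta - mean)    # [K + sigma^2 I]^{-1} Theta
    return mean + k(Qd, Q) @ alpha              # expected outputs

```

In the comparison above, each row of `Theta` would stack the vectorized GMM parameters of one demonstration and each row of `Q` the corresponding task parameters; `alpha` can be precomputed once after training, so that reproduction only requires one kernel evaluation per query.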
The first and second row of Fig. 14 show generalization results for the use of GPR with a non-parametric (first row) and parametric (second row) encoding of the data. Although GPR can easily handle situations within the range of the demonstrations, its generalization capability can degrade if the query points are too far from the demonstrations (it collapses to an average of the models), which is confirmed by the results of the experiment.
8.2 Parametric HMM/GMM (PHMM/PGMM)
Another approach to handle task-parameterized movements is to encode all demonstrations in a single mixture model, where each cluster in the mixture learns the relations between the task parameters and the motion. The parametric hidden Markov model (PHMM) is a representative approach in this category. It was originally introduced for recognition and prediction of gestures [102] and extended in robotics to movement generation [52, 105]. We will refer to a parametric Gaussian mixture model (PGMM) when the transition and initial state probabilities are not taken into account in the likelihood estimation.Footnote 7
The original model modulates in each cluster the center of a Gaussian distribution through an affine relationship with the task parameters. This is achieved by concatenating the task parameters in a vector \({\varvec{Q}}_t\) as in the above, and defining the centers of the Gaussians in a task-adaptive manner with
\({\varvec{\mu }}_{t,i} = {\varvec{\tilde{Z}}}_i\, {\varvec{\tilde{Q}}}_t, \quad \text {with}\quad {\varvec{\tilde{Q}}}_t = \begin{bmatrix}{\varvec{Q}}_t\\ 1\end{bmatrix},\) (48)
where \({\varvec{\tilde{Z}}}_i\) describes the model parameters to be estimated. In the case of affine relationships, this can be done with EM, see “Appendix 4” for details. The other parameters of the model are estimated as in the standard GMM formulation.
After having trained the model, each new set of task parameters concatenated in \({\varvec{Q}}_t\) will provide new Gaussian centers \({\varvec{\mu }}_{t,i}\) in the GMM using Eq. (48), where GMR can then be used to retrieve a new trajectory.
The last row of Fig. 14 shows generalization results with PGMM. The main drawback of this model is that only the centers of the Gaussians are adapted to the current situation. The covariances remain constant matrices \({\varvec{{\varSigma }}}_i\), estimated with the standard EM procedure for GMM. This limitation is confirmed in the experiment, where EM often converged to local optima that were unable to extract the underlying structure of the task for the encoding of continuous movements.
9 Discussion and future work
Recent service robots are provided with numerous and/or high-resolution sensors and actuators. This increase of dimensionality is problematic in applications where the sample size remains bounded by the cost of data acquisition. A non-negligible number of applications would require models capable of handling wide-ranging amounts of data, namely models that could start learning from a small number of demonstrations, while still being able to continue learning once more data become available. Robot learning from demonstration is one such field, whose learning challenge often requires the design of appropriate domain-specific priors to ensure that generalization can be achieved from small training sets.
We showed throughout this article that an efficient and versatile prior knowledge for task adaptation is to consider that the task parameters describing the current situation (body and workspace configuration that the robot encounters) can be represented as affine transformations (including frames of reference, coordinate systems or projections).
The above prior requires the experimenter to provide the robot with a set of candidate frames that could potentially be relevant for the task. We showed that the representation as affine transformations has a simple interpretation, can be easily implemented and remains valid for a wide range of skills that a service robot can encounter. It was shown in [4] that when frames are missing during reproduction (e.g., when occlusions occur or when frames are collected at different rates), the system is still able to reproduce an appropriate behavior for the current circumstance.
A limitation of the current TP-GMM approach is that it requires the experimenter to provide an initial set of frames that will act as candidate projections/transformations of the data that might potentially be relevant for the task. The number of frames can be overspecified by the experimenter (e.g., by providing an exhaustive list), but it comes at the expense of requiring more demonstrations to obtain sufficient statistics to discard the frames that have no role in the task. The problem is not different from the problem of selecting the variables that will form the feature vectors fed to a learning process. The only difference lies in the selection of frames in the form of affine transformations that are most often associated with easily interpretable behaviors.
In practice, the experimenter selects objects and locations in the robot kinematic chain that might be relevant for the task, which are typically the end-effectors of the robot, where tools, grippers or parts in contact with the environment are mounted, see also discussion in [45].Footnote 8 The issue of predefining an initial set of frames is not restrictive when the number of frames remains reasonably low (e.g., when they come from a set of predefined objects tracked with visual markers in a lab setting). However, for perception in unconstrained environments, the number of frames could grow quickly (e.g., detection of phantom objects), while the number of demonstrations remains low. Further work is thus required to detect redundant frames or remove irrelevant frames, as well as to automatically determine in which manner the frames are coordinated with each other and locally contribute to the achievement of the task. A promising route for further investigation is to exploit the recent developments in multilinear algebra and tensor analysis [49, 85] that leverage the multivariate structure of the data for statistical analysis and compression, without transforming it to a matrix form (by processing data jointly in spatial and spectral ways, instead of flattening the higher-order tensor dataset).
In service robotics, movements often need to be expressed simultaneously in multiple coordinate systems and are stored as multidimensional arrays (tensor-variate data). Multilinear algebra could thus provide a principled method to simultaneously extract eigenframes, eigenposes and eigentrajectories. Multiway analysis of tensor-variate data offers a rich set of data decomposition techniques, whose advantage has been demonstrated in computer imaging fields such as face processing [98], video analysis [107], geoscience [76] or neuroimaging [8], but which remains an uncharted territory in robotics and motor skills learning.
Another open route for further investigation concerns the use of a richer set of task parameters. A wide range of motor skills could potentially be adapted to this framework, by exploiting the functional nature of task parameters to build models that learn the local structure of the task from a low number of demonstrations. Indeed, most task parameterization in robot control can be related to some form of frames of reference, coordinate systems, projections or basis functions, where the involvement of the frames can change during the execution of the task, with transformations represented as locally linear projection operators (e.g., Jacobians for inverse kinematics, kernel matrix for nullspace projections, etc.). A wider range of robot skills could be defined in such way, see, e.g., the possible tasks described in §6.2.1 of [5].
The potential applications are diverse, with an objective in line with the original purpose of motor primitives to be composed together serially or in parallel [28]. TP-GMM could potentially be employed as a tool to revisit existing robotics techniques in a probabilistic form. This includes the consideration of soft constraints in both configuration and operational spaces, where the frames would correspond to different subspace projections of the same movement, with the extracted regularities employed to learn bimanual tasks or whole-body movements.
One of the important next challenges in robot learning will be to extend the concept of movement primitives to a broader range of behaviors including impedance, reaction and collaboration primitives.
10 Conclusion
In service robotics, movements often need to be modulated by external parameters describing the current situation (body and workspace configuration that the robot encounters). This tutorial showed that in many cases, the task parameters can be represented as affine transformations. Based on this prior assumption, a task-parameterized model was presented by exploiting this structure to learn a skill from a small number of demonstrations.
The proposed approach was implemented and tested with various statistical encoding strategies, including standard mixture models, kernel approaches and subspace clustering methods. It was shown that a wide variety of problems in robotics can be reinterpreted by introducing such relation between the task parameters and the model parameters. The approach was demonstrated in a series of control and planning problems in operational and configuration spaces. Each section of the article was accompanied by source codes to help the practitioners study, test and extend the proposed approach.
Notes
Competition/collaboration arises due to the weighting term \(h_{t,i}\) in Eq. (49) summing over the influence of the other Gaussian components.
Extensions are possible here for a local modulation of movement duration.
To simplify the notation, the number of derivatives will be set up to acceleration (\(C=3\)), but the results can easily be generalized to a higher or lower number of derivatives (in the provided source codes, a parameter automatically sets the number of derivatives to be considered).
Note that a similar operator is defined to handle border conditions and that \({\varvec{{\varPhi }}}\) can automatically be constructed through the use of Kronecker products, see source codes for details.
An HSMM encoding can autonomously regenerate such a sequence in a stochastic manner, which is not described here due to space constraints.
Equations (30) and (31) describe a trajectory distribution and can be computed efficiently with Cholesky and/or QR decompositions by exploiting the positive definite symmetric band structure of the matrices, see for example [87]. With the Cholesky decomposition \({({\varvec{{\varSigma }}}^{{\varvec{s}}})}^{-1}={\varvec{T}}^{\scriptscriptstyle \top }{\varvec{T}}\), the objective function is maximized when \({\varvec{T}}{\varvec{{\varPhi }}}{\varvec{x}}={\varvec{T}}{\varvec{\mu }}^{{\varvec{s}}}\). With a QR decomposition \({\varvec{T}}{\varvec{{\varPhi }}}={\varvec{Q}}{\varvec{R}}\), the equation becomes \({\varvec{Q}}{\varvec{R}}{\varvec{x}}={\varvec{T}}{\varvec{\mu }}^{{\varvec{s}}}\) with a solution efficiently computed with \({\varvec{x}}={\varvec{R}}^{-1}{\varvec{Q}}^{\scriptscriptstyle \top }{\varvec{T}}{\varvec{\mu }}^{{\varvec{s}}}\). When using Matlab, \({\varvec{\hat{x}}}\) and \({\varvec{\hat{{\varSigma }}}}^{{\varvec{x}}}\) in Eqs. (30) and (31) can, for example, be computed with the lscov function.
Note here that the term parametric in PGMM/PHMM (referring to task parameters) is ambiguous because a standard GMM can also be described as being parametric (referring to model parameters).
Full end-effector poses or decoupled position and orientation can be considered here.
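The Cholesky/QR scheme outlined in the note on Eqs. (30) and (31) can be sketched numerically as follows (a NumPy illustration with placeholder matrices standing in for \({\varvec{{\varPhi }}}\), \({\varvec{\mu }}^{{\varvec{s}}}\) and \({({\varvec{{\varSigma }}}^{{\varvec{s}}})}^{-1}\); the article's own codes rely on Matlab's lscov):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((8, 3))               # placeholder basis matrix
mu_s = rng.standard_normal(8)                   # placeholder mean vector
Sigma_inv = np.diag(rng.uniform(0.5, 2.0, 8))   # placeholder precision matrix

# Cholesky factorization Sigma_inv = T^T T (numpy returns lower L with L L^T)
T = np.linalg.cholesky(Sigma_inv).T

# QR decomposition of T Phi, then back-substitution x = R^{-1} Q^T T mu_s
Q, R = np.linalg.qr(T @ Phi)
x_hat = np.linalg.solve(R, Q.T @ (T @ mu_s))

# Same solution as the normal equations (Phi^T Sigma_inv Phi) x = Phi^T Sigma_inv mu_s
x_ref = np.linalg.solve(Phi.T @ Sigma_inv @ Phi, Phi.T @ Sigma_inv @ mu_s)
```

The QR route avoids forming the normal equations explicitly, which is numerically preferable when the band-structured system is ill-conditioned.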
References
Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of international conference on machine learning (ICML)
Akgun B, Thomaz A (2015) Simultaneously learning actions and goals from demonstration. Auton Robots 1–17. doi:10.1007/s10514-015-9448-x
Alissandrakis A, Nehaniv CL, Dautenhahn K (2006) Action, state and effect metrics for robot imitation. In: Proceedings of IEEE international symposium on robot and human interactive communication (Ro-Man), pp 232–237. Hatfield, UK
Alizadeh T, Calinon S, Caldwell DG (2014) Learning from demonstrations with partially observable task parameters. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3309–3314. Hong Kong, China
Antonelli G (2014) Underwater robots, 3rd edn. Springer, Berlin
Astrom KJ, Murray RM (2008) Feedback systems: an introduction for scientists and engineers. Princeton University Press, Princeton
Baek J, McLachlan GJ, Flack LK (2010) Mixtures of factor analyzers with common factor loadings: applications to the clustering and visualization of high-dimensional data. IEEE Trans Pattern Anal Mach Intell 32(7):1298–1309
Basser PJ, Pajevic S (2003) A normal distribution for tensor-valued random variables: applications to diffusion tensor MRI. IEEE Trans Med Imaging 22(7):785–794
Bengio Y (2009) Learning deep architectures for AI. Found Trends Mach Learn 2(1):1–127
Borrelli F, Bemporad A, Morari M (2015) Predictive control for linear and hybrid systems. Cambridge University Press, Cambridge (in preparation)
Bouveyron C, Brunet C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78
Bouveyron C, Girard S, Schmid C (2007) High-dimensional data clustering. Comput Stat Data Anal 52(1):502–519
Brand M, Hertzmann A (2000) Style machines. In: Proceedings of ACM international conference on computer graphics and interactive techniques (SIGGRAPH), pp 183–192. New Orleans, Louisiana, USA
Calinon S, Alizadeh T, Caldwell DG (2013) On improving the extrapolation capability of task-parameterized movement models. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 610–616. Tokyo, Japan
Calinon S, Billard AG (2009) Statistical learning by imitation of competing constraints in joint space and task space. Adv Robot 23(15):2059–2076
Calinon S, Bruno D, Caldwell DG (2014) A task-parameterized probabilistic model with minimal intervention control. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3339–3344. Hong Kong, China
Calinon S, D’halluin F, Sauser EL, Caldwell DG, Billard AG (2010) Learning and reproduction of gestures by imitation: an approach based on hidden Markov model and Gaussian mixture regression. IEEE Robot Autom Mag 17(2):44–54
Calinon S, Guenter F, Billard AG (2007) On learning, representing and generalizing a task in a humanoid robot. IEEE Trans Syst Man Cybern B 37(2):286–298
Calinon S, Kormushev P, Caldwell DG (2013) Compliant skills acquisition and multi-optima policy search with EM-based reinforcement learning. Robot Auton Sys 61(4):369–379
Calinon S, Li Z, Alizadeh T, Tsagarakis NG, Caldwell DG (2012) Statistical dynamical systems for skills acquisition in humanoids. In: Proceedings of IEEE international conference on humanoid robots (humanoids), pp 323–329. Osaka, Japan
Campbell CL, Peters RA, Bodenheimer RE, Bluethmann WJ, Huber E, Ambrose RO (2006) Superpositioning of behaviors learned through teleoperation. IEEE Trans Robot 22(1):79–91
Chatzis SP, Korkinof D, Demiris Y (2012) A nonparametric Bayesian approach toward robot learning by demonstration. Robot Auton Syst 60(6):789–802
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1):1–38
Doerr A, Ratliff N, Bohg J, Toussaint M, Schaal S (2015) Direct loss minimization inverse optimal control. In: Proceedings of robotics: science and systems (R:SS), pp 1–9. Rome, Italy
Dong S, Williams B (2012) Learning and recognition of hybrid manipulation motions in variable environments using probabilistic flow tubes. Int J Soc Robot 4(4):357–368
Field M, Stirling D, Pan Z, Naghdy F (2015) Learning trajectories for robot programing by demonstration using a coordinated mixture of factor analyzers. IEEE Trans Cybern (in press)
Figueiredo MAT, Jain AK (2002) Unsupervised learning of finite mixture models. IEEE Trans Pattern Anal Mach Intell 24(3):381–396
Flash T, Hochner B (2005) Motor primitives in vertebrates and invertebrates. Curr Opin Neurobiol 15(6):660–666
Forte D, Gams A, Morimoto J, Ude A (2012) On-line motion synthesis and adaptation using a trajectory database. Robot Auton Syst 60(10):1327–1339
Furui S (1986) Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Trans Acoust Speech Signal Process 34(1):52–59
Gales MJF (1999) Semi-tied covariance matrices for hidden Markov models. IEEE Trans Speech Audio Process 7(3):272–281
Ghahramani Z, Hinton GE (1997) The EM algorithm for mixtures of factor analyzers. Tech. rep., University of Toronto
Ghahramani Z, Jordan MI (1994) Supervised learning from incomplete data via an EM approach. In: Cowan JD, Tesauro G, Alspector J (eds) Advances in neural information processing systems (NIPS), vol 6. Morgan Kaufmann, San Francisco, pp 120–127
Greggio N, Bernardino A, Dario P, Santos-Victor J (2014) Efficient greedy estimation of mixture models through a binary tree search. Robot Auton Syst 62(10):1440–1452
Grimes DB, Chalodhorn R, Rao RPN (2006) Dynamic imitation in a humanoid robot through nonparametric probabilistic inference. In: Proceedings of robotics: science and systems (R:SS), pp 1–8
Gross R, Shi J (2001) The CMU motion of body (MoBo) database. Tech. Rep. CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
Hak S, Mansard N, Stasse O, Laumond JP (2012) Reverse control for humanoid robot task recognition. IEEE Trans Syst Man Cybern B Cybern 42(6):1524–1537
Hersch M, Guenter F, Calinon S, Billard AG (2006) Learning dynamical system modulation for constrained reaching tasks. In: Proceedings of IEEE international conference on humanoid robots (humanoids), pp 444–449. Genova, Italy
Hersch M, Guenter F, Calinon S, Billard AG (2008) Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Trans Robot 24(6):1463–1467
Hinton GE (2007) Learning multiple layers of representation. Trends Cogn Sci 11(10):428–434
Hogan N, Sternad D (2012) Dynamic primitives of motor behavior. Biol Cybern 106(11–12):727–739
Hsu D, Kakade SM (2013) Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In: Conference on innovations in theoretical computer science, pp 11–20
Ijspeert A, Nakanishi J, Pastor P, Hoffmann H, Schaal S (2013) Dynamical movement primitives: learning attractor models for motor behaviors. Neural Comput 25(2):328–373
Inamura T, Toshima I, Tanie H, Nakamura Y (2004) Embodied symbol emergence based on mimesis theory. Int J Robot Res 23(4–5):363–377
Jetchev N, Toussaint M (2014) Discovering relevant task spaces using inverse feedback control. Auton Robot 37(2):169–189
Kelso JAS (2009) Synergies: atoms of brain and behavior. In: Sternad D (ed) A multidisciplinary approach to motor control. Advances in Experimental Medicine and Biology, vol 629. Springer, Heidelberg, pp 83–91
Khansari-Zadeh SM, Billard A (2011) Learning stable non-linear dynamical systems with Gaussian mixture models. IEEE Trans Robot 27(5):943–957
Kober J, Wilhelm A, Oztop E, Peters J (2012) Reinforcement learning to adjust parametrized motor primitives to new situations. Auton Robot 33(4):361–379
Kolda T, Bader B (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500
Krishnan S, Garg A, Patil S, Lea C, Hager G, Abbeel P, Goldberg K (2015) Unsupervised surgical task segmentation with milestone learning. In: Proceedings of international symposium on robotics research (ISRR)
Kronander K, Khansari-Zadeh MSM, Billard A (2011) Learning to control planar hitting motions in a minigolf-like task. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 710–717
Krueger V, Herzog DL, Baby S, Ude A, Kragic D (2010) Learning actions from observations: primitive-based modeling and grammar. IEEE Robot Autom Mag 17(2):30–43
Kulis B, Jordan MI (2012) Revisiting k-means: new algorithms via Bayesian nonparametrics. In: Proceedings of international conference on machine learning (ICML)
Latash ML, Scholz JP, Schoener G (2002) Motor control strategies revealed in the structure of motor variability. Exerc Sport Sci Rev 30(1):26–31
Lee D, Ott C (2011) Incremental kinesthetic teaching of motion primitives using the motion refinement tube. Auton Robots 31(2):115–131
Lee SH, Suh IH, Calinon S, Johansson R (2015) Autonomous framework for segmenting robot trajectories of manipulation task. Auton Robots 38(2):107–141
Levine S, Wagener N, Abbeel P (2015) Learning contact-rich manipulation skills with guided policy search. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 156–163
Lober R, Padois V, Sigaud O (2014) Multiple task optimization using dynamical movement primitives for whole-body reactive control. In: Proceedings of IEEE international conference on humanoid robots (humanoids). Madrid, Spain
MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp 281–297
Matsubara T, Hyon SH, Morimoto J (2011) Learning parametric dynamic movement primitives from multiple demonstrations. Neural Netw 24(5):493–500
McLachlan GJ, Peel D, Bean RW (2003) Modelling high-dimensional data by mixtures of factor analyzers. Comput Stat Data Anal 41(3–4):379–388
McNicholas PD, Murphy TB (2008) Parsimonious Gaussian mixture models. Stat Comput 18(3):285–296
Medina JR, Lee D, Hirche S (2012) Risk-sensitive optimal feedback control for haptic assistance. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 1025–1031
Miller S, Fritz M, Darrell T, Abbeel P (2011) Parametrized shape models for clothing. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 4861–4868
Moldovan TM, Levine S, Jordan MI, Abbeel P (2015) Optimism-driven exploration for nonlinear systems. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3239–3246. Seattle, WA, USA
Mühlig M, Gienger M, Steil J (2012) Interactive imitation learning of object movement skills. Auton Robots 32(2):97–114
Mussa-Ivaldi FA (1992) From basis functions to basis fields: vector field approximation from sparse data. Biol Cybern 67(6):479–489
Neal RM, Hinton GE (1999) A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan MI (ed) Learning in graphical models. MIT Press, Cambridge, pp 355–368
Ng A, Jordan M, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 849–856
Nguyen-Tuong D, Peters J (2008) Local Gaussian process regression for real-time model-based robot control. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 380–385
Niekum S, Osentoski S, Konidaris G, Chitta S, Marthi B, Barto AG (2015) Learning grounded finite-state representations from unstructured demonstrations. Int J Robot Res 34(2):131–157
Paraschos A, Daniel C, Peters J, Neumann G (2013) Probabilistic movement primitives. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems (NIPS). Curran Associates, Red Hook, pp 2616–2624
Rabiner LR (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–285
Rasmussen CE (2000) The infinite Gaussian mixture model. In: Solla SA, Leen TK, Mueller K-R (eds) Advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 554–560
Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Renard N, Bourennane S, Blanc-Talon J (2008) Denoising and dimensionality reduction using multilinear tools for hyperspectral images. IEEE Geosci Remote Sens Lett 5(2):138–142
Rueckert E, Mundo J, Paraschos A, Peters J, Neumann G (2015) Extracting low-dimensional control variables for movement primitives. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 1511–1518. Seattle, WA, USA
Saveriano M, An S, Lee D (2015) Incremental kinesthetic teaching of end-effector and null-space motion primitives. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 3570–3575
Schaal S, Atkeson CG (1998) Constructive incremental learning from only local information. Neural Comput 10(8):2047–2084
Schaal S, Mohajerian P, Ijspeert AJ (2007) Dynamics systems vs. optimal control: a unifying view. Prog Brain Res 165:425–445
Scholz JP, Schoener G (1999) The uncontrolled manifold concept: identifying control variables for a functional task. Exp Brain Res 126(3):289–306
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Scott DW, Szewczyk WF (2001) From kernels to mixtures. Technometrics 43(3):323–335
Shi T, Belkin M, Yu B (2009) Data spectroscopy: eigenspace of convolution operators and clustering. Ann Stat 37(6B):3960–3984
Signoretto M, Van de Plas R, De Moor B, Suykens JAK (2011) Tensor versus matrix completion: a comparison with application to spectral data. IEEE Signal Process Lett 18(7):403–406
Sternad D, Park SW, Mueller H, Hogan N (2010) Coordinate dependence of variability analysis. PLoS Comput Biol 6(4):1–16
Strang G (1986) Introduction to applied mathematics. Wellesley-Cambridge Press, Wellesley
Stulp F, Sigaud O (2015) Many regression algorithms, one unified model—a review. Neural Netw 69:60–79
Sugiura K, Iwahashi N, Kashioka H, Nakamura S (2011) Learning, generation, and recognition of motions by reference-point-dependent probabilistic models. Adv Robot 25(6–7):825–848
Sung HG (2004) Gaussian mixture regression and classification. PhD thesis, Rice University, Houston, Texas
Tang J, Singh A, Goehausen N, Abbeel P (2010) Parameterized maneuver learning for autonomous helicopter flight. In: Proceedings of IEEE international conference on robotics and automation (ICRA), pp 1142–1148
Tang Y, Salakhutdinov R, Hinton G (2012) Deep mixtures of factor analysers. In: Proceedings of international conference on machine learning (ICML). Edinburgh, Scotland
Tipping ME, Bishop CM (1999) Mixtures of probabilistic principal component analyzers. Neural Comput 11(2):443–482
Todorov E, Jordan MI (2002) Optimal feedback control as a theory of motor coordination. Nat Neurosci 5:1226–1235
Tokuda K, Masuko T, Yamada T, Kobayashi T, Imai S (1995) An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features. In: Proceedings of European conference on speech communication and technology (EUROSPEECH), pp 757–760
Towell C, Howard M, Vijayakumar S (2010) Learning nullspace policies. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 241–248
Ude A, Gams A, Asfour T, Morimoto J (2010) Task-specific generalization of discrete and periodic dynamic movement primitives. IEEE Trans Robot 26(5):800–815
Vasilescu MAO, Terzopoulos D (2002) Multilinear analysis of image ensembles: TensorFaces. In: Computer vision (ECCV), Lecture Notes in Computer Science, vol 2350. Springer, Berlin, pp 447–460
Verbeek JJ, Vlassis N, Kroese B (2003) Efficient greedy learning of Gaussian mixture models. Neural Comput 15(2):469–485
Vijayakumar S, D’souza A, Schaal S (2005) Incremental online learning in high dimensions. Neural Comput 17(12):2602–2634
Wang Y, Zhu J (2015) DP-space: Bayesian nonparametric subspace clustering with small-variance asymptotics. In: Proceedings of international conference on machine learning (ICML), pp 1–9. Lille, France
Wilson AD, Bobick AF (1999) Parametric hidden Markov models for gesture recognition. IEEE Trans Pattern Anal Mach Intell 21(9):884–900
Wolpert DM, Diedrichsen J, Flanagan JR (2011) Principles of sensorimotor learning. Nat Rev 12:739–751
Wrede S, Emmerich C, Ricarda R, Nordmann A, Swadzba A, Steil JJ (2013) A user study on kinesthetic teaching of redundant robots in task and configuration space. J Hum Robot Interact 2:56–81
Yamazaki T, Niwase N, Yamagishi J, Kobayashi T (2005) Human walking motion synthesis based on multiple regression hidden semi-Markov model. In: Proceedings of international conference on cyberworlds, pp 445–452
Zen H, Tokuda K, Kitamura T (2007) Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences. Comput Speech Lang 21(1):153–173
Zhao Q, Zhou G, Adali T, Zhang L, Cichocki A (2013) Kernelization of tensor-based models for multiway data analysis: processing of multidimensional structured data. IEEE Signal Process Mag 30(4):137–148
This work was in part supported by the DexROV Project through the EC Horizon 2020 programme (Grant #635491).
Appendices
Appendix 1: Expectation-maximization for TP-GMM parameters estimation
In order to estimate the parameters of a TP-GMM, the following two steps are repeated until convergence:
E-step:
M-step:
In practice, it is recommended to start EM from a coarse estimate of the parameters, for example based on an equal split in time of the motion segments, on a geometric segmentation with k-means [59], on moments or spectral approaches with circular covariances [42, 53, 84], or on an iterative clustering algorithm [83].
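Such a coarse initialization can be sketched as follows (a hypothetical NumPy illustration of the equal-split strategy, not the article's Matlab implementation):

```python
import numpy as np

def init_gmm_equal_split(data, K, reg=1e-6):
    """Coarse GMM initialization: split time-ordered datapoints into K equal segments."""
    N, D = data.shape
    priors, mus, sigmas = [], [], []
    for chunk in np.array_split(np.arange(N), K):
        priors.append(len(chunk) / N)
        mus.append(data[chunk].mean(axis=0))
        # Regularized covariance to avoid singular estimates on short segments
        sigmas.append(np.cov(data[chunk].T) + reg * np.eye(D))
    return np.array(priors), np.array(mus), np.array(sigmas)
```

Each segment yields one Gaussian component whose mean, covariance and prior serve as the starting point of the EM iterations.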
Model selection (i.e., determining the number of Gaussians in the GMM) is compatible with the techniques employed for standard GMM, such as the Bayesian information criterion [82], Dirichlet processes [22, 50, 65, 74], iterative pairwise replacement [83], spectral clustering [53, 69, 84] or approaches based on segmentation points [56]. Model selection in mixture modeling shares a core challenge with data-driven sparse kernel regression techniques, which require finding the right bandwidth parameters to select a subset of datapoints that is most representative of the dataset.
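As one concrete criterion, the BIC trades off the data likelihood against the number of free parameters. A minimal sketch, assuming full covariance matrices (the parameter count would differ for the subspace models discussed in the article):

```python
import numpy as np

def gmm_num_params(K, D):
    """Free parameters of a K-component, D-dimensional GMM with full covariances."""
    return (K - 1) + K * D + K * D * (D + 1) // 2

def bic(log_likelihood, K, D, N):
    """Bayesian information criterion: lower values indicate a better K."""
    return -2.0 * log_likelihood + gmm_num_params(K, D) * np.log(N)
```

The number of components is then chosen by fitting candidate models for a range of K and keeping the one with the lowest score.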
Appendix 2: Expectation-maximization for TP-MFA and TP-MPPCA parameters estimation
In TP-MFA, the generative model for the jth frame and ith mixture component assumes that a D-dimensional random vector \({\varvec{X}}^{(j)}\) is modeled using a d-dimensional vector of latent (unobserved) factors \({\varvec{z}}^{(j)}\) as \({\varvec{X}}^{(j)} = {\varvec{\mu }}^{(j)}_i + {\varvec{{\varLambda }}}^{(j)}_i {\varvec{z}}^{(j)} + {\varvec{\epsilon }}^{(j)}_i,\)
where \({\varvec{\mu }}^{(j)}_i\in \mathbb {R}^D\) is the mean vector of the ith factor analyzer, \({\varvec{{\varLambda }}}^{(j)}_i\in \mathbb {R}^{D\times d}\) is its matrix of factor loadings, \({\varvec{z}}^{(j)}\sim \mathcal {N}({\varvec{0}},{\varvec{I}})\) (the factors are assumed to be distributed according to a zero-mean normal with unit variance), and \({\varvec{\epsilon }}^{(j)}_i\sim \mathcal {N}({\varvec{0}},{\varvec{{\varPsi }}}^{(j)}_i)\) is a centered normal noise with diagonal covariance \({\varvec{{\varPsi }}}^{(j)}_i\).
This diagonality is a key assumption in factor analysis: the observed variables are conditionally independent given the factors, and the goal of TP-MFA is to best model the covariance structure of \({\varvec{X}}^{(j)}\). It follows from this model that the marginal distribution of \({\varvec{X}}^{(j)}\) for the ith component is \(\mathcal {N}\big ({\varvec{\mu }}^{(j)}_i,\,{\varvec{{\varLambda }}}^{(j)}_i{{\varvec{{\varLambda }}}^{(j)}_i}^{\scriptscriptstyle \top }+{\varvec{{\varPsi }}}^{(j)}_i\big )\),
and the joint distribution of \({\varvec{X}}^{(j)}\) and \({\varvec{z}}^{(j)}\) is
The above can be used to show that the d factors are informative projections of the data, which can be computed by Gaussian conditioning, corresponding to the affine projection
As highlighted by [32], the same process can be used to estimate the second moment of the factors \(\mathbb {E}\Big ({\varvec{z}}^{(j)}{{\varvec{z}}^{(j)}}^{\scriptscriptstyle \top }|{\varvec{X}}^{(j)}\Big )\), which provides a measure of uncertainty in the factors that has no analogue in PCA. This relation can be exploited to derive an EM algorithm (see for example [32] or [62]) to train a TP-MFA model of K components with parameters \(\big \{\pi _i,\{{\varvec{\mu }}^{(j)}_i,{\varvec{{\varLambda }}}^{(j)}_i,{\varvec{{\varPsi }}}^{(j)}_i\}_{j=1}^P\big \}_{i=1}^K\).
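These relations can be illustrated numerically for a single factor analyzer (a NumPy sketch dropping the frame index j and component index i; all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 4, 2
Lam = rng.standard_normal((D, d))            # factor loadings (Lambda)
Psi = np.diag(rng.uniform(0.1, 0.3, D))      # diagonal noise covariance
mu = rng.standard_normal(D)

# Marginal covariance of X under the generative model: Lambda Lambda^T + Psi
S = Lam @ Lam.T + Psi

def factor_posterior_mean(x):
    """Affine projection onto the d factors: E(z | x) = Lambda^T S^{-1} (x - mu)."""
    return Lam.T @ np.linalg.solve(S, x - mu)
```

The same quantities reappear in the E-step of the TP-MFA EM updates, where each datapoint is projected onto the factors of each component and frame.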
The following two steps are repeated until convergence:
E-step:
M-step:
computed with the help of the intermediary variables
Alternatively, an update step simultaneously computing \({\varvec{\mu }}^{(j)}_i\) and \({\varvec{{\varLambda }}}^{(j)}_i\) can be derived, see [32] for details.
Similarly, the M-step in TP-MPPCA is given by
computed with the help of the intermediary variables
where \({\varvec{{\varLambda }}}^{(j)}_i\) is replaced by \({\varvec{\tilde{\varLambda }}}^{(j)}_i\) at each iteration, see [93] for details.
Appendix 3: Gaussian mixture regression approximated by a single normal distribution
Let us consider a datapoint \({\varvec{\xi }}_t\) distributed as in Eq. (6), with \(\mathcal {P}({\varvec{\xi }}_t) = \mathcal {P}({\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {I}}}}_t,{\varvec{\xi }}^{\scriptscriptstyle {{\mathcal {O}}}}_t)\) being the joint distribution describing the data. The conditional probability of an output given an input is
where \(z_i\) represents the ith component of the GMM. Namely,
The conditional mean can be computed as
In order to evaluate the covariance, we calculate
We have that
By using Eq. (72) with a Gaussian distribution, we obtain
Combining (72) with (74) we finally have that (see also [90])
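Under the notation above, the full regression step can be sketched as follows (an illustrative NumPy implementation of GMR with the single-normal moment-matching approximation; this is not the article's Matlab code, and the index lists are hypothetical):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of a multivariate normal, evaluated at x."""
    d = x - mu
    e = -0.5 * d @ np.linalg.solve(sigma, d)
    return np.exp(e) / np.sqrt((2 * np.pi) ** len(d) * np.linalg.det(sigma))

def gmr(x_in, priors, mus, sigmas, i_idx, o_idx):
    """Condition a joint GMM on input x_in; return a single-normal approximation."""
    K = len(priors)
    # Responsibility of each component given the input
    h = np.array([priors[k] * gauss_pdf(x_in, mus[k][i_idx],
                                        sigmas[k][np.ix_(i_idx, i_idx)])
                  for k in range(K)])
    h /= h.sum()
    mu_out = np.zeros(len(o_idx))
    m2 = np.zeros((len(o_idx), len(o_idx)))
    for k in range(K):
        Sii = sigmas[k][np.ix_(i_idx, i_idx)]
        Soi = sigmas[k][np.ix_(o_idx, i_idx)]
        Soo = sigmas[k][np.ix_(o_idx, o_idx)]
        # Per-component conditional mean and covariance (Gaussian conditioning)
        mc = mus[k][o_idx] + Soi @ np.linalg.solve(Sii, x_in - mus[k][i_idx])
        Sc = Soo - Soi @ np.linalg.solve(Sii, Soi.T)
        mu_out += h[k] * mc
        m2 += h[k] * (Sc + np.outer(mc, mc))   # accumulate second moment
    # Moment matching: covariance of the mixture of conditionals
    return mu_out, m2 - np.outer(mu_out, mu_out)
```

With a single component, the function reduces to plain Gaussian conditioning; with several components, the responsibilities weight both the conditional means and the spread between them.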
Appendix 4: Expectation-maximization for parametric GMM parameters estimation
The following two steps are repeated until convergence, see [102] for details:
E-step:
M-step:
Calinon, S. A tutorial on task-parameterized movement learning and retrieval. Intel Serv Robotics 9, 1–29 (2016). https://doi.org/10.1007/s11370-015-0187-9