1 Introduction

Robots are equipped with an increasing number of sensors and actuators. This trend introduces new challenges in machine learning, where the sample size is often bounded by the cost of data acquisition, thus requiring models that can handle a wide range of data regimes: models that can start learning from a small number of demonstrations, while still being able to continue learning when more data become available.

Robot learning from demonstration is one such field, which aims at providing end-users with intuitive interfaces to transfer new skills to robots. The challenges in robot learning can often be reinterpreted as designing appropriate domain-specific priors that can supply the required generalization capability from small training sets. The position adopted in this paper is twofold: (1) generative models are well suited for robot learning from demonstration because they can treat recognition, classification, prediction and synthesis within the same framework; and (2) an efficient and versatile prior is to consider that the task parameters describing the current situation (body and workspace configuration encountered by the robot) can be represented as affine transformations (including frames of reference, coordinate systems or projections).

By providing such structure to the skill generation problem, the role of the experimenter is to provide the robot with a set of candidate frames (a list of coordinate systems) that could potentially be relevant for the task. This paper will show that structuring the affine transformations in such a way has a simple interpretation, that it can be easily implemented, and that it remains valid for a wide range of skills that a robot can experience.

The task-parameterized Gaussian mixture model (TP-GMM) was presented in [8, 10, 11] for the special case of frames of reference representing rotations and translations in Cartesian space. The current paper discusses the potential of the approach and introduces several routes for further investigation, aiming at applying the proposed technique to a wider range of affine transformations (directly exploiting the structure of the considered application domain), including constraints in both configuration and operational spaces, as well as priority constraints. It also shows that the proposed method can be applied to different probabilistic encoding strategies, including subspace clustering approaches that enable the consideration of high-dimensional feature spaces. Examples are provided in simulation and on a real robot (transfer of manipulation skills to the Baxter bimanual robot). Accompanying source codes are available at http://www.idiap.ch/software/pbdlib/.

2 Adaptive Models of Movements

Task-parameterized models of movements/behaviors refer to representations that can adapt to a set of task parameters describing, for example, the current context, situation, state of the environment or state of the robot configuration. The task parameters can, for instance, refer to variables collected by the system to describe the positions of objects in the environment. They can be fixed during an execution trial or vary while the motion is executed. The model parameters refer to the variables learned by the system, namely, those stored in memory (the internal representation of the movement). During reproduction, a new set of task parameters (describing the present situation) is used to generate a new movement (e.g., adaptation to new positions of objects).

Several denominations have been introduced in the literature to describe these models, such as task-parameterized [11, 40] (the denomination that will be used here), parametric [26, 29, 49] or stylistic [7]. In these models, the encoding of skills usually serves several purposes, including classification, prediction, synthesis and online adaptation. A taxonomy of task-parameterized models is presented in [8], classifying existing methods into three broad categories: (1) approaches employing M models for the M demonstrations, performed in M different situations, see e.g. [12, 16, 21, 23, 25, 29, 45]; (2) approaches employing P models for the P frames of reference that are possibly relevant for the task, see e.g. [13, 32]; (3) approaches employing a single model whose parameters are modulated by the task parameters, see e.g. [20, 26, 49].

In the majority of these approaches, the retrieval of movements from the model parameters and the task parameters is viewed as a standard regression problem. This generality might look appealing at first sight, but it also limits the generalization scope of these models. Our work aims at increasing the generalization capability of task-parameterized models by exploiting the functional nature of the task parameters. The approach arose from the observation that task parameters in robotics applications can most often be related to some form of frames of reference, coordinate systems, basis functions or local projections, whose structure can be exploited to speed up learning and provide the robot with remarkable extrapolation capability.

Fig. 1 Illustration of the overall approach (see main text for details). a Observation of a task in different situations and generalization to new contexts. Multiple demonstrations provide the opportunity to discern the structure of the task. b Probabilistic encoding of continuous movements in multiple coordinate systems. c Exploitation of variability and correlation information to adapt the motion to new situations. With cross-situational observations of the same task, the robot is able to generalize the skill to new situations. d Computation of the underlying optimal control strategy driving the observed behavior

2.1 Motivation

The core of the approach is to represent an observed movement or behavior as a spring-damper system with varying parameters, where a generative model is used to encode the evolution of the attractor, and the variability and correlation information is used to infer the impedance parameters of the system. These impedance parameters figuratively correspond to the stiffness of a spring and to the damping coefficient of a viscous damper, with the difference that they can also be full stiffness and damping matrices. The model shares links with optimal feedback control strategies in which deviations from an average trajectory are corrected only when they interfere with task performance, resulting in a minimal intervention principle [43].
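As a minimal illustration of this representation, the control command of such a virtual spring-damper system can be sketched as follows (a generic sketch with hypothetical names; in the proposed approach, the attractor and the gain matrices would be provided by the generative model and the optimal control steps described next):

```python
import numpy as np

def impedance_command(x, dx, x_attr, Kp, Kv):
    """Virtual spring-damper command toward an attractor point x_attr.

    Kp and Kv play the role of stiffness and damping; as noted in the text,
    they can be full matrices instead of scalar gains, encoding coordination.
    """
    return Kp @ (x_attr - x) - Kv @ dx
```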

In its task-parameterized version, several frames of reference interact with each other to describe tracking behaviors in multiple coordinate systems, where statistical analysis from the perspective of each of these observers is used to estimate feedforward and feedback control terms with linear quadratic optimal control. Figure 1 presents an illustration of the overall approach, which can be decomposed into multiple steps involving statistical modeling, dynamical systems and optimal control.

Fig. 2 Minimization of the objective function in Eq. (3), composed of a weighted sum of quadratic error terms, whose result corresponds to a product of Gaussians. It is easy to show that \(\mathcal {N}\big ({\varvec{\hat{\xi }}}_{t},{\varvec{\hat{\varSigma }}}_{t}\big )\) corresponds to the Gaussian resulting from the product of the two Gaussians \(\mathcal {N}\big ({\varvec{\hat{\xi }}}{}_{t}^{(1)},{\varvec{\hat{\varSigma }}}{}_{t}^{(1)}\big )\) and \(\mathcal {N}\big ({\varvec{\hat{\xi }}}{}_{t}^{(2)},{\varvec{\hat{\varSigma }}}{}_{t}^{(2)}\big )\)

2.2 Example with a Single Gaussian

Before presenting the details of the task-parameterized model, the approach is motivated by an introductory example with a single Gaussian.

Two frames will be considered, described respectively at each time step t by \(\{{\varvec{b}}_{t,1},{\varvec{A}}_{t,1}\}\) and \(\{{\varvec{b}}_{t,2},{\varvec{A}}_{t,2}\}\), representing the origin of the observer \({\varvec{b}}_{t,j}\) and a set of basis vectors \(\{{\varvec{e}}_1,{\varvec{e}}_2,\ldots \}\) forming a transformation matrix \({\varvec{A}}_{t,j}\!=\![{\varvec{e}}_{1,t,j},\; {\varvec{e}}_{2,t,j},\; \ldots ]\).

A set of demonstrations is observed from the perspective of the two frames. During reproduction, each frame expects the new datapoints to lie within the same range as those of the demonstrations. If \(\mathcal {N}\big ({\varvec{\mu }}^{(1)},{\varvec{\varSigma }}^{(1)}\big )\) and \(\mathcal {N}\big ({\varvec{\mu }}^{(2)},{\varvec{\varSigma }}^{(2)}\big )\) are the normal distributions of the observed demonstrations in the first and second frames, the two frames respectively expect the reproduction attempt to lie at the intersection of the distributions \(\mathcal {N}\big ({\varvec{\hat{\xi }}}{}_{t}^{(1)},{\varvec{\hat{\varSigma }}}{}_{t}^{(1)}\big )\) and \(\mathcal {N}\big ({\varvec{\hat{\xi }}}{}_{t}^{(2)},{\varvec{\hat{\varSigma }}}{}_{t}^{(2)}\big )\). These distributions can be computed with the linear transformation property of normal distributions as

$$\begin{aligned} {\varvec{\hat{\xi }}}{}^{(1)}_{t} = {\varvec{A}}_{t,1}\; {\varvec{\mu }}^{(1)} + {\varvec{b}}_{t,1}&,\qquad {\varvec{\hat{\varSigma }}}{}^{(1)}_{t} = {\varvec{A}}_{t,1}\; {\varvec{\varSigma }}^{(1)} {\varvec{A}}_{t,1}^{\!\scriptscriptstyle \top }\;,\end{aligned}$$
(1)
$$\begin{aligned} {\varvec{\hat{\xi }}}{}^{(2)}_{t} = {\varvec{A}}_{t,2}\; {\varvec{\mu }}^{(2)} + {\varvec{b}}_{t,2}&,\qquad {\varvec{\hat{\varSigma }}}{}^{(2)}_{t} = {\varvec{A}}_{t,2}\; {\varvec{\varSigma }}^{(2)} {\varvec{A}}_{t,2}^{\!\scriptscriptstyle \top }\;. \end{aligned}$$
(2)

A trade-off thus needs to be found during reproduction to comply with the distributions expected by each frame. The objective function can be defined as the weighted sum of quadratic error terms

$$\begin{aligned} {\varvec{\hat{\xi }}}_t \;=\; \arg \underset{{\varvec{\xi }}_t}{\min } \sum _{j=1}^2 {\big ({\varvec{\xi }}_t\!-\!{\varvec{\hat{\xi }}}{}_t^{(j)}\big )}^{\!\scriptscriptstyle \top }\; {{\varvec{\hat{\varSigma }}}{}_t^{(j)}}^{-1} \big ({\varvec{\xi }}_t\!-\!{\varvec{\hat{\xi }}}{}_t^{(j)}\big ) . \end{aligned}$$
(3)

The above objective can easily be solved by differentiation, providing a point \({\varvec{\hat{\xi }}}_{t}\), with an error defined by covariance \({\varvec{\hat{\varSigma }}}_{t}\). This estimate corresponds to a product of Gaussians (intersection between the two Gaussians). Figure 2 illustrates this process for one of the Gaussians of Fig. 1.
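As an illustration, the computation of Eqs. (1)-(3) can be sketched in a few lines of Python (a generic sketch with hypothetical frame parameters, independent of the accompanying source codes):

```python
import numpy as np

def transform_gaussian(mu, sigma, A, b):
    # Linear transformation property of normal distributions, Eqs. (1)-(2)
    return A @ mu + b, A @ sigma @ A.T

def gaussian_product(gaussians):
    # Closed-form minimizer of Eq. (3), corresponding to a product of Gaussians
    precisions = [np.linalg.inv(sigma) for _, sigma in gaussians]
    sigma_hat = np.linalg.inv(sum(precisions))
    xi_hat = sigma_hat @ sum(lam @ mu for (mu, _), lam in zip(gaussians, precisions))
    return xi_hat, sigma_hat

# Hypothetical 2D example with two frames (a translated frame and a rotated one)
g1 = transform_gaussian(np.array([0.5, 0.2]), np.diag([0.01, 0.1]),
                        np.eye(2), np.zeros(2))
g2 = transform_gaussian(np.array([-0.3, 0.1]), np.diag([0.1, 0.02]),
                        np.array([[0.0, -1.0], [1.0, 0.0]]), np.array([1.0, 0.5]))
xi_hat, sigma_hat = gaussian_product([g1, g2])
```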

3 Task-Parameterized Gaussian Mixture Model (TP-GMM)

TP-GMM is a direct extension of the objective presented above, considering multiple frames and multiple clusters of datapoints (soft clustering via mixture modeling). It probabilistically encodes the relevance of candidate frames, which can change throughout the task. In contrast to approaches such as [33] that aim at extracting a single (most prominent) coordinate system located at the end of a motion segment, the proposed approach allows the superposition and transition of different coordinate systems that are relevant for the task (parallel organization of behavior primitives, adaptation to multiple viapoints in the middle of the movement, modulation based on positions, orientations or geometries of objects, etc.).

Each demonstration \(m\!\in \!\{1, \ldots ,M\}\) contains \(T_m\) datapoints forming a dataset of N datapoints \(\{{\varvec{\xi }}_{t}\}_{t=1}^N\) with \(N\!=\!\sum _{m=1}^{M}\!T_m\). The task parameters are represented by P coordinate systems, defined at time step t by \(\{{\varvec{b}}_{t,j},{\varvec{A}}_{t,j}\}_{j=1}^P\), representing respectively the origin and the basis of the coordinate system.

The demonstrations \({\varvec{\xi }}\!\in \!\mathbb {R}^{D\times N}\) are observed from these different viewpoints, forming P trajectory samples \({\varvec{X}}^{(j)}\!\in \!\mathbb {R}^{D\times N}\). These samples can be collected from sensors located at the frames, or computed with

$$\begin{aligned} {\varvec{X}}^{(j)}_t = {\varvec{A}}_{t,j}^{-1} ({\varvec{\xi }}_t - {\varvec{b}}_{t,j}) . \end{aligned}$$
(4)
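Eq. (4) amounts to expressing each datapoint in every candidate frame, which can be sketched as follows (using a linear solve rather than an explicit matrix inverse for numerical robustness):

```python
import numpy as np

def observe_from_frames(xi_t, frames_t):
    # frames_t: list of (A, b) task parameters at time step t, cf. Eq. (4)
    return [np.linalg.solve(A, xi_t - b) for A, b in frames_t]
```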

The parameters of the proposed task-parameterized GMM (TP-GMM) with K components are defined by \(\{\pi _i,\{{\varvec{\mu }}^{(j)}_i,{\varvec{\varSigma }}^{(j)}_i\}_{j=1}^P\}_{i=1}^K\) (\(\pi _i\) are the mixing coefficients, \({\varvec{\mu }}^{(j)}_i\) and \({\varvec{\varSigma }}^{(j)}_i\) are the center and covariance matrix of the i-th Gaussian component in frame j).

Learning of the parameters is achieved by log-likelihood maximization subject to the constraint that the data in the different frames arose from the same source, resulting in an EM process iteratively updating the model parameters until convergence, see [10] for details. Model selection (i.e., determining the number of Gaussians in the GMM) is compatible with techniques employed in standard mixture models (Bayesian information criterion [37], Dirichlet process [34], small-variance asymptotics [27], etc.). For a movement in Cartesian space with 10 demonstrations and 3 candidate frames, the overall learning process typically takes 1–3 s. The reproduction is much faster and can be computed online (typically below 1 ms).
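As an illustration of the model selection step, a standard BIC procedure can be sketched as follows (shown here with an ordinary GMM from scikit-learn; in the task-parameterized case, the same criterion would be applied to the TP-GMM likelihood):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_K(data, candidates=range(1, 10)):
    # Bayesian information criterion [37]: penalized log-likelihood score,
    # computed for each candidate number of components K
    bics = [GaussianMixture(n_components=K, n_init=5, random_state=0)
            .fit(data).bic(data) for K in candidates]
    return list(candidates)[int(np.argmin(bics))]
```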

The learned model is then used to reproduce movements in other situations (for new position and orientation of candidate frames). A new GMM with parameters \(\{\pi _i,{\varvec{\hat{\xi }}}_{t,i},{\varvec{\hat{\varSigma }}}_{t,i}\}_{i=1}^K\) can thus automatically be generated with

$$\begin{aligned} \mathcal {N}\!\Big ( {\varvec{\hat{\xi }}}_{t,i} , {\varvec{\hat{\varSigma }}}_{t,i} \Big ) \;\propto \;&\prod \limits _{j=1}^P \mathcal {N}\!\Big ( {\varvec{\hat{\xi }}}{}_{t,i}^{(j)} ,\; {\varvec{\hat{\varSigma }}}{}_{t,i}^{(j)} \Big ) ,\nonumber \\&\;\mathrm {with}\quad {\varvec{\hat{\xi }}}{}^{(j)}_{t,i} = \!{\varvec{A}}_{t,j} {\varvec{\mu }}^{(j)}_i \!+\! {\varvec{b}}_{t,j} \;,\quad {\varvec{\hat{\varSigma }}}{}^{(j)}_{t,i} = \!{\varvec{A}}_{t,j}{\varvec{\varSigma }}^{(j)}_i {\varvec{A}}_{t,j}^{\!\scriptscriptstyle \top }, \end{aligned}$$
(5)

where the result of the Gaussian product is given by

$$\begin{aligned} {\varvec{\hat{\varSigma }}}_{t,i} = \Big ( \sum \limits _{j=1}^P {{\varvec{\hat{\varSigma }}}{}^{(j)}_{t,i}}^{-1} \Big )^{-1} ,\quad {\varvec{\hat{\xi }}}_{t,i} = {\varvec{\hat{\varSigma }}}_{t,i} \sum \limits _{j=1}^P {{\varvec{\hat{\varSigma }}}{}^{(j)}_{t,i}}^{-1} {\varvec{\hat{\xi }}}{}^{(j)}_{t,i} . \end{aligned}$$
(6)

For computational efficiency, the above equations can be computed with precision matrices instead of covariances.
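A sketch of this adaptation step for one mixture component, following Eqs. (5)-(6) and using precision matrices as suggested above (the model structure and variable names are hypothetical):

```python
import numpy as np

def adapt_component(mus, sigmas, frames_t):
    """Adapt one TP-GMM component to the current task parameters, Eqs. (5)-(6).

    mus[j], sigmas[j]: center and covariance of the component in frame j.
    frames_t: list of (A, b) pairs describing the P frames at time step t.
    """
    lambdas, weighted = [], []
    for mu_j, sigma_j, (A, b) in zip(mus, sigmas, frames_t):
        xi_j = A @ mu_j + b                       # Eq. (5), transformed center
        lam_j = np.linalg.inv(A @ sigma_j @ A.T)  # precision of transformed Gaussian
        lambdas.append(lam_j)
        weighted.append(lam_j @ xi_j)
    sigma_hat = np.linalg.inv(sum(lambdas))       # Eq. (6), covariance
    xi_hat = sigma_hat @ sum(weighted)            # Eq. (6), center
    return xi_hat, sigma_hat
```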

Several approaches can be used to retrieve movements from the proposed model. One option is to encode both static and dynamic features in the mixture model to retrieve continuous behaviors [22, 39, 51]. An alternative is to encode time as an additional feature in the GMM and use Gaussian mixture regression (GMR) [18] to retrieve the movement. Similarly, if the evolution of a decay term is encoded instead of time, the system yields a probabilistic formulation of dynamical movement primitives (DMP) [20], see [11] for details. Figure 3 presents TP-GMR reproduction results for the example in Fig. 1.
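As an illustration of the GMR option mentioned above, a sketch conditioning a joint model of time and position on the current time step (for a one-dimensional input):

```python
import numpy as np
from scipy.stats import norm

def gmr(priors, mus, sigmas, t):
    """Condition a GMM over [t; xi] on the scalar input t (GMR [18]).

    mus[i] = [mu_t; mu_xi], sigmas[i] partitioned with the time dimension first.
    """
    K = len(priors)
    h = np.array([priors[i] * norm.pdf(t, mus[i][0], np.sqrt(sigmas[i][0, 0]))
                  for i in range(K)])
    h /= h.sum()                     # responsibilities of the components at time t
    xi = np.zeros(len(mus[0]) - 1)
    for i in range(K):
        cond = mus[i][1:] + sigmas[i][1:, 0] / sigmas[i][0, 0] * (t - mus[i][0])
        xi += h[i] * cond            # weighted sum of conditional expectations
    return xi
```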

Fig. 3 Generalization capability of the task-parameterized Gaussian mixture model. Each graph shows a different situation with increasing generalization complexity. In each graph, the four demonstrations and the associated adapted model parameters are depicted in semi-transparent colors

4 Extension to Task-Parameterized Subspace Clustering

Classical model-based clustering tends to perform poorly in high-dimensional spaces. A simple way of handling this issue is to reduce the number of parameters by considering diagonal covariances instead of full matrices, which corresponds to treating each variable separately. Although common in robotics, such decoupling can be a limiting factor for encoding movements and sensorimotor streams, because it does not fully exploit the principles underlying coordination, motor skill acquisition and action-perception couplings.

The rationale is that diagonal structures are ill-suited to motor skill representation because they do not encapsulate coordination information among the control variables. The good news is that a wide range of mixture modeling techniques exists between the encoding of diagonal and full covariances. With the exception of [14, 47], these techniques have only been exploited to a limited extent in robot skill acquisition. They can be studied as a subspace clustering problem, aiming to group datapoints such that they can be locally projected into subspaces of reduced dimensionality. Such subspace clustering helps the analysis of the local trend of the movement, while reducing the number of parameters to be estimated and “locking” the most important coordination patterns to efficiently cope with perturbations.

Several possible constraints can be considered, grouped in families such as parsimonious GMM [6], mixtures of factor analyzers (MFA) [30] or mixtures of probabilistic principal component analyzers [42]. Methods such as MFA provide a simple approach to the problem of high-dimensional cluster analysis with a slight modification of the generative model underlying the mixture of Gaussians to enforce low-dimensional models (i.e., noninvasive with regard to the other methods used in the proposed framework). The basic idea of factor analysis (FA) is to reduce the dimensionality of the data while keeping the observed covariance structure. MFA assumes for each component i a covariance structure of the form \({\varvec{\varSigma }}_i\!=\!{\varvec{\varLambda }}_i{\varvec{\varLambda }}_i^{\!\scriptscriptstyle \top }+{\varvec{\varPsi }}_i\), where \({\varvec{\varLambda }}_i\!\in \!\mathbb {R}^{D\times d}\), known as the factor loadings matrix, typically has \(d\!<\!D\) (providing a parsimonious representation of the data), and \({\varvec{\varPsi }}_i\) is a diagonal noise matrix.
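A brief sketch of this covariance structure and of the associated reduction in the number of parameters (the dimensions below are hypothetical):

```python
import numpy as np

def mfa_covariance(Lambda, psi_diag):
    # Sigma_i = Lambda_i Lambda_i^T + Psi_i, cf. the MFA decomposition above
    return Lambda @ Lambda.T + np.diag(psi_diag)

D, d = 4, 1                                  # hypothetical dimensions, with d < D
rng = np.random.default_rng(0)
Sigma = mfa_covariance(rng.standard_normal((D, d)), 0.1 * np.ones(D))
# D*d + D = 8 parameters here, instead of D*(D+1)/2 = 10 for a full covariance
```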

Figure 4 shows that the covariance structure in MFA can span a wide range of covariances.

The TP-GMM presented in Sect. 3 is fully compatible with the subspace clustering approaches mentioned above. Bayesian nonparametric approaches such as [48] can be used to simultaneously select the number of clusters and the dimension of the subspace in each cluster.

The TP-MFA extension of TP-GMM opens several roads for further investigation. A possible extension is to use tied structures in the covariances to enable the organization and reuse of previously acquired synergies [17]. Another possible extension is to enable deep or hierarchical learning techniques in task-parameterized models. As discussed in [41], the prior of each FA can be replaced by a separate second-level MFA that learns to model the aggregated posterior of that FA (instead of the isotropic Gaussian), providing a hierarchical structure organization where one layer of latent variables can be learned at a time.

Fig. 4 The mixture of factor analyzers (MFA) covers a wide range of covariance structures for the modeling of the data, from diagonal covariances (left) to full covariances (right)

Fig. 5 Learning of two behaviors with the Baxter robot. The taught tasks consist of holding a cup horizontally with one hand, and holding a sugar cube above the cup with the other hand, where the two task primitives can be combined in parallel. The demonstrations are provided in two steps by kinesthetic teaching, namely, by holding the arms of the robot and moving them during the task while the robot compensates for the effect of gravity. This procedure allows the user to move the robot arms without feeling their weight and without feeling the motors in the articulations, while the sensors are used to record position information. Here, the data are recorded in several frames of reference (top image). During reproduction, the robot is controlled by following a minimal intervention principle, where the computed feedforward and feedback control commands result in different levels of stiffness obeying the extracted variation and coordination constraints of the task. First sequence: Brief demonstration to show the robot how to hold a cup horizontally. Second sequence: Brief demonstration to show how to hold a sugar cube above the cup. Third sequence: Manual displacement of the left arm to test the learned behavior (the coordination of the two hands was successfully learned). Last sequence: Combination of the two learned task primitives. Here, the user pushes the robot to show that the robot remains soft for perturbations that do not conflict with the acquired task constraints (automatic exploitation of the redundant degrees of freedom that do not conflict with the task)

5 Extension to Minimal Intervention Control

We showed in [10] that TP-GMM can be used to autonomously regulate the stiffness and damping behavior of the robot, see also Fig. 1d. It shares similarities with the solution proposed by Medina et al. in the context of risk-sensitive control for haptic assistance [31], by exploiting the predicted variability to form a minimal intervention controller (in task space or in joint space). The retrieved variability and correlation information is exploited to generate safe and natural movements within an optimal control strategy, in accordance with the predicted range of motion required to reproduce the task, evaluated for the current situation. TP-GMM is fully compatible with linear quadratic regulation (LQR) and model predictive control (MPC) [4], providing an approach to learn controllers adapted to the current situation, with feedforward and feedback control commands varying with the external task parameters, see [10] for details.

Figure 5 demonstrates that a TP-GMM with a single Gaussian, combined with an infinite-horizon LQR, can readily be used to represent various behaviors that directly exploit the torque control capability of the robot and the redundancy, both at the level of the task and at the level of the robot kinematic structure.
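As an illustration, the mapping from a retrieved Gaussian to infinite-horizon LQR gains can be sketched as follows (a minimal sketch assuming discretized double integrator dynamics and a scalar control weight; the complete formulation with feedforward commands is detailed in [10]):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def minimal_intervention_gains(sigma_hat, dt=0.01, r=1e-2):
    """Infinite-horizon discrete LQR tracking gains from a Gaussian's covariance.

    State = [position; velocity]. The precision of the retrieved Gaussian
    weights the tracking cost, so that directions of low demonstrated
    variability yield high stiffness (minimal intervention principle).
    """
    D = sigma_hat.shape[0]
    A = np.block([[np.eye(D), dt * np.eye(D)],
                  [np.zeros((D, D)), np.eye(D)]])      # double integrator dynamics
    B = np.vstack([np.zeros((D, D)), dt * np.eye(D)])
    Q = np.zeros((2 * D, 2 * D))
    Q[:D, :D] = np.linalg.inv(sigma_hat)               # precision as tracking weight
    R = r * np.eye(D)                                  # control effort weight
    P = solve_discrete_are(A, B, Q, R)                 # algebraic Riccati equation
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal feedback gains
    return K[:, :D], K[:, D:]                          # stiffness and damping parts
```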

It is worth noting that each frame in the TP-GMM has an associated sub-objective function as in Eq. (3), which aims at minimizing the discrepancy between the demonstrations and the reproduction attempt. By considering the combination of these sub-objectives in the overall objective, the problem can be viewed as a rudimentary form of inverse optimal control (IOC) [1]. This form of IOC does not have external constraints and can be solved analytically, which means that it can provide a controller without exploratory search, at the expense of being restricted to simple forms of objectives (weighted sums of quadratic errors whose weights are learned from the demonstrations). This dual view can be exploited for further research in learning from demonstration, either to bridge action-level and goal-driven imitation, or to initialize the search in IOC.

6 Extension to Multimodal Data and Projection Constraints

TP-GMM is not limited to coordinate systems representing objects in Cartesian space. It can be extended to other forms of locally linear transformations or projections, which opens many roads for further research.

The consideration of non-square \({\varvec{A}}_{t,j}\) matrices is for example relevant to learn and reproduce soft constraints in both configuration and operational spaces (through Jacobian operators). With a preliminary model of task-parameterized movements, we explored in [9] how a similar approach could be used to simultaneously learn constraints in joint space and task space. The model also provides a principled way to learn priority constraints in a probabilistic form (through nullspace operators). The different frames correspond in this case to several subspace projections of the same movement, whose relevance is estimated statistically from the demonstrations.

A wide range of motor skills could potentially be adapted to this framework, by exploiting the functional nature of task parameters to build models that learn the local structure of the task from a small number of demonstrations. Indeed, most task parameterization in robot control can be related to some form of frames of reference, coordinate systems or basis functions, where the involvement of the frames can change during the execution of the task, with transformations represented as local linear projection operators (Jacobians for inverse kinematics, kernel matrices for nullspace projections, etc.).

The potential applications are diverse, with an objective that is well in line with the original purpose of motor primitives to be composed together serially or in parallel [15]. Further work is required to investigate in which manner TP-GMM could be exploited to provide a probabilistic view of robotics techniques that are in practice predefined, handled by ad hoc solutions, or sometimes inefficiently set as hard constraints. This includes the consideration of soft constraints in both configuration and operational spaces. A wide range of robot skills can be defined in such way, see e.g. the possible tasks described in Sect. 6.2.1 of [3]. In humanoids, the candidate frames could for example be employed to learn the constraints of whole-body movements from demonstration or experience, based on the regularities extracted from different subspace projections.

An important category of applications currently attracting a lot of attention concerns the problems requiring priority constraints [19, 28, 36, 44, 50]. With an appropriate definition of the frames and with an initial set of candidate task hierarchies, such constraints can be learned and encoded within a TP-GMM. Here, the probabilistic encoding is exploited to discover, from statistical analysis of the demonstrations, in which manner each subtask is prioritized.

For a controller handling constraints both in configuration and operational spaces, the most common candidate projection operators can be defined as

$$\begin{aligned} {\varvec{\hat{q}}}^{(j)}_{t,i}&= {\varvec{I}}&{\varvec{\mu }}^{(j)}_i&+ {\varvec{0}} \end{aligned}$$
(7)
$$\begin{aligned} {\varvec{\hat{q}}}^{(j)}_{t,i}&= {\varvec{J}}^{\!\dagger }\!({\varvec{q}}_{t-1})&{\varvec{\mu }}^{(j)}_i&+ {\varvec{q}}_{t-1} - {\varvec{J}}^{\!\dagger }\!({\varvec{q}}_{t-1}) {\varvec{x}}_{t-1} \end{aligned}$$
(8)
$$\begin{aligned} {\varvec{\hat{q}}}^{(j)}_{t,i}&= {\varvec{J}}^{\!\dagger }\!({\varvec{q}}_{t-1}) {\varvec{A}}^{\scriptscriptstyle {\mathcal {O}}}_t&{\varvec{\mu }}^{(j)}_i&+ {\varvec{q}}_{t-1} + {\varvec{J}}^{\!\dagger }\!({\varvec{q}}_{t-1}) \big [ {\varvec{b}}^{\scriptscriptstyle {\mathcal {O}}}_t - {\varvec{x}}_{t-1}\big ] \end{aligned}$$
(9)
$$\begin{aligned} {\varvec{\hat{q}}}^{(j)}_{t,i}&= {\varvec{N}}\!({\varvec{q}}_{t-1})&{\varvec{\mu }}^{(j)}_i&+ {\varvec{J}}^{\!\dagger }\!({\varvec{q}}_{t-1}) {\varvec{J}}\!({\varvec{q}}_{t-1}) {\varvec{q}}_{t-1} \end{aligned}$$
(10)
$$\begin{aligned} {\varvec{\hat{q}}}^{(j)}_{t,i}&= {\varvec{N}}\!({\varvec{q}}_{t-1}) {\varvec{\tilde{J}}}^{\;{\!\dagger }}\!\!({\varvec{q}}_{t-1})&{\varvec{\mu }}^{(j)}_i&+ {\varvec{q}}_{t-1} - {\varvec{N}}\!({\varvec{q}}_{t-1}) {\varvec{\tilde{J}}}^{\;{\!\dagger }}\!\!({\varvec{q}}_{t-1}) \; {\varvec{x}}_{t-1} \end{aligned}$$
(11)
$$\begin{aligned} {\varvec{\hat{q}}}^{(j)}_{t,i}&= \underbrace{{\varvec{N}}\!({\varvec{q}}_{t-1}) {\varvec{\tilde{J}}}^{\;{\!\dagger }}\!\!({\varvec{q}}_{t-1}) {\varvec{A}}^{\scriptscriptstyle {\mathcal {O}}}_t}_{{\varvec{A}}_{t,j}}&{\varvec{\mu }}^{(j)}_i&+ \underbrace{{\varvec{q}}_{t-1} \!+\! {\varvec{N}}\!({\varvec{q}}_{t-1}) {\varvec{\tilde{J}}}^{\;{\!\dagger }}\!\!({\varvec{q}}_{t-1}) \big [ {\varvec{b}}^{\scriptscriptstyle {\mathcal {O}}}_t \!-\! {\varvec{x}}_{t-1}\big ] }_{{\varvec{b}}_{t,j}} , \end{aligned}$$
(12)

covering a wide range of robotics applications.

Note here that the product of Gaussians is computed in configuration space (\({\varvec{q}}\) and \({\varvec{x}}\) represent respectively poses in joint space and task space). Equation (7) describes joint space constraints in a fixed frame. It corresponds to the canonical frame defined by \({\varvec{A}}_{t,j}\!=\!{\varvec{I}}\) (identity matrix) and \({\varvec{b}}_{t,j}\!=\!{\varvec{0}}\). Equation (8) describes absolute position constraints (in operational space), where \({\varvec{J}}^{\!\dagger }\) is the Jacobian pseudoinverse used as least-norm inverse kinematics solution. Note that Eq. (8) describes a moving frame, where the task parameters change at each iteration (observation of a changing pose in configuration space). Equation (9) describes relative position constraints, where the constraint in task space is related to an object described at each time step t by a position \({\varvec{b}}^{\scriptscriptstyle {\mathcal {O}}}_t\) and an orientation matrix \({\varvec{A}}^{\scriptscriptstyle {\mathcal {O}}}_t\) in task space. Equation (10) describes nullspace/priority constraints in joint space, with \({\varvec{N}}\!=\!{\varvec{I}}\!-\!{\varvec{J}}^{\!\dagger }{\varvec{J}}\) a nullspace projection operator. Equation (11) describes absolute position nullspace/priority constraints, where the secondary objective is described in task space (for a point in the kinematic chain with corresponding Jacobian \({\varvec{\tilde{J}}}\)). Finally, Eq. (12) describes relative position nullspace/priority constraints.

The above equations can be retrieved without much effort by discretizing (with an Euler approximation) the standard inverse kinematics and nullspace control relations that can be found in most robotics textbooks, see e.g. [3].
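A sketch of how a few of these candidate operators can be assembled (assuming that J is the task Jacobian evaluated at the previous configuration and that x_prev is the corresponding task-space position; the object pose (A_O, b_O) is optional):

```python
import numpy as np

def candidate_operators(q_prev, x_prev, J, A_obj=None, b_obj=None):
    """Assemble candidate (A, b) pairs for frames in configuration space.

    A sketch of Eqs. (7)-(10); the nullspace variants of Eqs. (11)-(12)
    follow the same pattern with a secondary Jacobian.
    """
    n = q_prev.shape[0]
    J_pinv = np.linalg.pinv(J)                 # least-norm inverse kinematics
    N = np.eye(n) - J_pinv @ J                 # nullspace projection operator
    frames = {
        'joint': (np.eye(n), np.zeros(n)),                              # Eq. (7)
        'task_abs': (J_pinv, q_prev - J_pinv @ x_prev),                  # Eq. (8)
        'nullspace': (N, J_pinv @ J @ q_prev),                           # Eq. (10)
    }
    if A_obj is not None:                                                # Eq. (9)
        frames['task_rel'] = (J_pinv @ A_obj, q_prev + J_pinv @ (b_obj - x_prev))
    return frames
```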

Figure 6 presents a TP-GMM example with task parameters taking the form of nullspace bases. The frames are defined by Eqs. (9) and (12) with two different combinations of nullspaces \({\varvec{N}}\) and Jacobians \({\varvec{\tilde{J}}}\) corresponding to the left and right arm.

Fig. 6 Illustration of the encoding of priority constraints in a TP-GMM. The top row shows 3 demonstrations with a bimanual planar robot with 5 articulations. The color of the robot changes from light gray to black with the movement. The task consists of tracking two objects with the left and right hands (the paths of the objects are depicted in red). In some parts of the demonstrations, the two objects could not both be reached, and the demonstrator either made a compromise (left graph), or gave priority to the left or right hand (middle and right graphs). The bottom row shows reproduction attempts for new trajectories of the two objects. Although faced with different situations, the priority constraints are reproduced in a similar fashion as in the corresponding demonstrations

7 Discussion and Further Work

A potential limitation of the current TP-GMM approach is that it requires the experimenter to provide an initial set of frames that will act as candidate projections/transformations of the data that can potentially be relevant for the task. The number of frames can be overspecified by the experimenter (e.g., by providing an exhaustive list), at the expense of potentially requiring a large number of demonstrations to obtain sufficient statistics to discard the frames that have no role in the task. The demonstrations must also be sufficiently varied, which becomes more difficult as the number of candidate frames increases. The problem per se is not different from the problem of selecting the variables that will form the feature vector fed to a learning system. The only difference here is that the initial selection of frames takes the form of affine transformations instead of the initial selection of elements in a feature vector.

In practice, the experimenter selects the list of objects or landmarks in the robot workspace, as well as the locations in the robot kinematic chain that might be relevant for the task, which are typically the end-effectors of the robot, where tools, grippers or parts in contact with the environment are mounted. It should be noted here that if some frames of reference are missing during reproduction (e.g., when occlusions occur or when frames are collected at different rates), the system is still able to reproduce an appropriate behavior given the circumstance, see [2] for details.

The issue of predefining an initial set of frames of reference is not restrictive when the number of frames remains reasonably low (e.g., when they come from a set of predefined objects tracked with visual markers in a lab setting). However, for perception in unconstrained environments, the number of frames could potentially grow (e.g., with the detection of phantom objects), while the number of demonstrations should remain low.

Further work is thus required to detect redundant frames or remove irrelevant frames, as well as to automatically determine in which manner the frames are coordinated with each other and locally contribute to the achievement of the task. A promising route for further investigation is to exploit the recent developments in multilinear algebra and tensor methods [24, 38] that exploit the multivariate structure of data for statistical analysis and compression without transforming it to a matrix form.

In the proposed task-parameterized framework, the movement is expressed simultaneously in multiple coordinate systems, and is stored as a multidimensional array (tensor-variate data). This opens many roads for further investigation, where multilinear algebra could provide a principled method to simultaneously extract eigenframes, eigenposes and eigentrajectories. Multiway analysis of tensor-variate data could conceivably offer a rich set of data decomposition techniques, as has been demonstrated in computer imaging fields such as face processing [46], video analysis [52], geoscience [35] or neuroimaging [5], but it remains underexploited in robotics and motor skill acquisition.

There are several other encoding methods that can be explored within the proposed task-parameterized approach (e.g., with hidden Markov models (HMM), with Gaussian processes (GP) or with other forms of trajectory distributions). Indeed, it is worth noting that the approach is not restricted to mixture models and can be employed with other representations as long as a local measure of uncertainty is available.

8 Conclusion

An efficient prior assumption in robot learning from demonstration is to consider that skills are modulated by external task parameters. These task parameters often take the form of affine transformations, whose role is to describe the current situation encountered by the robot (body and workspace configuration). We showed that this structure can be used with different statistical modeling strategies, including standard mixture models and subspace clustering. The approach can be used in a wide variety of problems in robotics, by reinterpreting them with a structural relation between the task parameters and the model parameters represented as candidate frames of reference. The rationale is that robot skills can often be related to coordinate systems, basis functions or local projections, whose structure can be exploited to speed up learning and provide robots with better generalization capability. Early promises of the approach were discussed in a series of problems in configuration and operational spaces, including tests on a Baxter robot.