Introduction

Achieving cognition relies on robots that provide embodiment: embodiment with rich and complex motor skills that offer the means to interact with and manipulate the physical world [1]. The ability to autonomously manipulate the physical world is the key capability needed to fulfill the potential of cognitive robots. It has enormous potential for various applications, where autonomous robots can be deployed in all kinds of unstructured and even hazardous environments. Applications can range far beyond today’s utilization of robots in factories; from helping in households, hospitals, and care facilities, to work in radioactive environments or even in space [1, 2].

As one of the key aspects of robotics, manipulation and the learning of manipulation have been at the center of research for a long time [3]. The developed applications vary in complexity, setting, and robotic mechanisms [4, 5]. In the long run, research has been progressing towards evaluating approaches and results at the complete system level rather than focusing on the performance of separate components or even individual parts [6].

Manipulation on humanoid robots, which typically offer very rich sensorimotor capabilities [7, 8], represents one of the most complex settings. The physical capabilities of humanoid robots have increased drastically over the last decade [7, 9]. Research in the field of robot manipulation, also applied to humanoid robotics, has brought about rapid advances in manipulation capabilities. However, general-purpose manipulation skills on humanoid robots remain an open research question [5].

Several aspects must be considered when realizing effective manipulation learning on humanoid robots. First of all, due to their similarity to humans, humanoid robots can learn manipulation skills by observing human performance. Learning (or programming) by demonstration has long been an important topic in humanoid robotics research [10, 11]. By observing human performance, a humanoid robot can compute an initial model of the desired manipulation skill. However, since human and humanoid robot kinematics and dynamics are not exactly the same, such models usually provide only a rough approximation of the desired skill and need to be refined through practicing, which is a form of autonomous robot learning [1, 12]. Autonomous learning, e.g., reinforcement learning, is thus an essential component for computing more performant manipulation skills for humanoid robots. Other issues include (1) learning from multiple demonstrations of the desired manipulation skill, where the skill is applied under different environmental conditions; (2) learning of bimanual manipulation skills (humanoid robots have two arms in a known kinematic arrangement); and (3) preserving the stability of a humanoid robot while performing manipulation or any other tasks.

The purpose of this paper is to provide an overview of continuous efforts of our research group in these areas. The progressive evolution of research questions and capabilities, evident in the structure of the paper, illustrates the complexity behind the problem of engineering general-purpose robot manipulation skills. Examples of our work on different platforms are shown in Fig. 1.

Fig. 1

Examples of our work on different humanoid robotic platforms. a End-to-end vision-to-motion learning [13••]. b Learning of bimanual discrete-periodic manipulations on a humanoid robot (© [2015] IEEE. Reprinted, with permission, from [14]). c Arm synchronization for bimanual motion and obstacle avoidance and d bimanual human-robot collaboration (© [2014] IEEE. Reprinted, with permission, from [15])

Motor Representations and Learning Spaces

An effective movement representation is essential for a successful implementation of robot learning methodologies [16], and even more so for robot manipulation learning. Classical approaches encode robot trajectories as functions of time, but representations without explicit time dependency, i.e., autonomous motion representations, are often advantageous when the robot needs to react to unexpected events and changes in the environment [17]. Different approaches with and without explicit time dependency have been proposed in the literature. Examples include simple storing of large time-indexed vectors [18], spline fitting and via-points [10], Gaussian Mixture Models [19], function approximators such as different kinds of (deep) neural networks [20,21,22], nonlinear dynamic systems [23,24,25], and others. The reduction of the search space brought about by parametric representations is important for the development of effective robot manipulation learning methods [26].

Due to their many favorable properties, nonlinear dynamic systems, which form a class of autonomous motion representations, have been widely applied as a motion representation for robots acting in dynamic environments. The favorable properties include easy computation of free parameters to encode specific motions, ease of modulation, inclusion of coupling terms for interaction with the environment or other agents, robustness against perturbations, etc.

Among the most widely used nonlinear dynamic systems for movement representation are Dynamic Movement Primitives (DMPs) [23]. DMPs describe a control policy by a set of nonlinear differential equations with well-defined attractor dynamics for point-to-point [27], periodic [28, 29], and combined discrete-periodic motions [14, 30]. The approach has been expanded over the years to represent orientation trajectories with quaternions [31, 32], enable speed adaptation [33] and other modulation and adaptation features [15], encode variations of movements by adding probabilistic distributions to DMPs [34], and to arc-length dynamic movement primitives (AL-DMPs) [35], where the spatial and temporal aspects of motion are well separated. The latter are especially well-suited for action recognition and learning from multiple demonstrations.
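To make the representation concrete, the sketch below integrates a minimal single-degree-of-freedom discrete DMP in Python. All gains, the number of basis functions, and the zero-initialized weights are illustrative assumptions; in practice the weights are fitted to a demonstrated trajectory.

```python
import numpy as np

# Minimal single-DoF discrete DMP (illustrative gains and basis parameters).
alpha_z, beta_z, alpha_x, tau = 25.0, 25.0 / 4.0, 2.0, 1.0
N = 20                                         # number of radial basis functions
c = np.exp(-alpha_x * np.linspace(0, 1, N))    # basis centers in phase space
d = np.diff(c)
h = 1.0 / np.concatenate([d, d[-1:]]) ** 2     # basis widths (common heuristic)
w = np.zeros(N)                                # weights, normally fit to a demonstration

def forcing(x):
    psi = np.exp(-h * (x - c) ** 2)
    return (psi @ w) / (psi.sum() + 1e-10) * x

def rollout(y0, g, dt=0.002, T=1.0):
    y, z, x, path = y0, 0.0, 1.0, []
    for _ in range(int(T / dt)):
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + forcing(x) * (g - y0))
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)          # canonical system (phase variable)
        path.append(y)
    return np.array(path)

trajectory = rollout(y0=0.0, g=1.0)             # converges towards the goal g
```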

The advent of robots with joint torque sensors led to the development of Compliant Movement Primitives (CMPs) [36], which are used to describe both kinematic and dynamic aspects of motion. To specify a CMP, the teacher first demonstrates the desired motion which is recorded by a DMP. Next the robot executes the recorded DMP with high control gains, which ensures accurate motion tracking. The torques arising during this execution are recorded, encoded with radial basis functions, and used to generate feedforward torques during subsequent execution of the desired motion. By providing the feedforward torques, the robot can remain compliant while ensuring accurate motion tracking.
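The resulting control law can be sketched as a learned feedforward torque plus low-gain feedback; the function below is a minimal illustration with assumed gain values, not the controller used in [36].

```python
import numpy as np

def cmp_torque(q, dq, q_des, dq_des, tau_ff, Kp=10.0, Kd=1.0):
    """Compliant execution of a CMP: learned feedforward torque plus low-gain PD feedback.

    tau_ff is the torque profile recorded during a stiff (high-gain) execution of the
    DMP, encoded with radial basis functions and evaluated here at the current phase.
    Low Kp/Kd keep the robot compliant to unexpected contacts, while the feedforward
    term maintains accurate tracking of the demonstrated motion.
    """
    return tau_ff + Kp * (q_des - q) + Kd * (dq_des - dq)
```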

Representation of Bimanual Motion

Unlike standard robot manipulators, humanoid robots can use two arms in a known kinematic arrangement to carry out the desired manipulation tasks. While dual-arm tasks can be described by independently specifying the motion of the two arms arranged in two separate kinematic chains with a common base, it is often better to separate dual-arm motion into absolute and relative coordinates [37]. This way it becomes possible to directly control the relative motion between the two arms, which is often the key to a successful implementation of dual-arm (bimanual) manipulation skills. Just like Cartesian space movements of a single manipulator arm, movements in the relative and absolute frames can be specified with standard Cartesian space DMPs [31]. We exploited the separation of dual-arm humanoid robot motion into relative and absolute coordinates to implement several manipulation skills, e.g., bimanual peg-in-hole [38], bimanual human-robot cooperation for object transportation [39•], and compliant bimanual operations [40].
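A minimal sketch of one common convention for the absolute/relative split, restricted to positions for brevity; the exact definition and the treatment of orientations vary between formulations.

```python
import numpy as np

def dual_arm_coordinates(p_left, p_right):
    """Split dual-arm end-effector positions into absolute and relative coordinates.

    Controlling p_rel directly regulates the relation between the two hands,
    e.g., keeping a constant grasp distance while transporting an object.
    Orientations are handled analogously, e.g., with quaternion products.
    """
    p_abs = 0.5 * (p_left + p_right)   # absolute coordinates (midpoint)
    p_rel = p_right - p_left           # relative coordinates
    return p_abs, p_rel

def arm_targets(p_abs, p_rel):
    """Inverse map: desired per-arm positions from absolute/relative commands."""
    return p_abs - 0.5 * p_rel, p_abs + 0.5 * p_rel
```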

Humanoid Robot Imitation Learning

Imitation learning, also referred to as learning by demonstration or programming by demonstration (PbD), offers the means to quickly transfer skills from the demonstrator to the learning agent, in this case the robot [41]. This has been demonstrated in many robotic applications, which include not only free-space gestures such as waving, but also interaction and manipulation [42,43,44]. See Fig. 2 for several different examples.

Fig. 2

Examples of adaptation of motion acquired through LbD on different platforms. a ARMAR-3 learning to wipe with visual feedback [45]. b CbI learning arm gesture motions (reprinted from [46], with permission from Elsevier). c TALOS manipulating a measuring tool. d Modification of bimanual motion on a bimanual KUKA LWR-4 platform (reprinted from [47], with permission from Elsevier). e COMAN humanoid robot full-body motion imitation (reprinted from [48], with permission from Cambridge University Press). f HOAP-3 performing full-body motion imitation — walking (© [2013] IEEE. Reprinted, with permission, from [49])

For fixed-base robots, transferring the demonstrated motion in joint- or task-space has been addressed in numerous publications [42]. The demonstrated motion needs to be collected through a proper interface, be it kinesthetically [50,51,52], visually [53], or through some other sensory system [54], and then transferred to the robot. The data collection does not need to be limited to the kinematic aspects of the task, but can include also the arising forces and torques [55,56,57]. Depending on the task, the collected motion can be encoded, for example, in DMPs, and then transferred in joint-space or in task-space, for one arm or two, in relative or absolute space [39•], etc. Even for such robots, the difference in embodiment between the demonstrator and the robot might distort the learned motion so that the imitated task is not properly performed. This correspondence problem [58] is much more evident in floating-base, potentially only dynamically stable robots, such as bipedal humanoid robots. Some methods for whole-body motion retargeting proposed in recent years include [59,60,61].

Besides the demonstrator–imitator correspondence, imitation learning cannot be used for direct transfer of motion also because the state of the environment or of the manipulated object(s) is never exactly the same as during the demonstration. In this sense, learning by demonstration is only useful if it allows for subsequent adaptation of the transferred motion [51]. Thus, the main advantage of imitation learning is in narrowing down the search space for subsequent learning, be it for manipulation learning or any other task. The adaptation is based on acquired sensory information, which can be visual feedback, force feedback, tactile feedback, etc. This was demonstrated for a fixed-base robot in [62], where a reactive impedance controller was added to the demonstrated trajectory at the acceleration level of the DMP. On the statically stable ARMAR-3 robot, a demonstrated planar wiping motion was adapted in one dimension through admittance-based force control combined with iterative learning [45, 46]. Both examples demonstrate that adaptation and additional learning were necessary.

Preserving Postural Stability in Imitation Learning

Let us first address fixed robots and statically stable humanoid robots. An example of the latter is the aforementioned ARMAR-3 humanoid robot, which has a humanoid head and upper body, but a wheeled platform [63]. Here, the dynamics of the motion are not a problem, as stability is ensured by the fixation or by the platform. Even with statically stable robots, the embodiment might require modification of the demonstrated motion, for example, to avoid self-collisions [47]. There, the authors proposed an effective methodology that modifies the motion only when necessary, realized through blending of primary and secondary tasks.

On dynamically stable robots, differences in embodiment are even more pronounced, and a direct transfer of the motion to the robot will not only result in poor execution of the demonstrated task, but can even cause the robot to tip over. Therefore, adaptation is required at the very core of imitation. In [64], a similar approach as in [47] was used, exploiting the blending of primary and secondary tasks. The demonstrated task was directly transferred in joint-space, unless the projection of the center of mass approached the edge of the stability polygon. Then, the primary task of maintaining stability would take over, and the demonstrated task would only be executed in its null-space. This was also the basis for off-line adaptation of demonstrated motion [49], where the robot would record the demonstrated task and then optimize the whole-body motion in order to maintain stability while approaching the likeness of the demonstrated task as much as possible. A similar approach was also applied on the COMAN humanoid robot [48].

Statistical Learning From Multiple Demonstrations

Learning by demonstration can provide several examples of the desired manipulation action, but it is unlikely that any of the demonstrated actions will be appropriate for the current state of a dynamically changing environment, both in terms of the required motion and of the objects involved in the action. However, it is possible to demonstrate more than one instance of the task and then use these examples to generate a new, previously undemonstrated instance of the task. If every trajectory is associated with parameters that describe the characteristics of the task, typically the goal or other conditions of the task [36, 51], then these parameters can serve as query points into the example database.

As explained in [51], the inspiration for such generation comes from motor-tape theories, in which example movement trajectories are stored directly in memory [65, 66]. Generalization from a database of recorded demonstrations has been demonstrated on different robotic platforms and tasks. In their seminal work, Ude et al. [51] showed how generalization from a set of trajectories can be used to generate accurate reaching, grasping, and throwing actions represented by DMPs. The approach combined locally weighted regression [67] and Gaussian process regression [68] to generate all DMP parameters. Later, the complete approach was demonstrated using GPR on reaching with different classes of actions [69]. Generalization has also been widely adopted for motion generation with variations of DMPs and other movement representations [70, 71]. An important alternative is to build variability into the representation itself, such as with nonlinear dynamic systems [24] and probabilistic movement primitives [72]. Figure 3 illustrates different aspects of generalization.
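A minimal sketch of the generalization step, assuming each demonstration has been encoded as a DMP weight vector and labeled with a low-dimensional query point (here a single task parameter). The scikit-learn GaussianProcessRegressor and the placeholder data stand in for the regression machinery and the trajectory database of [51, 69].

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Training database: one row per demonstration.
queries = np.array([[0.2], [0.4], [0.6], [0.8]])   # task parameters (e.g., target distance)
dmp_weights = np.random.randn(4, 20)               # placeholder for fitted DMP weight vectors

# One GP per output dimension with a shared kernel (a common simplification).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6)
gp.fit(queries, dmp_weights)

# Generalize to a previously undemonstrated query point.
new_query = np.array([[0.5]])
w_new = gp.predict(new_query)   # predicted DMP weights for the new task variant
```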

Fig. 3

a Generalization for grasping (© [2010] IEEE. Reprinted, with permission, from [51]). b Database for generalization of reaching with both arms (reprinted from [69], with permission from Elsevier). c Generalization for periodic actions — drumming (© [2010] IEEE. Reprinted, with permission, from [53]). d Learning of CMPs and e database expansion on the KUKA LWR robot (© [2018] IEEE. Reprinted, with permission, from [73•]). Within e: (a) number of learning epochs without database — five on average. (b) Number of learning epochs with leave-one-out cross-validation — two on average. (c) Number of learning epochs through incremental database expansion — two to three on average. The numbers in the circles denote the order of learning and thus the order of database expansion (© [2018] IEEE. Reprinted, with permission, from [73•])

Generalization can also be used to tackle the dynamic aspects of motion. This is necessary if the dynamic models of the robot and the task are not known, as is often the case in imitation learning. Such models cannot be learned by imitation. Thus, the challenge is to obtain correct dynamic models for each task variation. The aforementioned CMPs can be used to describe single instances of tasks. Just as with kinematic data, multiple instances of dynamic data can be used to generate new dynamic motions [36]. In this work, the kinematic and the corresponding dynamic components of CMPs were generalized separately.

Autonomous Augmentation of Trajectory Databases

Generalization can only be accurate if a sufficient amount of training data is available. If not, the user must demonstrate additional example executions of the desired task. However, this is time-consuming and often requires significant effort from the user. It would therefore be advantageous if the robot could augment the available database without requiring additional user demonstrations. The available database can provide structure to bootstrap the autonomous acquisition of additional task executions and speed up the data gathering process [74].

In our approach, statistical generalization is used to produce good initial approximations for new variants of the task. If the performed action (represented by DMPs, CMPs, or any other representation) satisfies the given criterion function, e.g., hitting the target for ballistic movements or trajectory tracking accuracy for compliant movements, the new data are immediately added to the database. If not, additional autonomous learning can be applied, starting from the initial movement provided by statistical generalization. Methods such as iterative learning control or reinforcement learning (see the “Autonomous Learning and Adaptation of Manipulation Actions” section) can be used for this purpose. A complete system for autonomous extension of the database was proposed in [73•], where the new compliant motion trajectories were generated by statistical generalization. The approach was recently extended to the periodic repetition of CMPs, including the ability of frequency modulation [75].
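The overall augmentation loop can be summarized in pseudocode. This is a sketch of the scheme rather than the exact algorithm of [73•]; the callables passed in (generalize, execute, criterion, refine) are hypothetical placeholders for the steps described above.

```python
def augment_database(database, new_queries, generalize, execute, criterion, refine, threshold):
    """Autonomously extend a database of (query, policy) examples.

    The callables stand for statistical generalization, robot execution, the
    task-specific performance criterion (here an error to be minimized), and
    autonomous refinement (ILC or reinforcement learning), respectively.
    """
    for q in new_queries:
        policy = generalize(database, q)        # initial approximation from the database
        result = execute(policy)                # trial on the (real) robot
        if criterion(result) < threshold:       # e.g., tracking error already small enough
            database.append((q, policy))        # accept the generalized policy directly
        else:
            policy = refine(policy, criterion)  # autonomous learning from the generalized start
            database.append((q, policy))
    return database
```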

Coaching: Learning With Human-in-the-Loop

Besides demonstrating the desired task executions to the robot, a human teacher (coach) can support the learning process by direct interaction with the robot. Such interactions put the human directly in the learning loop. This teaching process is also called coaching [46].

The interactions can directly influence the manipulation policy and can be specified in different ways. Gruebler et al. [76] used voice commands as a reward function in their learning algorithm. Verbal instructions were used to modify movements obtained by human demonstration in [77]. Direct physical contact between the user and the robot is also useful to indicate how the robot should alter its motion. For example, Lee and Ott [52] used kinesthetic teaching with iterative updates to modify the behavior of a humanoid robot. Coaching based on gestures and obstacle avoidance algorithms was applied to DMPs in [46] and combined with passivity in [78]. Coaching that involved changing the stiffness in the path operational space, defined by a Frenet–Serret frame, was proposed in [39•]. Gams et al. [79] evaluated different interfaces and concluded that force-based coaching methods have a clear advantage, and that all coaching methods are suitable for rough approximation of the desired motion, while accurate tracking is not viable. The coaching process should thus be enhanced by an additional autonomous adaptation method that allows fine-tuning of the desired motion.

One of the challenges in autonomous adaptation methods is the design of reward functions, which is a complex problem even for domain experts [80]. Furthermore, acting in the real world and receiving feedback through sensors implies that the true state may not be completely observable and/or noise-free [81]. Besides the robot’s on-board sensors, additional external sensors are often applied. Recently it has been shown that learning systems can be effective even if the precise reward function is replaced by natural user feedback. For example, instead of precisely measuring how far the robot has thrown the ball, the user can only specify if the ball has hit the target or the throw was shorter or longer than required. Although such feedback is noisy and not optimal for teaching [82], it was nevertheless applied to successfully learn the ball-in-cup skill [80] and for robotic throwing [83].

Some examples of human-in-the-loop interaction are shown in Fig. 4.

Fig. 4

Three instances of human-in-the-loop intervention. a Coaching through gestures (reprinted from [46], with permission from Elsevier). b Coaching through physical interaction (© [2016] IEEE. Reprinted, with permission, from [79]). c Schematics showing quantitative sensory feedback and qualitative human feedback, which acts as the reward for reinforcement learning (© [2018] IEEE. Reprinted, with permission, from [83])

Autonomous Learning and Adaptation of Manipulation Actions

As explained in the previous section, the robot should in most cases autonomously refine the human-demonstrated movements to achieve the required performance of its own task execution. In the absence of a teacher, autonomous robots should also be able to find appropriate control policies to perform the desired task, either by starting from a policy for a task performed in a similar situation or even from scratch.

Traditional robot control methods assume that exact a priori models are available. Although remarkable results can be obtained in this way, model-based control can be very sensitive to inaccurately modeled system dynamics [84]. This problem is especially critical for robots operating in human environments, where compliant (low gain) control is usually required to assure the safety of humans, the environment, and the robot itself.

Most autonomous learning methods rely on a user-defined cost function. Reinforcement learning [81] is the method of choice for general cost functions that do not provide any additional information besides the evaluation of the motor command executed in a particular state. While general-purpose reinforcement learning can be applied to such cost functions, this type of learning is usually very slow, requiring many repetitions. More effective learning methods can be applied if the cost function also provides some information on how to change the parameters of the desired skill.

Since practicing, i.e., repeating the desired motion with real robots, especially walking robots, is extremely time-consuming and also dangerous for the robots, a lot of recent research in robot learning has taken place in simulation [85]. However, many manipulation tasks cannot be learned without interacting with the real world due to the limitations of robot simulation systems. While the previous sections addressed both fixed and floating-base robots, the following sections focus on approaches for manipulation learning on fixed upper-body humanoid robots. This way we avoid the problems with stability while keeping the possibility to experiment with real humanoid robots.

Iterative Learning Control

For many practical problems we can define a cost function that can be evaluated along the motion trajectory of the desired robot motion. Iterative learning control (ILC) [86] can be applied to optimize the robot motion if this cost function allows us to compute how to change the parameters of the manipulation skill; for example, it provides information about the sign of parameter change. The key idea of ILC is to use repetitive system dynamics to compensate for the errors. Although ILC is intrinsically robust to the variation of learning parameters, careful parameter tuning is still required.
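In its simplest form, the ILC correction after each repetition can be written as in the sketch below, where the learning gain is an illustrative value and the signals are assumed to be sampled over the same time grid in every epoch.

```python
import numpy as np

def ilc_update(u_prev, error_prev, L=0.5):
    """One ILC epoch: u_{j+1}(t) = u_j(t) + L * e_j(t).

    u_prev     : feedforward signal applied in the previous epoch (array over time)
    error_prev : tracking error measured in the previous epoch (same length)
    L          : learning gain; too large a gain destabilizes the iteration.
    """
    return u_prev + L * error_prev
```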

We have successfully applied the ILC framework to improve the robot assembly operations acquired by human demonstrations. In automated robot assembly, the unavoidable positioning errors and tight tolerances between the objects involved require compliance and on-line adaptation of the desired trajectories. The resulting forces and torques describe the underlying assembly processes well enough to be taken as a reference for adaptation when transferring the demonstrated assembly policies to new locations [55]. At the new locations, the robot can apply ILC to autonomously improve the demonstrated policies by minimizing the discrepancies between the contact forces and torques arising during the initial human demonstration and the robot execution of the assembly task. Usually only a few epochs of learning are needed to adapt and improve the policy. Another example of the successful application of ILC for policy adaptation and refinement is bi-manual assembly of long poles [38], as shown in Fig. 5a.

Fig. 5

a Humanoid robot torso in bi-manual assembly of long poles (© [2015] IEEE. Reprinted, with permission, from [38]). b Humanoid robot during human-robot cooperation to place the table cloth. The norm of the relative error in subsequent learning cycles is shown in the graph (reprinted by permission from Springer Nature from [87]). c The initial demonstration of the wiping policy on a cylindrically shaped glass (top) and the humanoid robot while practicing the glass wiping policy on the oval-shaped glass (bottom) and d comparison of the cost function evolution for AILC and the hybrid AILC-RL scheme in a bar chart (© [2017] IEEE. Reprinted, with permission, from [88])

Figure 5b shows that the ILC framework can also be successfully applied to improve physical human-robot cooperation. In this task, the human and the humanoid robot cooperate to place a table cloth on the table. ILC can be used to transfer the cooperative table cloth placing from one location to another [87]. In this task, the bimanual robot adapts its motion in the absolute coordinates (see the “Representation of Bimanual Motion” section). The results of adaptation are shown in Fig. 5b, where only three to four adaptation cycles are needed to substantially reduce the error in relative coordinates.

Hybrid Reinforcement Learning and Adaptive ILC

The application of ILC can be problematic because it is sometimes difficult to set the free parameters (gains) of ILC in such a way that the learning system remains stable. The main problem is that as ILC starts converging to the optimal solution, the ILC cost function signal starts oscillating, which prevents the parameters from reaching the optimal solution. This is especially true in cases when the environment dynamics is not known. In order to overcome this problem, various adaptive ILC (AILC) algorithms were proposed in the literature [89, 90] to adapt the gains during learning. Roughly, they can be divided into two sub-classes: (a) ILC with adaptation of the feedback in the current iteration loop and (b) ILC with adaptation of the learning mechanism (also referred to as adaptation of the previous cycle learning). In order to assure the learning and closed-loop stability of AILC, several issues have to be considered. Unfortunately, some of these issues cannot always be resolved in practice. Consequently, there are only a few examples of successful application of adaptive ILC algorithms in robotic applications.

Reinforcement Learning (RL) enables a robot to autonomously find an optimal policy by direct trial-and-error exploration within its environment [81]. It is often used in robotics to solve problems where models are not available. The main issue with general-purpose reinforcement learning is the high dimensionality of the parameter space arising in motor skill learning. Without any additional information, the robot must estimate the gradient of the cost function to compute the parameter updates, which is an expensive operation.

We have designed a hybrid system that combines the strengths of adaptive ILC and reinforcement learning. In the proposed system, reinforcement learning acts as a supervisor that computes the optimal skill parameters and ILC gains after every learning cycle. Since the general direction of adaptation is provided by ILC, this hybrid system converges much faster than standard reinforcement learning. On the other hand, reinforcement learning selects the optimal set of parameters from the previous and the current learning cycle, which ensures stable operation of AILC even when the task dynamics is not known. We used the PI2 reinforcement learning algorithm to implement the proposed hybrid scheme [88].

An example of successful application of the hybrid AILC-RL learning framework is bi-manual glass wiping (see Fig. 5c). The initial control policy for cylindrically shaped glasses was obtained by human demonstration using kinesthetic guidance. After that, we replaced the glass with an oval, cone-shaped glass. Instead of demonstrating a new control policy for this glass, we applied AILC-RL to adapt the demonstrated control policy to the new shape. The aim was to achieve the same force-torque profile as applied to the cylindrically shaped glass. During adaptation, it was necessary to consider that fragile objects are being handled, so the forces and torques arising during adaptation were limited to small values. AILC-RL ensures convergence even when the feedback gains are low, as is necessary to prevent high forces and torques from arising. The comparison of the evolution of both cost functions shows that AILC-RL preserves both the stability of reinforcement learning and the adaptation speed of AILC.

Reinforcement Learning in Physically Constrained Environments

Many robotic tasks are performed in contact with an environment that restricts movement to only one degree of freedom. Examples of such tasks are opening and closing doors, drawers, cabinets, sliding doors, latches, etc. Learning such tasks is easier because the space of parameters is one dimensional. However, the constraints imposed by the environment are not known in advance. Similar to the previous section, where we used AILC as the search algorithm, this time we use an intelligent compliant controller for this purpose. The underlying controller, which acts as a policy search agent, generates movements along the admissible directions defined by the physical constraints of the task. We employ variable compliance to ensure that the robot is stiff in the tangential direction of motion and compliant in the orthogonal directions. This is accomplished by attaching a Frenet-Serret frame (a coordinate frame constructed from the tangential, normal, and binormal vectors) to the motion trajectory and defining the stiffness along the axes of this frame [39•]. Experimental results show that only a few learning cycles are required for a robot to learn such tasks completely autonomously, without any prior demonstration [91].
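The variable-stiffness idea can be sketched as follows: build a Frenet-Serret frame from the local trajectory derivatives and express a diagonal stiffness matrix in that frame, stiff along the tangent and compliant along the normal and binormal directions. The stiffness values below are illustrative assumptions, not those used in [39•, 91].

```python
import numpy as np

def frenet_frame(vel, acc):
    """Tangent/normal/binormal frame at a trajectory point (assumes acc not parallel to vel)."""
    t = vel / np.linalg.norm(vel)
    b = np.cross(vel, acc)
    b = b / np.linalg.norm(b)
    n = np.cross(b, t)
    return np.column_stack([t, n, b])           # rotation matrix; columns are t, n, b

def task_space_stiffness(vel, acc, k_tangent=1000.0, k_orthogonal=50.0):
    """Stiff along the admissible (tangential) direction, compliant orthogonally."""
    R = frenet_frame(vel, acc)
    K_frenet = np.diag([k_tangent, k_orthogonal, k_orthogonal])
    return R @ K_frenet @ R.T                   # stiffness expressed in the base frame
```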

Deep Neural Networks for Perception-Action Coupling

The statistical learning approaches described in the “Statistical Learning From Multiple Demonstrations” section require that the programmer defines a query point, which is used to index into the database of example trajectories to compute the appropriate motion for the current state of the external world. The query points usually relate to the desired task, e.g., the desired final pose for reaching movements or the target position for ballistic skills. While this can be highly effective when the goal of the task can be easily described with a few parameters, this is not always possible. In some cases it is better to specify query points by images or even videos.

End-to-End Generation of Manipulation Policies

In our work we focused on how images and image sequences can be transformed into manipulation primitives represented by dynamic movement primitives. Our starting point was the universal approximation theorem for deep neural networks, which indicates that neural networks have sufficient representational power to learn highly nonlinear mappings that link high-dimensional inputs such as raw images to DMPs. We first tackled the problem of handwriting, that is, how to translate between visual representations of digits perceived by the humanoid robot’s visual system and the action representations needed to control the humanoid robot’s motion trajectories required for handwriting. We addressed this problem by proposing a fully connected image-to-motion encoder-decoder neural network architecture (IMEDNet) [92], which took inspiration from autoencoder neural networks [93]. While the original IMEDNet network was useful for converting images of digits into motion trajectories, certain difficulties became apparent when considering real-world scenarios in which, for example, a robot is shown a digit on a piece of paper held in front of its camera or written in free-form on a whiteboard, and must generate the corresponding handwriting motion. In such cases, the position and orientation of the digit in the acquired camera image are not known, which we overcame by including a spatial transformer in the proposed architecture [94]. Moreover, to reduce the number of parameters in the neural network and take into account the nature of the input data, i.e., camera images, we included convolutional layers [95] in the proposed architecture. Finally, to improve the accuracy of the learned neural network models, we developed an optimal criterion function to train the proposed neural network and showed how to compute its gradients for backpropagation [13••]. Its distinguishing feature is that it measures the real distance between handwriting trajectories as opposed to the distance between DMP parameters, which have no physical meaning.
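A hedged sketch of the image-to-motion idea, with a convolutional encoder mapping a grayscale image to a vector of DMP parameters (weights per degree of freedom, goal, and duration). This is an illustrative stand-in, not the actual IMEDNet architecture or training criterion of [13••, 92]; all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class ImageToDMP(nn.Module):
    """Illustrative convolutional image-to-DMP-parameter network (not the exact IMEDNet)."""
    def __init__(self, n_basis=25, n_dof=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Output: DMP weights for each DoF plus goal position and duration.
        n_out = n_dof * n_basis + n_dof + 1
        self.head = nn.Sequential(nn.LazyLinear(256), nn.ReLU(), nn.Linear(256, n_out))

    def forward(self, img):                   # img: (batch, 1, H, W) grayscale image
        return self.head(self.encoder(img))  # predicted DMP parameters
```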

Finally, in many tasks, especially in the context of human-robot interaction, it is insufficient to use single images as input to generate the appropriate robot responses. Full videos should often be used instead, but variable-length videos cannot be processed by feedforward neural networks. We therefore included LSTM components in the feedforward neural network architectures described above. The resulting recurrent neural network architecture has been shown to be effective for the prediction of human intentions [96].

Reduction of Search Space

One of the possible applications of deep neural networks is the reduction of dimensionality [93]. While DMPs provide a relatively low-dimensional representation of the action space, the dimensionality of the DMP parameter space is still rather high [97]. It has been shown that deep autoencoder (AE) neural networks, where the data is pushed through the layer with the smallest number of neurons, the so-called latent space, are superior to standard dimensionality reduction approaches such as Principal Component Analysis (PCA) [93]. In our work we showed that faster convergence of autonomous learning methods, e.g., reinforcement learning, can be achieved when latent space representations computed by deep autoencoder neural networks are used to provide a low-dimensional representation of a robot manipulation skill [98]. In addition, we demonstrated that generalization methods can be used to generate data for autoencoder neural network training in simulation [99].
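A minimal sketch of the latent-space idea: an autoencoder compresses DMP parameter vectors into a few latent dimensions, and policy search then explores in the latent space before decoding back to DMP parameters. The network sizes and dimensions below are illustrative assumptions, not those of [98].

```python
import torch
import torch.nn as nn

class DMPAutoencoder(nn.Module):
    """Compress DMP parameter vectors into a low-dimensional latent space (illustrative sizes)."""
    def __init__(self, n_params=50, latent_dim=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_params, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, n_params))

    def forward(self, w):
        z = self.encoder(w)       # latent code; autonomous learning can search over z
        return self.decoder(z)    # reconstructed DMP parameters
```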

Conclusions

Significant progress has been achieved in the area of manipulation learning on humanoid robots over the last 25 years. The main contributions of our group include statistical methods for learning movement primitives from multiple demonstrations, new learning methodologies and representations that combine the kinematic and dynamic aspects of manipulation tasks, manipulation learning of bimanual tasks, new algorithms for autonomous learning of manipulation tasks by combining adaptive ILC and reinforcement learning, and the development of new neural network architectures to directly translate sensory signals into manipulation primitives. Together, these methods provide the building blocks for developing behaviors and learning methodologies at the sensorimotor level of the humanoid robot’s overall cognitive architecture.

Manipulation learning on humanoid robots remains an open research area, eagerly awaiting progress in humanoid robots’ capabilities. On the hardware side, soft robotics and compliant actuator designs can make a significant contribution. Together with new AI approaches based on the availability of vast quantities of data, increased computational power, and deep neural networks, we expect significant progress in the near future. The important problems that remain to be resolved include transferability of results from simulation to the real world and between different robots, which is problematic especially for dynamic tasks.