1 Introduction

In many urban disaster response and relief scenarios, human response forces face high risks, e.g. from dangerous materials, radiation, explosive atmospheres or unstable buildings. In such cases, it would be highly desirable to send robots that can perform tasks like human responders, i.e. act as their remote ‘avatars’. As such robots are expected to operate in environments made for humans, even if partially degraded, and to use human tools, humanoid robots appear to be a potentially attractive class of robots. For instance, door handles are located at a reachable height, stair steps have a comfortable height for the human leg, and tools are designed to be grasped by human hands. Towards this vision, a supervised semi-autonomous humanoid robot is therefore considered that enters a mainly unknown, potentially degraded human environment and performs highly diverse disaster recovery tasks on-site.

In this paper, manipulation tasks are specifically considered. To accomplish its first responder mission, the robot should in principle be able to use any object it can find in the environment as a tool for its current manipulation tasks. Because of the potentially high degree of uncertainty and unstructuredness of the environment and situation on-site, such capabilities will remain out of reach for a fully autonomous robot for a long time. However, a remote human operator can be employed who supervises and assists the robot through a limited communication link in collaborative perception and planning, enabling, e.g., the use of objects known to the robot for new purposes as well as the use of a priori unknown objects available on-site.

Fig. 1

Human operator (left) supervising and generating high-level task commands to be executed by the remote semi-autonomous robot (right)

Exploring an unstructured, potentially degraded scenario is a complex task for robots. Requirements for these tasks include recognition of the environment, mission planning, locomotion, grasp planning, and object manipulation. On the one hand, performing such tasks in a fully autonomous way is still an unsolved problem [19]; highly diverse and unforeseen situations remain an open challenge for the main thrust of research on autonomous robots so far. On the other hand, performing such tasks by pure teleoperation poses challenges for the operators, such as executing motions while maintaining robot stability, avoiding self- and environment collisions, and operating with limited feedback. Communication with a robot inside these scenarios can also be hampered by limited connectivity, limited available bandwidth, and latency, which can drastically delay task execution. Pure teleoperation requires training and expertise, and it also results in a high mental workload for the operators.

Both autonomous and teleoperated robotic approaches have pros and cons; for this reason, a middle-ground approach has emerged. In an approach where a semi-autonomous remote robot performs tasks as a first responder in place of a human, a so-called avatar [12], the strengths of both fully autonomous and purely teleoperated approaches can be combined while their weaknesses are mitigated. To this end, a remote manipulation task can be divided into sub-tasks such as sensing, planning, action, and evaluation [10]. A proper distribution of these sub-tasks between the human operator and the remote robot can increase the efficiency and the potential for success of a manipulation task. On the one hand, a remote human can assist by collaborating in perception and planning through a, however, limited communication channel. A human operator can trivially plan a task (planning), identify objects in the sensor data acquired by the robot (perception), and verify the completion of a task (evaluation). On the other hand, it is challenging to remotely control in joint space a robotic system like a humanoid robot, which can easily have over 25 degrees of freedom (DOF) [3]. For this reason, in the approach presented in this paper the complex sub-tasks such as motion planning, motion execution, and obstacle avoidance (action) are performed (semi-)autonomously by the remote robot (see Fig. 1). With this distribution of tasks, the challenge of converting the operator intent into a robot action arises.

This paper builds on the contributions of the authors and their co-authors to the state of the art in human supervision of semi-autonomous humanoid robots [7, 15]. It aims towards higher levels of autonomy and abstraction in operator-robot interaction for manipulation tasks. The paper is organized as follows: In Sect. 2 we begin by describing the related state-of-the-art approaches for human supervision of semi-autonomous robots. In Sect. 3 we then describe our approach to interaction with an avatar robot. In Sect. 4 we present laboratory experiments evaluating the approach. Section 5 describes the lessons learned from this approach. Section 6 describes the relationship and impact of this approach with respect to the RoboCup Robot Rescue League. Finally, our conclusions and proposed future work are discussed in Sect. 7.

2 Related Work

In recent years, successful approaches that consider a human operator in the loop for remote semi-autonomous manipulation tasks have been proposed.

Johnson et al. proposed coactive design as a set of guidelines for designing a system with a focus on exploiting the interdependency relationships between humans and robots when working together [5]. Coactive design approaches human supervision from the perspective of teamwork between humans and robots to achieve a task. For example, for the manipulation aspect, coactive design considers using virtual interactable objects as a means for an operator to control the behavior of a humanoid robot [6, 8]. These interactable objects allow an operator to select different grasp poses for the end-effectors, select different manipulation stance poses, and also to generate footstep plans for locomotion with respect to the object of interest. In this approach, end-effectors can be linked to these interactable objects so that the human operator can manually adjust the pose of such an object in order to generate an arm trajectory to be executed by the robot.

Fallon et al. made several contributions towards robot perception and control which, in collaboration with a human supervisor, provide an efficient approach to manipulation. In [1], they propose the use of CAD models of objects as virtual environmental features that present action possibilities to an operator. These models provide information such as manipulation stances or potential grasp locations, which are computed using an optimization toolbox called Drake (Tedrake [18]).

To aid a remote robot in executing a manipulation task with an object, the affordances of that object can be used to describe the possible actions that can be performed with it. The concept of affordances was introduced by Gibson [2] in psychology, but nowadays it is also used by the robotics community [16]. For example, Hart et al. [4] contributed towards the control of avatar robots by proposing a related interaction method that combines their approach to affordances with the concept of object templates. These affordance templates are designed to provide robot-agnostic grasp poses and manipulation trajectories. Their Affordance Template ROS package provides a human operator with a high level of adjustment of and interactivity with the geometric information of the affordance templates used to represent real objects. They propose to pre-define end-effector waypoints with respect to the template frame of reference and to let the operator iterate through them to generate the trajectories required to achieve the task. The scale of the affordance template and the pre-defined waypoints can be adjusted on-line to match similar objects of different sizes.

In contrast to these state-of-the-art approaches, the object template approach proposed by the authors for interaction with avatar robots focuses on providing the human supervisor with an affordance [2] level of control that allows tasks to be solved in versatile ways [14]. This approach can also be abstracted into higher levels of autonomy, which can provide a more efficient interaction by selecting the required manipulation task and letting the robot autonomously plan and execute the appropriate motions.

3 Manipulation Interaction with Avatar Robots

An efficient interaction between the human supervisor and the remote avatar robot requires commands that describe tasks at a sufficiently high level of abstraction. In the approach proposed in this paper, the generation of such high-level commands is based on the affordance concept. However, communicating high-level information to a remote robot requires an entity of information that is both human- and robot-friendly. For this purpose, the concept of the object template has been developed.

3.1 Object Template Approach

An object template is an effective virtual representation of an object of interest that contains information which a remote robot can use to manipulate that object or other similar objects [13]. The object template concept presented here goes beyond related state-of-the-art concepts by extending the robot's capabilities to use affordance information of the object. The concept includes physical information (mass, center of mass, inertia tensor) as well as abstract information (potential standing poses, grasps, affordances, and usabilities) of the object, as shown in Fig. 2. Object templates provide an interaction method to rapidly communicate the physical and abstract information of the objects of interest to the robot.
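To make this structure concrete, the following is a minimal sketch of how such a template could be represented in code. The class and field names are illustrative assumptions for the concepts described here, not the authors' actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class Affordance:
    """A constrained motion (e.g. 'turn', 'push') defined in the template frame."""
    name: str
    motion_type: str            # "circular" or "linear"
    axis: np.ndarray            # unit axis expressed in the template frame
    pivot: np.ndarray           # point on the axis (used for circular motions)

@dataclass
class ObjectTemplate:
    """Virtual representation of an object of interest (illustrative sketch)."""
    type_id: int                                # unique identifier of the object type
    name: str                                   # e.g. "drill"
    mesh_path: str                              # 3D geometry shown to the operator
    mass: float                                 # kg
    center_of_mass: np.ndarray                  # 3-vector in the template frame
    inertia_tensor: np.ndarray                  # 3x3 matrix
    grasp_poses: List[np.ndarray] = field(default_factory=list)       # 4x4 transforms
    stand_poses: List[np.ndarray] = field(default_factory=list)       # 4x4 transforms
    affordances: Dict[str, Affordance] = field(default_factory=dict)
    usabilities: Dict[str, np.ndarray] = field(default_factory=dict)  # name -> 4x4 frame
```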

Because humans are very good at analysing a situation and planning new ways of solving a task, even using objects for different purposes, it is important to communicate the planning and perception performed by the operator such that the robot can execute the action based on the information contained in the object template. This leverages human intelligence together with robot capabilities, with the focus on aiding the robot by providing the motion constraints required to achieve a task while manipulating an object. For example, in a 3D user interface implementation, an object template can be visualized as a 3D geometry mesh that represents an object of interest. A human operator can manipulate the object template and fit it to the visualized sensor data of the real object, which corresponds to the planning sub-task. This implicitly provides the 3D world pose of the object of interest and explicitly provides its physical information, which can be compressed and sent over a low-bandwidth communication link. The remote robot can then perform the action sub-task by using the information of the object template to approach, grasp, and manipulate the real object (see Fig. 2). During the execution of the action, the human supervisor can then evaluate these actions.
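To illustrate why this interaction remains low-bandwidth, only the template type identifier and the fitted 6-DOF pose need to cross the communication link, since all remaining template information is already stored on the robot side. The byte layout below is a hypothetical example of such a message, not the actual protocol used in the system.

```python
import struct

def encode_template_fit(type_id: int, position, quaternion) -> bytes:
    """Pack a template fit (type id + 6-DOF pose) into a fixed-size binary message."""
    x, y, z = position
    qx, qy, qz, qw = quaternion
    return struct.pack("<H7f", type_id, x, y, z, qx, qy, qz, qw)

msg = encode_template_fit(42, (1.20, -0.35, 0.90), (0.0, 0.0, 0.707, 0.707))
print(len(msg))  # 30 bytes: far less than streaming meshes or raw point clouds
```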

As described in Sect. 2, some of the current approaches require that the operator manually changes the pose of the object template to generate an end-effector trajectory [6]. However, relying on the operator to manually change this pose prevents trajectory generation from being abstracted into higher levels of autonomy. Other approaches focus on providing a set of pre-defined end-effector waypoints with respect to the object template pose to generate the end-effector trajectories required to achieve the task. This limits flexibility by preventing the robot from executing trajectories with different end-effector poses. While these current state-of-the-art approaches focus on an object-grasp-centered means of interaction, our approach aims at providing the physical and abstract properties of the objects of interest. With this information, the robot can perform autonomous sub-tasks such as moving through the environment, grasping objects, and manipulating them at an affordance level while avoiding collisions with the environment, in order to efficiently accomplish the required manipulation task.

Fig. 2

Example of an object template. The human operator validates the sensor data corresponding to an object and assists in positioning the corresponding object template. The robot can then use the pose of the object template to estimate the location of the real object

3.1.1 Affordances

To manipulate objects, object templates are designed to provide information about their affordances. In this approach, affordances are used to describe the possible actions that can be performed with these objects. For example, a door has two affordances: one for turning the handle and one for opening the door. Affordances can describe circular or linear motions that the end-effector of the robot is constrained to follow. These constrained motions are calculated on-line based on the frame of reference of the object template. For example, in Fig. 2, the affordance of the drill generates a constrained linear motion parallel to the red axis, which corresponds to the “pushability” affordance of the drill.
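A minimal sketch of how such constrained motions could be sampled on-line is shown below; the helper names, the discretization, and the `Affordance` fields (reused from the sketch above) are assumptions for illustration.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation of `angle` radians about unit `axis`."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def affordance_waypoints(T_world_template, aff, T_world_follower, amount, steps=20):
    """Sample world-frame waypoints (4x4) for a frame constrained to follow the
    affordance motion, e.g. the end-effector or a selected usability frame,
    starting from T_world_follower. `amount` is an angle in radians for circular
    motions and a distance in meters for linear ones."""
    T_template_follower = np.linalg.inv(T_world_template) @ T_world_follower
    waypoints = []
    for s in np.linspace(0.0, amount, steps):
        T_step = np.eye(4)
        if aff.motion_type == "circular":
            R = rotation_about_axis(aff.axis, s)
            T_step[:3, :3] = R
            T_step[:3, 3] = aff.pivot - R @ aff.pivot   # rotation about the pivot point
        else:                                           # "linear"
            T_step[:3, 3] = s * aff.axis / np.linalg.norm(aff.axis)
        waypoints.append(T_world_template @ T_step @ T_template_follower)
    return waypoints
```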

3.1.2 Usabilities

Usabilities are defined as frames of reference that represent parts of a grasped object which are relevant to achieving a task. As seen in Fig. 2, parts of interest of the drill could be the trigger and the tip. If the robot is commanded to use the drill to execute an affordance of another object (e.g. a wall), the human supervisor can update on the fly the usability of the grasped object that is required to achieve the task. This changes end-effector trajectory planning from being calculated, for example, with respect to the “hand” to being calculated with respect to the tool tip.
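The effect of selecting a usability can be expressed as a change of the planning frame: given a desired world pose for the tool tip, the hand pose the planner must actually reach follows from the fixed hand-to-tip transform. The helper below is a hedged sketch of this computation, not the actual planner interface.

```python
import numpy as np

def hand_target_from_usability(T_world_usability_target, T_hand_usability):
    """Given a desired world pose for the selected usability frame (e.g. the drill
    tip) and its fixed offset from the hand, return the hand pose to plan for."""
    return T_world_usability_target @ np.linalg.inv(T_hand_usability)
```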

3.1.3 Object Template Framework

The information contained in the object templates needs to be systematically organized. Therefore, it has been divided into three main groups.

Fig. 3

Object Template Framework. These groups of information are related by a unique identifier for each type of object. The grasp library and the robot library have a many-to-one relationship with the object library

Object Library contains specific object information and is robot agnostic. Physical and abstract information relevant for manipulation can be described, such as shape, mass, center of mass, inertia tensor, affordances, and usabilities.

Grasp Library contains the specific information that corresponds to the use of the robot's end-effector. Here, the potential grasp poses as well as the finger joint postures for grasping an object can be described a priori.

Robot Library contains specific information about the robot platform. Currently, it is used to describe potential robot stand poses that facilitate reaching and grasping an object.

This distribution allows the object template approach to be platform independent. Different robot platforms can be used (e.g. wheeled, tracked, or bipedal, among others), and different types of end-effectors can also be used (in the case of humanoid robots, each arm can even have a different type of hand). The diagram in Fig. 3 depicts how this information is organized.
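As an illustration of the many-to-one relationships shown in Fig. 3, the three libraries could be organized as simple lookup structures keyed by the object type identifier, reusing the illustrative `ObjectTemplate` from the earlier sketch. This is an assumed representation, not the actual storage format of the framework.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class GraspEntry:
    object_type_id: int       # refers to an entry in the object library
    end_effector: str         # e.g. "left_hand"
    pregrasp_pose: list       # pose relative to the template frame
    finger_posture: list      # joint values for closing the hand

@dataclass
class RobotEntry:
    object_type_id: int       # refers to an entry in the object library
    robot_name: str           # e.g. "atlas", "johnny"
    stand_poses: list         # candidate stances relative to the template frame

# Many grasp/robot entries may reference the same (robot-agnostic) object entry.
object_library: Dict[int, "ObjectTemplate"] = {}
grasp_library: List[GraspEntry] = []
robot_library: List[RobotEntry] = []
```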

3.2 Flexibility and Versatility

The use of remote humanoid robots as avatars is expected to be intuitive to operators, since human-like abilities should feel natural to domain experts such as first responders acting as operators. This allows operators to picture themselves in the remote environment and think about how to solve a task; however, task requirements such as special tools might not be available on-site. For this reason, a flexible means of interaction is also needed that allows improvisation and plan adaptation by the operator. In this approach, improvisation is described as “a change of a plan on how to achieve a certain task, depending on the current situation”. A human operator can then improvise, e.g., by adapting the affordances of known objects to new, unknown objects, for example by utilizing the affordances defined in an object template on a new object that has similar physical properties or whose manipulation skills belong to the same class (see Fig. 4).

Fig. 4

The supervisor is able to command the robot to execute the valve turn affordance; however, because debris is blocking the way, the supervisor utilizes a stick to achieve the task [14]

3.3 Collaborative Autonomy

Collaboration between humans and robots still needs to be pushed towards higher levels of autonomy. For this reason, it makes sense to use high-level behaviors to automate this collaboration. From the perspective of the presented approach, the general workflow of a task can be depicted as in Fig. 5. In a manual approach, these steps are executed by a human supervisor, who iterates through them verifying the success of each action. Collaborative autonomy aims at using a high-level behavior (e.g. a finite state machine) to automate these steps. The human supervisor then only acts as an observer unless the robot requires assistance. From the perspective of the high-level behavior, steps can be performed either by the robot or by the human. This collaboration allows tasks to be executed efficiently, either by the autonomy of the robot or by human intervention. This high-level control philosophy has been implemented in FlexBE [17].
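The sketch below illustrates this control philosophy as a plain finite state machine with an operator-intervention fallback. It deliberately does not use the actual FlexBE API; the step names and callbacks are assumptions for illustration.

```python
from typing import Callable, Dict

def run_collaborative_behavior(steps: Dict[str, Callable[[], bool]],
                               ask_operator: Callable[[str], bool]) -> bool:
    """Execute task steps autonomously; fall back to the supervisor on failure.

    `steps` maps a step name (e.g. "align_template", "plan_grasp", "execute",
    "verify") to a function that returns True on success. `ask_operator` lets
    the human intervene and report whether the step was completed manually.
    """
    for name, autonomous_step in steps.items():
        if autonomous_step():
            continue                      # the robot handled this step on its own
        if not ask_operator(name):        # the supervisor intervenes for this step
            return False                  # task aborted
    return True

# Hypothetical usage: each callable wraps a robot capability or an operator dialog.
# success = run_collaborative_behavior(
#     {"align_template": auto_align, "plan_grasp": plan, "execute": execute,
#      "verify": verify}, ask_operator=prompt_supervisor)
```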

Fig. 5

General workflow of a task. When using collaborative autonomy, the high-level behavior (shaded blocks connected by double-line arrows) commands the robot (single-line arrows) to perform tasks autonomously. The supervisor observes and, if needed, intervenes by executing the current step (double-headed single-line arrow)

4 Experimental Results

In this section, selected experimental results are reported as a proof of concept to demonstrate how the object template approach allows manipulation tasks to be solved in a versatile way. For example, being able to select points of interest in a grasped object can increase the potential of achieving a manipulation task by planning with respect to these points and by extending the reachable workspace of the robot.

The first experiment demonstrates the theoretical grounding of the approach by analysing the pattern followed by a usability with respect to an affordance. A Board Marker Template is created with a usability “tip” located at the painting edge of the board marker. The board marker is grasped by the robot, and a Wall Template is used to draw circles on a whiteboard. Since the board marker is grasped with the finger tips, as shown in Fig. 6, the operator needs to align the Board Marker Template with the real object.

Fig. 6

Team Hector’s humanoid robot Johnny grasps a board marker with the finger tips (top left) and draws circles on the target board (top right). Three blue circles with different radii are drawn using the usability in the Board Marker Template and the circular affordance of the Wall Template (bottom left). A digitization of the circles drawn by the robot shows the resulting pattern (bottom right)

After alignment is complete, the operator attaches the Board Marker Template to the right hand and selects the usability “tip” from the user interface. The kinematic chain of the end-effector is updated with the new transformations. Now that the Board Marker Template is attached, the operator can move the template through the user interface to an initial position for drawing the circle. With the marker in the initial pose, the operator can request a circular affordance from the Wall Template, which executes the joint motions necessary to move the board marker in a circular pattern around the center target of the Wall Template. From the digitization of the circles drawn by the robot, it can be seen that the three blue circles share the same center. However, inaccuracies from the manual alignment of the Board Marker Template by the human operator cause an error of 1.1 cm in the resulting pattern. For the purposes of the approach, these inaccuracies are not considered significant, given that the tasks of the robot do not require high-precision manipulation. The complete process of operator alignment and use of the interface can be seen in the accompanying video (Footnote 1).

The second experiment shows how, using object usabilities, a proper motion pattern can be generated while planning with respect to specific points of interest in tools. In this example, the robot needs to use a tool (e.g. a drill) to cut out a circular pattern of around 20 cm diameter in a drywall. When using the Drill Template, the nominal grasp is located around the grip of the drill, so that the bit is located 9 cm above the frame of reference of the end-effector. To cut the circle where the operator has planned it, the drill bit needs to rotate around the rotation axis of the Wall Template. The bit usability provides the motion planner with the correct transformation to generate this pattern, as seen in Fig. 7. An example of the transformations generated for this experiment can be seen in the accompanying video (Footnote 2). Since this transformation is calculated online, the drill can have any arbitrary orientation; in this case it is rotated ninety degrees compared to the orientation of the drill shown in Fig. 2. The experiment is performed by placing a board marker in the place of the bit and drawing a circle on a whiteboard in order to observe the generated pattern.
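Combining the earlier affordance and usability sketches, the hand waypoints for this experiment could be obtained as follows. The 9 cm bit offset and the 20 cm circle diameter are taken from the description above, while the frame conventions, axis choices, and helper functions are the illustrative ones introduced earlier.

```python
import numpy as np

T_world_wall = np.eye(4)                       # placeholder for the operator-fitted wall pose
wall_circle = Affordance(name="cut_circle", motion_type="circular",
                         axis=np.array([0.0, 0.0, 1.0]),   # axis choice is illustrative
                         pivot=np.zeros(3))

# Bit usability: 9 cm offset from the end-effector frame (value from the text,
# offset direction assumed).
T_hand_bit = np.eye(4)
T_hand_bit[2, 3] = 0.09

# Starting bit pose: 10 cm off the wall axis, so a full turn traces a 20 cm circle.
T_world_bit_start = T_world_wall.copy()
T_world_bit_start[0, 3] += 0.10

bit_waypoints = affordance_waypoints(T_world_wall, wall_circle,
                                     T_world_bit_start, 2 * np.pi)
hand_waypoints = [hand_target_from_usability(T, T_hand_bit) for T in bit_waypoints]
```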

Fig. 7

Team ViGIR’s Atlas robot’s first-person view of the Draw Circle in the Wall task. An ovoid pattern can be observed, which results from joint controller inaccuracies caused by the drill mass of around 1.3 kg

The third experiment demonstrates how the human operator can command the robot to use an object as an online-augmented end-effector. In this experiment, the robot is required to turn a valve; however, the robot is unable to do this without the use of a tool because the valve is located higher than the robot can reach, as shown in Fig. 8.

Fig. 8

Operator view of the experiment setup. The operator has requested point cloud data of the environment and the Valve Template (purple) has been located to match the sensor data of the real valve. The Atlas robot is unable to reach the valve as shown by the green ghost robot used for previewing the target arm motions

For this experiment, a long L-shaped stick (in this case a paint roller), which can be grasped and used to reach the valve, is provided as shown in Fig. 9. The length of the paint roller was fixed; however, the precise total length is not relevant as long as the distance between the point where the robot grasps the object and the “roller” part is sufficient to reach the valve. This distance is automatically accounted for in the kinematic transformations after the operator requests that the object (the paint roller) be attached to the end-effector (the hand).
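A hedged sketch of how attaching the object extends the usable workspace: once the roller usability is selected, the hand pose required for a target on the valve follows from the fixed hand-to-roller transform (using the helper from the usability sketch above), and a crude reachability test indicates whether the target becomes attainable. The 2.40 m valve height and 25.4 cm diameter come from the experiment description; all other frames and offsets are placeholder values.

```python
import numpy as np

def reachable(T_world_hand_target, shoulder_position, max_arm_reach):
    """Crude reachability test: is the required hand position within arm reach?"""
    return np.linalg.norm(T_world_hand_target[:3, 3] - shoulder_position) <= max_arm_reach

# Target on the valve rim (valve center at 2.40 m height, 25.4 cm diameter).
T_world_valve_rim = np.eye(4)
T_world_valve_rim[:3, 3] = [0.60, 0.0, 2.40 - 0.127]

shoulder = np.array([0.0, -0.25, 1.50])   # placeholder shoulder position
arm_reach = 0.90                          # placeholder maximum arm reach (m)

# Without the tool, the hand itself would have to reach the rim:
print(reachable(T_world_valve_rim, shoulder, arm_reach))             # False

# With the paint roller attached, the hand only needs to reach the rim minus the
# hand-to-roller offset (0.80 m along the stick, placeholder value):
T_hand_roller = np.eye(4)
T_hand_roller[2, 3] = 0.80
T_world_hand_target = hand_target_from_usability(T_world_valve_rim, T_hand_roller)
print(reachable(T_world_hand_target, shoulder, arm_reach))           # True
```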

Fig. 9

The Paint Roller Template provided contains three usabilities: the grip edge, the origin of the template and the roller

The human operator identifies the paint roller and commands the robot to grasp it. Once the robot has grasped the object, the human operator adapts and validates the alignment of the Paint Roller Template to match the pose of the real object in the robot's hand. To turn the valve, the point that needs to follow a circular path around the axis of the valve is not located in the robot's hand but in the “roller” part of the object. To plan with respect to this point of interest in the grasped object, the operator selects the usability that belongs to that point (in this case the “roller” usability). The human operator can then command the robot to execute the turning affordance of the valve with this online-augmented end-effector, as shown in Fig. 10. The accompanying video (Footnote 3) shows the complete process of grasping and manipulating the paint roller using its usabilities to rotate the valve.

Fig. 10

Top left: Atlas robot turning a high, non-arm-reachable valve using a “paint roller” as online-augmented end-effector. The experimental setup uses a 25.4 cm valve located at 2.40 m height. Bottom left: Atlas grasping the paint roller and inserting it between the valve cross-bars. Right: Operator view of the experiment setup and the circular path performed by the robot's hand, shown in dark green

5 Lessons Learned

Human supervision of an avatar robot is still stressful and demanding, even with a semi-autonomous robot which is expected to do as it is commanded. Such highly complex robotic platforms are subject to hardware and software failures as well as to environmental conditions that can easily prevent a mission from being accomplished. Supervisors are required to account for this by improvising and changing task plans on the fly. For these reasons, the most noteworthy lesson learned is to design a flexible system that allows humans and robots to adapt on the fly to unforeseen task conditions.

Having a human in the loop is a promising approach for assisting and providing information to a remote robot. For instance, high-level perceptual information about the environment is the kind of information most frequently needed by robots from humans. However, manually providing this information to the robot is time consuming. A bottleneck in the current implementation of the presented approach is the manual registration of an object template to the sensor data of the real object by the human supervisor. For this reason, a highly robust semi-automated approach for object recognition and pose estimation is desirable [9, 20].

Another lesson learned is that automating the task execution with a high-level behavior further relieves the mental workload of the supervisor, who can focus on verifying the proper execution of the task. Additionally, having the logic of the high-level behaviors (i.e., the hierarchical state machines) reflect the manual workflow of the human operator makes it easy to follow the behavior execution and, if necessary, to intervene.

6 RoboCup and Humanoid Avatar Robots

The formidable vision of RoboCup is a team of humanoid robots capable of beating the human soccer world champion by 2050. This vision has been extended to challenges with high societal relevance in the @Home, Rescue, and Industrial categories. All of these challenges share the fact that their environments have been designed for humans (although partially degraded in Rescue). Therefore, an obvious choice of intelligent robots capable of solving versatile tasks using human tools in these human environments are humanoid robots. However, the capabilities of today's humanoid robots are still far from meeting the requirements posed by these challenges. A potentially promising direction for RoboCup could therefore be to introduce humanoid robot avatars in the Rescue Robot and @Home Leagues. A short-term step could be the joint consideration of handling steps, stairs, and doors, which both leagues do not yet address well. Also, the close collaboration of the robot with a remote human supervisor through a restricted communication link could add interesting new research features which are also of practical relevance, e.g., for an @Home humanoid avatar robot.

The authors and their colleagues entered the research area of disaster robotics through participation of Team Hector in the RoboCup Robot Rescue League (RRL) with autonomous wheeled and tracked robots since 2009. Based on the background and expertise gained from the RRL, they successfully participated in all three events of the recent DARPA Robotics Challenge (DRC) between 2013 and 2015 as part of Team ViGIR, and additionally with Team Hector in the DRC Finals.

While funded robotic competitions give a short-term push to the development of a field, they do not provide a sustainable long-term development path. The RRL was introduced in 2001 based on an analysis of the aftermath of the Kobe earthquake in Japan. It addresses fundamental research and development topics by providing standardized environments and benchmarks for rescue robots. These tests represent major challenges typically encountered in urban search and rescue scenarios, such as climbing stairs or opening doors. The Japanese robot Quince [11], one of the first robots used in the damaged Fukushima Daiichi plant, was first developed and tested within the RRL.

The authors suggest that, with its long-term and ambitious vision, the RRL is well qualified to include open research questions for humanoid avatar robots and to provide a sustainable test bed for the long-term roadmap of research and development towards this ambitious goal. While the current RRL objectives focus on mobility towards victim localization, little focus has so far been placed on the manipulation of debris and the use of human tools for executing disaster recovery tasks. For example, to negotiate a debris structure blocking a required path to a victim, designing a “repeatable” debris wall that falls in a random way upon robot contact would demand flexibility and adaptability to unforeseen situations, a challenge that should be prioritized. While the opening of pull and push doors is newly considered in the 2016 concept for the RRL (Footnote 4), additional manipulation tasks could be pushed towards the use of other known objects such as windows and valves, or activating fire sprinkler systems, among others.

7 Conclusions and Future Work

In this article, several contributions to the timely field of supervised remote semi-autonomous avatar robots are described, which enable highly capable, diverse, and flexible remote mobile manipulation abilities. A new concept of object templates as abstract representations of real objects has been developed, serving as an effective means of interaction between the human operator and the remote robot. The new concept is at the same time human- and robot-friendly to understand and, by means of the developed framework, interaction modes, and interfaces of an operator control station, effectively and efficiently usable. It enables a remarkably high degree of flexibility and capability as needed, e.g., for robots used in disaster recovery and response tasks, where high adaptability of the types of manipulation tasks and of the objects to be used is required during a mission subject to limited communication bandwidth. The developed approach also plays an important role in an advanced concept for collaborative autonomy to achieve and adapt task-level autonomy before the start and during the execution of a robot mission. Besides the experiments mentioned in this paper, a remarkable number of convincing experimental evaluations have been conducted with several highly advanced human-size humanoid robots and multiple different robot hands, in challenging competition as well as lab scenarios, see e.g. [13–15].

It should be noted that the implementation of the described object template approach is available as open source code and wiki as part of a larger software suite and architecture for humanoid avatar robots from: