1 Introduction

As robots move towards being cooperative and social, the challenges of incorporating the basic ingredients of such behaviors are becoming prominent. In this context, just being reactive or active is not sufficient. Behaving proactively in a human-centered environment is one of the desirable characteristics of social robots [13, 47].

Proactive behavior, i.e. taking the initiative whenever necessary to support the ongoing interaction/task, is a means to engage with the human and to satisfy internal social aims such as drives, emotions, etc. [14]. Proactive behavior can be tackled at different levels of abstraction and through various perspectives. We are essentially interested here in one particular topic: the synthesis of proactive motions with the aim of facilitating human-robot collaborative task achievement. In this line of work, one non-trivial aspect is to determine “where” (an important aspect of a joint task [55]) a joint action should preferably take place and what the robot can do to “propose/communicate” the choice it has made.

We have already developed a system that enables the robot to perform a set of basic human-robot interactive manipulation tasks such as give, show, hide, or make accessible an object to the human [43]. In this paper, we will focus on a complementary aspect of such interactive manipulation, in which the human will be required to perform the task or to contribute to a joint task in order to achieve some joint goal. We will consider two such tasks:

(i) Give: The human has to give some object to the robot by holding it somewhere, and

(ii) Make Accessible: The human has to make some object accessible to the robot by placing it somewhere.

As shown in Fig. 1(a), if the robot asks, “Please give me the toy dog …” and remains in the rest position, this might create confusion for the human about how and where to give: “Should I move and reach out to the robot to put the object in its hand?”, “Should I put it somewhere on the table for the robot to take it?”, etc. Now assume that the robot, along with its request to give the object, also shows proactive reach-out behavior by moving its hand to take the object, as shown in Fig. 1(b). This might significantly guide the human about how to perform the task and where to give. Moreover, if the robot reaches out to a place that is convenient from the human’s perspective to give the object, it will also reduce the human’s effort.

Fig. 1

The robot asks the human “give me the toy dog …” (a) by maintaining its rest position, (b) by proactively moving its hand to an appropriate place to guide the human’s action

Similarly, if the human has to make some object accessible to the robot, the robot could proactively advise the human about where to put the object so that the robot would be able to take it. This will reduce the human’s confusion. Again, if the robot takes into account the human’s effort to find a human-adapted proactive solution, it will further reduce the effort of the human.

Expressive behavior coupled with perspective-taking has been shown to be important for the socio-cognitive aspect of Human-Robot Interaction [3]. Hence, while finding a human-adapted solution for a task, the robot takes into account the visuo-spatial perspective of the human. Moreover, the robot performs such perspective-taking not only from the current state of the agent, but also from a set of different states achievable by the agent; hence the term multi-state perspective-taking.

Apart from reducing the ‘confusion’ and ‘effort’, with such proactive behaviors the robot might also better communicate its ‘abilities’ and its ‘understanding’ of the abilities of the human partner. In this paper, we will hypothesize such proactive behaviors; present a framework to instantiate a solution for such behaviors; and validate through user studies that proactive behaviors indeed reduce the ‘confusion’ of the human and that incorporating the human-adapted aspects further reduces the ‘effort’ of the human partner.

2 Related Work

Proactive behavior can occur at various levels of abstraction and can be exhibited in various ways, ranging from simple interaction [36] to proactive task selection [5, 35, 52, 53]. Our work has been applied to and instantiated in the domain of human-robot cooperative object manipulation, in which interactive pick-and-place [30] and handover tasks have been identified as essential capabilities for a robot assistant. In this setting, the pertinent questions are where to place the object and the robot’s gripper, in which direction to orient the robot’s head, etc., in order to provide maximum information and help. In [55], humans’ ability to estimate such ‘what’, ‘when’ and ‘where’ aspects of others’ actions, and the importance of these estimates in online action coordination, has been shown. Hence, the robot should also be equipped with such abilities for better coordination and cooperation, even when deciding to behave proactively.

In the context of proactive behavior, estimation and utilization of ‘what’ has been addressed by various researchers. In [52, 53], the robot estimates what the human wants and selects a task using a probability density function. In [23], a cost-based anticipatory action selection is performed by the robot to improve joint task coordination. ‘When’ has also been addressed in various ways. In [34], temporal Bayesian networks are used for proactive action selection to minimize wait time. In [7], a robot wheelchair takes control when a handicapped human needs it, and in [8], an activity-constraints-violation based scheduler is used to remind the human. Using the information about ‘how’ for behaving proactively has been shown in [16], where a switching hidden semi-Markov model is used to learn a house occupant’s daily activities and to alert the caregiver in case of abnormality. There are various other works showing the robot learning/understanding what a task ‘means’ and ‘how’ the task is performed [9, 44, 45], at a symbolic level of abstraction, which could be used for synthesizing and showing proactive behaviors. On the other hand, from the point of view of utilizing the information about ‘where’ the human can perform the task and showing proactive behavior accordingly, there has not been significant work in robotics. In [58], a robot wheelchair predicts the location the user is trying to go to, by using a POMDP model, and drives him/her there. In human-human interaction, the notion of proactive eye movement has been identified [18], and further, in [54], such proactive gaze has been proposed as a dimension to measure HRI. However, their notion of proactive gaze corresponds to predicting the goal of the action and then proactively shifting the gaze directly towards the goal. This notion of proactivity is complementary to the proactive behaviors within the scope of this paper, in the sense that, instead of shifting its gaze proactively based on the human’s action, the robot proactively finds a solution for the human action and suggests it through its proactive actions. However, such proactive actions might include proactive gaze as a component or might induce the human partner’s proactive gaze.

Estimating ‘where’ the human can perform a task is helpful for sharing attention with others [59] and to predict spatial characteristics of others’ actions [29]. These are essential for building the robot’s theory of mind [51] and consequently can help to guide the human’s behavior [33, 60] towards the robot.

Our interest is to devise proactive behaviors based on ‘where’ the task could be performed. Let us motivate the interest behind such proactive behaviors and at the same time identify some of their key ingredients. In [25], it has been shown that in general the participants appreciate the robot’s initiative of showing some movements as an engagement attempt. The experiment was in a receptionist-visitor scenario, and the type of initiative was gaze shifting, which is different from our HRI object manipulation scenario and the associated types of proactive behaviors. However, the finding that the participants rated the robot higher when it showed some movement in its gaze than when it was just still suggests that proactive initiatives of the robot could better engage the human and serve the purpose of interaction opening. Further, in [38], it has been shown that simple arm-head gestures increase the expressive power of social robots. Although such movements have not been directly studied in relation to proactivity, we will hypothesize proactive behaviors that incorporate arm and head movements as an attempt to be expressive. Moreover, a robot moving its arm to a location could induce a human goal-anticipatory response, as demonstrated in children [21]. Further, regarding object manipulation, which is the focus of our paper in the HRI context, we find that gazing plays an important role in pointing-based object-reference conversation [27]. Therefore, in our hypothesized proactive behaviors, we incorporate looking at the object and at the place of interest.

In this paper, our focus will be on estimating ‘where’ a task could be performed by the human and on hypothesizing different proactive behaviors in terms of verbal as well as non-verbal key communicative components (arm motion, gaze shifting), as identified above.

3 Contributions of the Paper

Below we summarize the contributions of the paper: conceiving hypotheses related to proactive behaviors; developing a framework to synthesize a solution for proactive actions based on ‘where’ the robot could support the task, while ensuring the least feasible human effort; and validating the hypotheses through a preliminary user study.

(i) Hypothesizing proactive behaviors and their expected effects in HRI: For interactive human-robot joint tasks, we hypothesize the following:

    (a) Proactive Reach out to Take from the Human: We postulate that, along with informing verbally, the robot should proactively reach out to take, in the case the human has to give something to it.

    (b) Proactively Suggesting ‘Where’ to Place: We postulate that the robot should proactively suggest (verbally and by gaze shifting) ‘where’ to place an object, in the case the human has to make the object accessible to the robot.

    We hypothesize that such proactive behaviors will be preferred over non-proactive behavior and will reduce the ‘confusion’ of the human partner.

(ii) Framework to instantiate human-adapted proactive actions for the least feasible human effort: We will present a generic framework to find a solution for different proactive behaviors. The framework estimates ‘where’ the human can perform the task, respecting environmental and postural constraints, and where the robot can support the task. Our framework will provide a human-adapted proactive solution by taking into account the human’s perspective and effort (Sect. 5.4).

    We further hypothesize that such human-adapted proactive behavior will be preferred over non-proactive behavior of the robot, as it will also reduce the ‘effort’ of the human, and that the robot will be perceived as being more ‘aware’ of the human’s capabilities and more ‘communicative’ about its own capabilities.

    We will show through experimental results the capability of the framework to provide a solution for different scenarios and for different robots, with a visible reduction in the human effort (Sect. 6.1).

(iii) User studies to validate the hypotheses: We will present the results of a set of user studies to validate the hypothesis that such proactive behaviors indeed reduce the human ‘confusion’ and that the human-adapted aspect further reduces the human ‘effort’ and makes the robot appear more ‘aware’ and ‘communicative’ (Sect. 6.2).

In the next section, we will formally explain the nature of the proactive behavior within the scope of the paper and give a general description of how such proactive behaviors are achieved. Then we will present the methodology to instantiate such proactive behaviors in Sect. 5. There, we will first briefly describe the multi-state visuo-spatial perspective-taking ability of the robot and the assignment of effort levels. Then, in Sect. 5.4, we will present the generic framework, which iteratively finds a feasible solution for proactive behavior. The approach is to find a place ‘where’ the human might perform a task with the least feasible effort, among those places ‘where’ the robot could support it, given its own constraints. In Sect. 5.5, we will illustrate the framework by instantiating solutions for the two proactive behaviors hypothesized above: (a) proactively reaching out to take for the give task by the human, and (b) proactively suggesting ‘where’ to put for the make-accessible task by the human. In the experimental results and analysis, Sect. 6, we will demonstrate two main aspects:

(i) Sect. 6.1 will demonstrate the capability of the framework to find a solution for different scenarios and for different types of robots. We will also demonstrate the reduction in the human effort in the case of human-adapted proactive behavior.

(ii) Sect. 6.2 will show supporting evidence for our hypotheses through user studies.

Finally, we will conclude the paper with a discussion of potential applications and pointers to future work.

4 Objective and Nature of the Proactive Behaviors of Interest

We define an environment En, which consists of entities (objects and agents) and a set of attributes At related to the physical states of all the objects and to the physical and mental states as well as the abilities of all the agents. The state of the environment consists of a set of tuples 〈entity,attribute,value〉. From the planning point of view, the target state of the environment for a task T is specified as a set of facts \(F^{T} = \{f_{i}^{T} \}\), where each fact \(f_{i}^{T}\) consists of a tuple 〈entity,attribute,constraint〉. Facts can be directly observable as well as inferred. For a particular task T, the constraint for a particular instance of the 〈entity,attribute〉 pair could be in different forms:

(i) A single value. For example, if an object O should be at (x,y,z), for the entity O and its attribute position, this constraint results in \(f_{i}^{T}=\langle O, \mathit{position}, (x,y,z) \rangle\).

(ii) A set of values because of some desired facts. For example, if an object O should be on the table, this results in \(f_{i}^{T}=\langle O, \mathit{position}, \{ p_{j}\} \rangle\), where \(\{p_{j}\}\) is the set of positions on the table top.

(iii) A set of values because of some forbidden facts. For example, if the constraint is that an object O should not be reachable by an agent Ag, this results in the fact \(f_{i}^{T}=\langle O, \mathit{position}, \{p_{k}:\mathit{non}\_\mathit{reachable}(O, Ag, p_{k})\} \rangle\).

Hence, for a particular task T, all the constraints for a particular instance 〈ent,at〉 of the 〈entity,attribute〉 pair will eventually result in a space \(s_{ent,at}^{T}\) in which its value can lie. Given \(F^{T}\), we say the pair 〈ent,at〉 is properly grounded if \(s_{ent,at}^{T}\) contains only one value, i.e. if \(s_{ent,at}^{T}\) is a point in the potential space for that 〈ent,at〉 pair. Let \(SP^{T}\) denote the space corresponding to all the properly grounded 〈entity,attribute〉 pairs. Given \(F^{T}\), we say an 〈ent,at〉 pair is loosely grounded for the task T if \(s_{ent,at}^{T}\) contains multiple potential values. Let \(SL^{T}\) denote the space corresponding to all such loosely grounded 〈entity,attribute〉 pairs. In one sense, \(SL^{T}\) provides the agent with latitude to perform the task differently, but on the other hand it can also be a source of confusion because of the burden of deciding among multiple possibilities. And this is where the type of proactive actions within the scope of this paper plays its role.
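To make this representation concrete, here is a minimal sketch (in Python, with hypothetical names and toy values) of how a target specification \(F^{T}\) could be stored and split into the properly grounded space \(SP^{T}\) and the loosely grounded space \(SL^{T}\):

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# A fact constrains one <entity, attribute> pair to a space of admissible values.
@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    values: FrozenSet  # admissible value space s^T_{ent,at}

def split_grounding(facts: Set[Fact]) -> Tuple[Dict, Dict]:
    """Split F^T into SP^T (exactly one admissible value per <entity, attribute>)
    and SL^T (multiple admissible values)."""
    sp, sl = {}, {}
    for f in facts:
        key = (f.entity, f.attribute)
        (sp if len(f.values) == 1 else sl)[key] = f.values
    return sp, sl

# Toy example: object O must be on the table (several candidate positions,
# so <O, position> is loosely grounded).
table_top = frozenset({(0.4, 0.1, 0.75), (0.5, 0.2, 0.75), (0.6, 0.0, 0.75)})
F_T = {Fact("O", "position", table_top)}
SP, SL = split_grounding(F_T)   # SP == {}, SL == {("O", "position"): table_top}
```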

We assume that a proactive action is synthesized based on an initial specification of expected facts \(F^{T}\) about the task T, the sequence of already planned actions \(A^{T}\) to perform the task, the environment state \(\mathit{En}_{I}\) before executing \(A^{T}\), and the predicted state of the environment \(\mathit{En}_{F}\) if \(A^{T}\) were executed. For synthesizing proactive actions, the intention/motive behind such actions is important.

Proactive behaviors could be of various types and could be exhibited in various ways. The focus of this paper is on the type of proactive behavior that makes changes in the loosely grounded space \(SL^{T}\), as discussed above, by instantiating and/or better specifying its parameters and communicating this through the proactive action. The intentions behind such proactive behaviors are: to guide and facilitate the human partner to better perform the task, to achieve the joint goal by reducing the task-related confusions, and to reduce the effort of the human partner. For example, see Fig. 2, where an ellipse shows one state of the environment and an edge shows an action. In the absence of any proactive behavior, one of the states of the expected final environment will be \(\mathit{En}_{F}\). In the case of proactive behavior, an intermediate environment \(\mathit{En}_{\mathit{Intr}}\) is created by the robot’s proactive action \(R_{A}^{P}\), by instantiating or better specifying the expected state of the environment. Hence, the proactive behavior leads to a change in any of these: the physical state of the environment, the physical state of the human, the mental state and belief of the human. The human then adapts to it to achieve the task by action \(H_{A}^{M}\), by modifying the parameters of his/her planned action. This might result in a partially different final state of the environment, \(\mathit{En}_{F}^{\prime}\). In this paper, the hypothesized proactive behaviors do not eliminate the set of already planned actions \(A^{T}\), and they preserve the already properly specified space \(SP^{T}\) of the final environment. However, the parameters of \(A^{T}\) could be adapted. Hence, there is a common part in the final states of the environment, represented as \(\mathit{En}_{F}^{sp}\) in Fig. 2.

Fig. 2

A type of proactive behavior which results in an intermediate physical state of the world and/or a changed mental state of the human, mainly to reduce the confusion and effort of the human for smooth execution of the joint task

For the give task by the human, the set of specifications about the facts, \(F^{\mathit{Give}}\), includes: the object should be in the hand of the robot; and the set of actions \(A^{\mathit{Give}}\) includes: the human will pick and carry the object and the robot will grasp the object. For simplicity, the parameters of the task are omitted, such as the performing agent, the agent for whom the task will be performed, the object, etc. For the make accessible task by the human, \(F^{\mathit{Make}\_\mathit{Accessible}}\) includes the facts that the object will be on a support and that it will be graspable, reachable and visible to the robot, whereas \(A^{\mathit{Make}\_\mathit{Accessible}}\) includes: the human will pick, carry and place the object on a support. Based on these sets \(F^{T}\) and \(A^{T}\), the two complementary research challenges are the autonomous synthesis of an appropriate proactive action \(R_{A}^{P}\) and the autonomous instantiation of the corresponding intermediate state of the environment, \(\mathit{En}_{\mathit{Intr}}\). In this paper, we assume that \(R_{A}^{P}\) for a task is already known, as we have hypothesized it in Sect. 3. In the next section, we will present the framework to instantiate the \(\mathit{En}_{\mathit{Intr}}\) to be communicated through the hypothesized \(R_{A}^{P}\). In our current examples, this instantiation of the unspecified part involves reasoning about the ‘where’ aspect of the task, i.e. the placement (either to hold or put) of the object in the final environment.
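For concreteness, the two task specifications above could be encoded as follows; this is only a sketch, and the predicate and action names are illustrative rather than the system’s actual symbols:

```python
# Illustrative encoding of F^T and A^T for the two tasks (names are hypothetical).
GIVE = {
    "facts":   [("object", "in_hand_of", "robot")],
    "actions": ["human_pick", "human_carry", "robot_grasp"],
}

MAKE_ACCESSIBLE = {
    "facts":   [("object", "on", "support"),
                ("object", "graspable_by", "robot"),
                ("object", "reachable_by", "robot"),
                ("object", "visible_to", "robot")],
    "actions": ["human_pick", "human_carry", "human_place_on_support"],
}
```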

5 Methodology

5.1 Multi-state Visuo-Spatial Perspective Taking

Reasoning by modeling the agent and its behavior in simulation has been shown to be effective for perspective taking, teamwork and social behavior [31]. Visuo-spatial perspective-taking of the human has already been shown to be an important component in Human-Robot Interaction [2, 17]. We have enriched the robot’s ability of perspective taking not only from the current state of the agent but also from a set of different states the agent might attain, and presented the concept of Mightability in [43]. Mightability stands for “might be able to …” and facilitates multi-state visuo-spatial perspective taking. The idea is to analyze various abilities, \(A_{b}\), of an agent by applying an ordered list of virtual actions, \(A_{v}=[a_{1},a_{2},\ldots,a_{n}]\), while respecting the environmental and postural constraints. The robot estimates the abilities at the 3D grid level and at the object level, which we term Mightability Maps (MM) and Object-Oriented Mightabilities (OOM), respectively. In the current implementation:

(1)
(2)
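A minimal sketch of how such a Mightability Map could be stored, assuming one boolean 3D grid per (agent, ability, effort level) combination; the grid resolution and the indexing scheme are our own illustrative assumptions, not the implementation of [43]:

```python
import numpy as np

class MightabilityMap:
    """Boolean 3D grid marking the cells for which a given ability (e.g. see,
    reach) holds for an agent at a given effort level (illustrative sketch)."""
    def __init__(self, shape=(100, 100, 50), cell_size=0.05):
        self.grid = np.zeros(shape, dtype=bool)
        self.cell_size = cell_size  # metres per cell

    def mark(self, ix, iy, iz):
        self.grid[ix, iy, iz] = True

    def intersect(self, other):
        out = MightabilityMap(self.grid.shape, self.cell_size)
        out.grid = self.grid & other.grid
        return out

# The set of maps could then be indexed by agent, ability and effort level, e.g.
# MM[("human", "reach", "Arm_Torso_Effort")] -> MightabilityMap instance
```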

5.2 Symbolic Categorization of Efforts

We have categorized the effort to attain a state from the current state based on the joints involved. This is motivated by studies of human movement and behavioral psychology [10, 19], where different types of human reach actions have been identified and analyzed. These include reaches involving simple arm extension (arm-only reach), shoulder extension (arm-and-shoulder reach), leaning forward (arm-and-torso reach) and standing, as shown in Fig. 3. Such a categorization could be further enhanced based on studies of musculoskeletal kinematics and dynamics models [32, 48]. The associated effort levels, based on the joints involved in applying a particular \(A_{v}\), are:

(3)
(4)

Table 1 shows this categorization with the relative levels of effort. For example, assume an agent is currently sitting. If, to see a particular object, the agent is required to apply \(A_{v}=[\mathit{Make}\_\mathit{Standing},\mathit{Lean}\_\mathit{Torso}]\), the corresponding effort level to see the object will be Whole_Body_Effort, as it involves additional joints instead of just turning the head or the torso. Displacement_Effort corresponds to situations in which the agent has to move from the current position.

Fig. 3

Taxonomy of reach actions: (a) arm-shoulder reach, (b) arm-torso reach, (c) standing reach. (Adapted from [10, 19])

Table 1 Effort classes for visuo-spatial abilities
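Since the planner iterates over these categories from lower to higher effort (Sect. 5.4), they can be treated as an ordered scale. The sketch below assumes a single ordered list for reach-type efforts; the paper keeps separate lists for visual and reach abilities (expressions (3) and (4)), and the numeric values here are illustrative:

```python
from enum import IntEnum

class ReachEffort(IntEnum):
    """Ordered reach-effort levels; a higher value involves more joints (cf. Table 1)."""
    NO_EFFORT_REQUIRED  = 0   # no motion needed, e.g. the object is already in hand
    ARM_EFFORT          = 1   # arm-only reach
    ARM_TORSO_EFFORT    = 2   # lean forward / turn the torso while seated
    WHOLE_BODY_EFFORT   = 3   # stand up
    DISPLACEMENT_EFFORT = 4   # move from the current position

# The planner can then relax the allowed effort level step by step:
assert ReachEffort.ARM_TORSO_EFFORT < ReachEffort.WHOLE_BODY_EFFORT
```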

5.3 Grasp and Placement Analysis

For behaving proactively, it is important for the robot to compute where the task could be performed. To find the feasibility of a task, the framework not only reasons about the places to perform the task, but also performs rich geometric analyses on the possible grasp and placement orientations of the object. This is important because there might exist a reachable and visible place P to perform a task, yet placing the object at P is either not feasible for the human or does not allow the robot to grasp the object. This could be due to various factors, such as a cluttered environment, the difference in shape between the anthropomorphic hand and the robot’s gripper, etc. Hence, reasoning about the existence of collision-free grasps while picking, or while finding a position to place an object, is important. This becomes more prominent with real-life objects of complex shapes. Our system is capable of analyzing the feasibility of grasps of complex objects for the robot’s gripper and for an anthropomorphic hand, given an environment [50]. This further enables the robot to extract a set of simultaneous grasp pairs for the human and the robot, to facilitate tasks requiring object hand-over. Further, an object could be placed at a position P in different orientations, and depending upon the environment some or none of these placements might be stable and/or allow a grasp. Therefore, it is also important to test whether, from the task perspective, any feasible placement orientation exists at a particular position.

Therefore, to incorporate these feasibility tests, the robot computes and stores the sets of possible grasp and placement orientations assuming a collision-free environment. Further, the robot autonomously computes the support planes based on the vertical facets, which could be part of any object such as a table, a box, etc. This enables the robot to find a feasible placement of an object on any other object. During planning for a task, the planner filters out the non-feasible placements and grasps by simulating them in the 3D environment (Move3D [56]) reflecting the current state of the world.
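A sketch of this filtering step is given below; the world-model interface and function names are hypothetical, whereas in the actual system these checks are performed by simulation in Move3D:

```python
def filter_feasible_placements(position, placement_orientations, grasps, world):
    """Keep only (placement_orientation, grasp) pairs that are stable and
    collision free at `position` in the current 3D world model (illustrative)."""
    feasible = []
    for placement in placement_orientations:    # pre-computed placement orientations
        if not world.is_stable(position, placement):
            continue                            # e.g. the object would topple off the support
        for grasp in grasps:                    # pre-computed grasp set for the object
            if world.is_collision_free(position, placement, grasp):
                feasible.append((placement, grasp))
    return feasible
```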

5.4 Framework to Instantiate Proactive Solution

Figure 4 shows the reasoning process for extracting a solution to behave proactively. As shown in block ‘a’ of Fig. 4, the input consists of the current environment state \(\mathit{En}_{I}\) and the task T with its parameters: the performing agent, the target agent (for whom the task would be performed) and the name of the object. In addition, the current maximum allowed effort levels of the robot, \(R_{\mathit{MEL}}\), and of the human, \(H_{\mathit{MEL}}\), are provided.

Fig. 4

Proactive planner: Reasoning to find the human-adapted solution for proactive behavior, ensuring the least feasible effort of the human partner. Block ‘i’ is detailed in Fig. 5

5.4.1 Extracting Human-Adapted Candidate Places

The planner reasons about ‘where’ the human can perform the task. As an attempt to ensure the minimum feasible effort by the human, the maximum allowed effort level of the human, \(H_{EL}\), is initially set to level 0, i.e. No_Effort_Required (see Table 1), to see and reach the places to perform the task (block ‘b’ of Fig. 4). In block ‘c’ of Fig. 4, the planner finds the candidate places where the human can perform the task T with his current maximum effort level. Depending upon the task T, for a particular agent Ag, the planner is already provided with a set of constraints \(C_{T}^{Ag}\) to find the set of candidate places:

$$ C_{T}^{Ag} = \bigl\{c_{i}^{Ag}: i=1, \ldots, m\bigr\} $$
(5)

where m is the total number of constraints and \(c_{i}^{Ag}\) consists of the tuple:

$$ c_i^{Ag}= \bigl \langle \mathit{effort}\_\mathit{level}(E), \mathrm{ability}(A_b)= \mathit{true}|\mathit{false} \bigr\rangle $$
(6)

where effort_level is an element of expression (3) or (4), depending upon the ability, which is an element of expression (1). The desired values (true or false) are known for the task a priori. For example, if the task is to give an object, the planner knows that the abilities to see and reach the candidate places by the performing agent should be true for the desired effort level. If the human, Ag=H, has to give or make something accessible to the robot, then for the human’s current effort level, which is set as \(H_{EL}=0\), Eq. (5) results in the set of constraints \(C_{T}^{H} = \{c_{1}^{H},c_{2}^{H}\}\), where \(c_{1}^{H}= \langle0, \mathit{see}=\mathit{true} \rangle\) and \(c_{2}^{H}= \langle0, \mathit{reach}=\mathit{true} \rangle\).

Hence, the planner extracts the set of candidate places, \(P_{H_{EL}}^{T}\), which satisfy, for the human agent, Ag=H, the set of constraints \(C_{T}^{H}\):

(7)

For extracting \(P_{H_{EL}}^{T}\), based on the constraints, the planner performs the relevant set operations on the corresponding Mightability Maps (see Sect. 5.1) of the agent. However, there might be situations where there exists no common reachable cell in the Mightability Maps of the two agents, but where, because of the object’s size, there still exist some places to perform the task, such as a handing-over task. As the object is known to the planner, to ensure that such places are not lost, the Mightability Maps are expanded based on the dimensions of the object’s bounding box before \(P_{H_{EL}}^{T}\) is computed.
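A sketch of this constraint-based extraction as set operations on boolean grids, reusing the illustrative Mightability Map indexing from Sect. 5.1 and the constraint encoding of Eqs. (5)-(6):

```python
def candidate_places(constraints, mightability_maps, agent):
    """Intersect the Mightability Maps selected by each constraint
    <effort_level, ability=desired> for `agent` (illustrative sketch)."""
    result = None
    for effort_level, ability, desired in constraints:
        grid = mightability_maps[(agent, ability, effort_level)].grid
        mask = grid if desired else ~grid
        result = mask if result is None else (result & mask)
    return result  # boolean 3D grid of candidate cells, i.e. P^T_{H_EL}

# Example for the give task with H_EL = 0 (cf. C^H_T above):
# C_T_H = [(0, "see", True), (0, "reach", True)]
# P = candidate_places(C_T_H, MM, agent="human")
```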

Hence, block ‘d’ of Fig. 4 shows \(P_{H_{EL}}^{T}\), the set of places where the human might be able to perform the task T for the given effort level \(H_{EL}\). For effort level 0, in which the human is not expected to even move the arm, \(P_{H_{EL}}^{T}\) will be NOT NULL only if the object is already in the human’s hand. The human then has to just maintain his/her posture and the robot will be expected to take the object from his/her hand. In this case, \(P_{H_{EL}}^{T}\) will be a set of points corresponding to the object’s bounding box.

In the case of NOT NULL candidate positions (as tested in block ‘e’ of Fig. 4), the planner extracts, in block ‘f’, the set of candidate points where the robot can support the task. For this, from the robot’s perspective, Ag=R, another set of similar constraints, \(C_{T}^{R}\), is applied to the candidate places obtained for the human, \(P_{H_{EL}}^{T}\). This results in a refined set of candidate places, \(P_{H_{EL}}^{T,R_{EL}}\) (block ‘g’ of Fig. 4):

(8)

where \(n_{f}\) is the total number of final candidate places. If \(P_{H_{EL}}^{T,R_{EL}}\) is NOT NULL, as tested in block ‘h’ of Fig. 4, the planner calls a subroutine (block ‘i’ of Fig. 4) to perform various tests to find a feasible solution in the candidate space. Figure 5 gives an overview of the processing of block ‘i’, as will be explained in the next sub-section. If there exists a feasible solution, the test in block ‘j’ takes the control to block ‘m’, which returns the feasible solution.

Fig. 5

Extracting a feasible solution (block ‘i’ of Fig. 4): Iterating on 3 search spaces: candidate grasps, candidate placement orientations and candidate placement positions to satisfy various constraints of task, manipulation planning and human’s perspective to find a feasible solution

Note that, at any stage of planning, if the planner fails to find a candidate place or a feasible solution, and if there is still a possibility of increasing the effort level (block ‘k’), it increases the acceptable effort level of the human in block ‘l’ and begins the next iteration.
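Putting the blocks of Fig. 4 together, the outer loop of the proactive planner can be sketched as follows; the planner interface and method names are hypothetical, and the block labels in the comments refer to Fig. 4:

```python
def plan_proactive_solution(task, env, H_MEL, R_MEL, planner):
    """Sketch of the outer loop of Fig. 4: start from the least human effort
    level and relax it until a feasible solution is found or H_MEL is exceeded."""
    h_el = 0                                                          # block b: No_Effort_Required
    while h_el <= H_MEL:
        human_places = planner.places_for_human(task, env, h_el)      # blocks c, d
        if human_places:                                              # block e
            robot_places = planner.places_robot_can_support(          # blocks f, g
                task, env, human_places, R_MEL)
            if robot_places:                                          # block h
                solution = planner.extract_feasible_solution(         # block i (Fig. 5)
                    task, env, robot_places)
                if solution is not None:                              # block j
                    return solution                                   # block m
        h_el += 1                                                     # blocks k, l
    return None   # no feasible proactive solution within the allowed effort levels
```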

5.4.2 Extracting a Feasible Solution for a Particular Effort Level

Figure 5 shows an overview of the iterative process for extracting a feasible solution (block ‘i’ of Fig. 4) by incorporating the constraints from the perspectives of grasp, placement, visibility of the object and planning. The planner takes as input the candidate set of places obtained in block ‘g’ of Fig. 4 and ranks them. Currently we assign weights based on the closeness to the target-object position. This is based on the assumption that the human needs to put less effort into placing or holding the object if he/she has to carry the object over a shorter distance. Another motivation behind such a weight assignment is to exhibit goal-directed behavior. In [22], it has been suggested that to appear intentional the robot should exhibit goal-directed actions. Hence, this weight assignment also drives the solution to be directed towards the object, which might also convey the notion of pointing to the desired object.

The assignment of such weights to candidate places for proactive behavior and the studies and frameworks on preferable hand-over positions, such as [6, 15, 26, 57], could mutually benefit each other. In the latter case, the possibility of proactive behavior by the human, who would be the receiver, could be incorporated; whereas the former case, in which the robot is the receiver, could take into account further aspects related to preferable hand-over positions.

The planner further takes the initial sets of grasp configurations and placement orientations of the object, and filters them based on the task and the environment to extract reduced candidate sets. The candidate grasps are ranked based on their stability [49], and the placement orientations are ranked, for a particular position, based on the visibility percentage of the object from the target agent’s perspective [41]. Then the planner iterates over the three candidate lists to perform a series of feasibility tests, as shown in Fig. 5. The planner selects a tuple consisting of a placement position, a grasp and a placement orientation (based on their ranking, from highest to lowest) and uses dedicated modules for the different feasibility analyses: [41] for testing the percentage visibility of an object from an agent’s perspective, and [20] for planning a collision-free path. If any of the feasibility tests fails, another candidate tuple is selected. If all the feasibility tests for a particular task pass, then that particular placement position is retained for showing the proactive behavior. If the task demands it, a smooth trajectory is also generated for the execution, using [4].
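This inner search can be sketched as a ranked nested iteration; this is a simplified, self-contained illustration of Fig. 5, where the distance-based weighting stands in for the ranking schemes of [49] and [41] and the feasibility checks are abstracted as callables:

```python
from itertools import product

def extract_feasible_solution(places, grasps, placements, checks, obj_position):
    """Block 'i' of Fig. 4, sketched: rank candidate places by closeness to the
    target object, then iterate over (position, grasp, placement orientation)
    tuples until every feasibility test passes (names illustrative)."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, obj_position))
    ranked_places = sorted(places, key=sq_dist)   # weight: nearness to the object

    for position, grasp, placement in product(ranked_places, grasps, placements):
        if all(check(position, grasp, placement) for check in checks):
            return position, grasp, placement     # first fully feasible tuple
    return None                                   # backtrack (block 'k' of Fig. 4)
```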

5.5 Illustration of the Framework for Different Tasks

In this section, we will illustrate the presented proactive planner finding a human-adapted feasible solution to behave proactively. We will consider two different tasks performed by the human partner: to give an object to the robot and to make an object accessible to the robot.

5.5.1 For Proactively Reaching out

Figure 6 shows the initial scenario in which the human has to give the toy dog, placed on his right, to the robot. The robot will plan to proactively reach out to take it. For the current example, as it is a table-top cooperative manipulation scenario, to avoid more expensive motions of the robot, its maximum allowed effort is set as Arm_Effort. This restricts the robot from planning to turn or move its base; it can only move its arm to achieve the current sub-task of getting the object from the human. The human’s maximum effort level is provided as Whole_Body_Effort. However, these could be modified online by higher-level decision making or supervisor systems, such as ours [1, 12]. As already explained in Sect. 5.4.1, for the human, initially the least effort level, No_Effort_Required, will be tested for extracting the candidate places to give the object. The planner obtains the candidate set of places in block ‘d’ of Fig. 4 as NULL, as the object is not already in the human’s hand. Hence, the control reaches block ‘k’ and the human’s reach effort level is increased to Arm_Effort in block ‘l’ (we chose to maintain the corresponding effort level to see as No_Effort). This means that the planner estimates the places that are in the current field of view of the human and where the human could give the object to someone by only stretching out his arm. Figure 7(a) shows the candidate places for giving the object by the human at this level of effort, obtained in block ‘d’ of Fig. 4 in the next iteration. Green, red and yellow points show giving possibilities by the right hand, the left hand and both hands, respectively. In block ‘g’ of Fig. 4, with the maximum allowed effort level of the robot, the planner extracts the subset of the places where it can support the human. For our current example this turned out to be NULL, as there was no point reachable by the robot with its current effort level among the points of Fig. 7(a). Hence, the planner again reaches block ‘l’ of Fig. 4 to test the next effort level of the human, by setting Arm_Torso_Effort to reach and Head_Torso_Effort to see. Figure 7(b) shows the candidate points for giving the object by the human, who is now expected to lean forward and/or turn around while being seated. In this iteration, in block ‘g’ of Fig. 4, the planner finds a set of candidate places from where the robot could take an object from the human. Figure 7(c) shows these candidate places as a green point cloud. The resultant candidate places after the weight assignment explained in Sect. 5.4.2 are shown in Fig. 7(d). Blue points have the highest weight, in the sense that they will be preferred over the red points, which have the lowest weights. The feasible solution corresponds to the first highest-weight candidate point that passes the rest of the grasp, placement, object visibility and trajectory oriented feasibility tests. This feasible solution, obtained in block ‘m’ of Fig. 4, is indicated in Fig. 7(d). At the end, depending upon the task, the planner returns the appropriate data for exhibiting the proactive behavior. For the current task, it returns the winning feasible place obtained in Fig. 7(d), the corresponding levels of effort for the human and for the robot, the trajectory to reach the place, and the estimated end configuration of the robot, as shown in Fig. 8.

Fig. 6

Initial scenario in which the human has to give the toy dog to the robot. The robot has to plan to proactively reach out to take, instead of standing still and waiting for the human to act

Fig. 7

Candidate places for giving an object by the human (green: by right hand, red: by left hand, yellow: by both hands) (a) from his current position with Arm_Effort and (b) if the human will put effort to move his torso (lean forward or turn) while being seated, Arm_Torso_Effort. (c) Candidate points from where the robot can take the object for the effort (b) of the human. (d) Weighted candidate points based on the nearness to the target-object: the toy dog. The pointer indicates a highest-weighted feasible place obtained by the feasibility analyses of Fig. 5 (Color figure online)

Fig. 8

Computed configuration for the proactive reach out to take, in the case the human has to give the toy to the robot

5.5.2 For Proactively Suggesting ‘Where’ to Place

The robot finds ‘where’ the human can put the object for the robot to take, and proactively suggests that place to the human. As mentioned in Sect. 5.3, the robot is able to find candidate points on horizontal support surfaces for placing something on those supports. Hence, in block ‘d’ of Fig. 4, the planner also finds places on top of the box where the human can put the object, as shown in Fig. 9(a). Figure 9(b) shows the weighted candidate points to perform the task. Figure 9(c) shows the feasible estimated placement of the object, obtained in block ‘m’, from where the robot could take it. Apart from information similar to that of the proactive reach-out task, the proactive planner also provides the symbolic information that the placement is ‘on the box’, based on reasoning on the inter-object spatial relations. Incorporating other predicates such as left, right, next to, etc. could further enrich the location description while suggesting the place to put the object.

Fig. 9

Task of making an object accessible by the human to the robot. (a) Places on the support planes where the human can put something with least effort. (b) Weighted points where the robot can support the human by taking the object. (c) The planner found a possible placement of the object on the box from where it is feasible for the robot to take. Note that, because of the object-closeness based weight assignment, this placement also reduces the human’s effort to carry the object

5.6 Remark on Convergence Time

As a first step, the main focus of this paper is to incorporate the key elements of grasp, visibility, placement, feasibility of trajectory, etc. from the perspectives of both agents: the robot and the human. One part of our future work is to further optimize the iterative approach presented in Fig. 5. Therefore, we only provide an approximate idea of the convergence time.

The candidate search space based on Mightability Maps can be updated online [43], and the initial lists of grasps and placements are calculated only once for each new object and stored. Hence, the convergence time of the algorithm mainly depends upon the number of times it has to backtrack due to the failure of any of the tests in Fig. 5, and on the time taken by the path planner, which is presently an RRT-based planner [20]. The algorithm finds a feasible solution for the typical scenario shown in Fig. 6 in 1.6 seconds. The convergence times for the other scenarios and tasks presented throughout the paper vary between 1 and 4 seconds.

6 Experimental Results and Analysis

We have tested our system on two different robots: JIDO, a home-built mobile manipulator equipped with an LWR Kuka arm, and PR2 from Willow Garage. The robots use Move3D [56], an integrated planning and visualization platform. The robots, through various sensors, maintain and update the 3D world state in real time. For object identification and localization, a tag-based stereovision system is used. For localizing the human and tracking the whole body, data from a Kinect (Microsoft) sensor are used. The human’s gaze is simplified to his/her head orientation, estimated through markers tracked by a motion capture system in real time. We use the Kinect to track the whole body of the human, but this does not provide a precise head orientation. Therefore, we use a motion capture system to track the human head. We avoid tracking the whole body through the motion capture system, because it is a marker-based system and needs a precise model and frequent calibration for each individual. We have created a fixed model of goggles for the motion capture system, which can be worn by any human, to obtain his/her head orientation.

In all the experiments, the speech of the robot was scripted and only some of the parameters were synthesized, such as the name of the object and of the support (an object or a piece of furniture), provided by the proactive planner. The entire experiment runs on the real robots. However, throughout the paper, at appropriate places, we have included pictures of the corresponding 3D environment to show some intermediate steps of the planning.

In [24], it has been shown that simple changes in the robot’s gaze can convey the robot’s attention and intentions to the human partner. Further, in [28], it has been found that the robot’s eye contact and hand movements, together with situated dialog, help in achieving joint attention with the human partner. Therefore, we have also incorporated such attentional behaviors in a scripted manner while exposing the robot to interact with the users.

The experiments are controlled in the sense that the remote operator starts the script only when the user is comfortably seated. The experiment begins with the robot saying, “I need your help …”. If the human looks at the robot (detected by the robot through visual perspective taking of the human), it assumes that joint attention has been established; otherwise it continues to repeat the sentence. Once joint attention has been established, the robot looks at the desired object and then at the human, while exhibiting or not exhibiting the proactive behavior. The task and its parameters are already provided to the script.
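The scripted trial flow can be summarized by the following sketch; the robot API names and the timing are assumptions made for illustration:

```python
import time

def run_trial(robot, task, proactive):
    """Sketch of one scripted trial, as described above (names are illustrative)."""
    # Repeat the opening sentence until the user looks at the robot (joint attention).
    robot.say("I need your help ...")
    while not robot.human_is_looking_at_me():
        time.sleep(2.0)
        robot.say("I need your help ...")

    robot.look_at(task.object_name)      # attentional behavior: gaze at the object
    robot.look_at("human_head")          # then back at the human

    if proactive:
        solution = robot.proactive_planner.solve(task)
        robot.execute(solution)          # e.g. reach out, or gaze at the suggested support
    robot.say(task.request_sentence)     # scripted request with synthesized parameters
```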

6.1 Demonstration of the Proactive Planner

This sub-section is confined to demonstrating two aspects:

(i) The planner is generic and independent of the scenario and the robot.

(ii) The resultant solution visibly reduces the human effort in different situations, when analyzed through the perspective of the effort levels presented in Table 1.

In the next sub-section, Sect. 6.2, we will then show and analyze the results of the preliminary user studies, which provide supportive and encouraging evidence for the proactive behaviors hypothesized in this paper.

6.1.1 Proactively Reaching out

Figure 10(a) shows an initial scenario in which the robot requests the human to give the object indicated by the red arrow. Figure 10(b) shows the final scenario, where the human is giving an object to the robot in the case when the robot did not move its hand proactively. The human is standing and trying to give the object, hence putting in Whole_Body_Effort (see Table 1). In contrast, when the robot was allowed to behave proactively, the proactive planner successfully found a feasible place to take the object from the human, while ensuring the minimum feasible effort by the human. Figure 10(c) shows the case in which the robot is proactively reaching to the feasible place to take the object. This proactive behavior has reduced the human’s effort for the task, as the human is just leaning forward from the seated position to give the object. Hence, the effort is Arm_Torso_Effort instead of the Whole_Body_Effort of Fig. 10(b). Figure 11 shows another scenario, where the human and the robot are sitting in a different spatial arrangement. Figure 11(a) shows the initial scenario and the position of the object to be given by the human. Figure 11(b) shows the non-proactive case: the human is standing and giving the object to the robot. In contrast, as shown in Fig. 11(c), the proactive planner finds a suitable human-adapted reach-out place, which reduces the human effort to Arm_Effort.

Fig. 10

(a) Initial scenario for giving the object grey tape marked by red arrow. (b) The user is trying to give by standing up, Whole_Body_Effort, in the absence of proactive reach behavior by the robot. (c) The user is giving just by leaning forward, Arm_Torso_Effort in the case of proactive reach by the robot (Color figure online)

Fig. 11

(a) Another scenario for the task of giving an object to the robot. (b) In the absence of any proactive behavior, the user is standing up and reaching to the robot (Whole_Body_Effort) to give the object. (c) With proactive reach behavior of the robot, the user is giving the object by only Arm_Effort

We have further tested our system on another robot, PR2, to illustrate the portability of the system and the ability of the proactive planner to take into account robots with different kinematic structures.

Figure 12(a) shows the user giving the object without the robot’s proactive reach behavior, whereas in Fig. 12(b) the user is giving the object with less effort, as the robot has proactively moved its arm. Hence, the human effort has been reduced from Arm_Torso_Effort to Arm_Effort. Figures 12(c) and (d) show two different scenarios in which the planner is able to find a feasible reach-out solution for the PR2 robot. Both users are giving the object with Arm_Effort in the case of a proactive reach-out by the robot.

Fig. 12

Experiments with the PR2 robot for the give task by the users. (a) The user is putting more effort (Arm_Torso_Effort) in the absence of any proactive reach behavior by the robot. (b) The user is giving with less effort (Arm_Effort) when the robot is reaching out proactively. (c) and (d) The planner is successfully able to find a solution for proactive reach out in different scenarios and the user is putting only the Arm_Effort to give the object

6.1.2 Proactively Suggesting the Place to Put

In this section, we will show the results for the make-accessible task by the human.

Figure 13 shows the initial scenario and its real-time 3D representation. In the absence of any proactive suggestion, the human is putting the object close to the robot on the table to make it accessible, Fig. 14(a). This required Arm_Torso_Effort. In contrast, when the robot proactively suggested the human-adapted feasible placement, the human put the object on the box, which reduced the human effort to Arm_Effort. In Fig. 15, the human is sitting relatively far from the table compared to the scenario of Fig. 14. In this scenario, in the absence of a proactive suggestion from the robot, the human is standing up and leaning forward to make the object accessible to the robot, i.e. with Whole_Body_Effort, Fig. 15(a). However, as shown in Fig. 15(b), with the proactive suggestion of where to place, the human has to just lean forward, which is Arm_Torso_Effort. Note that in this case the robot, with its current allowed maximum effort level set as Arm_Effort, was not able to support the human at his Arm_Effort level, as was the case in the scenario of Fig. 14.

Fig. 13

(a) Initial scenario for the task of making an object accessible by the human to the robot. (b) Its real time 3D representation. The object, to be requested by the robot, is encircled in red in (a) and (b) (Color figure online)

Fig. 14

The human is making an object accessible to the robot for the initial scenario of Fig. 13. (a) Without proactive suggestion about where to place, the human is putting it close to the robot with Arm_Torso_Effort. (b) With the human adapted proactive suggestion by the robot, the human is now putting it on the white box as suggested by the robot. This has reduced the human’s effort to Arm_Effort

Fig. 15

Make accessible task: (a) Without proactive suggestion about where to place, the human is putting it close to the robot on table by standing and leaning forward with Whole_Body_Effort. (b) With the human adapted proactive suggestion by the robot to put it on the white box, the human is now required to put Arm_Torso_Effort only. Note that the planner could not find a feasible solution for Arm_Effort of the human, as was the case for Fig. 14. This is because the human was sitting relatively away from the table and the robot was not able to support the task for Arm_Effort of the human with its maximum allowed effort level, which was also set as Arm_Effort

Hence, the presented planner is not only able to find a feasible proactive solution for different scenarios for both tasks, but is also able to qualitatively reduce the effort of the human partner.

6.2 Validation of Hypotheses Through User Studies

We have performed a series of preliminary user studies to validate the hypotheses and to discover the effects of the proactive behaviors of the robot on the users, compared to the non-proactive behaviors. In fact, the figures shown in the previous sections are from these user studies. The two main aspects we want to validate are:

(i) Whether the users experience a reduction in confusion about the task because of the hypothesized expressive proactive behaviors.

(ii) The presented framework takes into account the human partner’s visuo-spatial perspective and effort, to find a solution that not only supports behaving proactively but also reduces the human effort. Therefore, we further want to validate whether the users experience a reduction in effort. We also want to know whether, in the case of such human-adapted proactive behaviors, the users find the robot to be ‘aware’ of and ‘supportive’ to their capabilities.

There was a total of 30 users divided into three groups of 10 users: two groups for the give task and one group for the make accessible task. Each user group was a mix of users with different exposure to real robots: no exposure, little exposure, and rich exposure. This was to compensate for any bias from experienced and non-experienced users of robots in general. At the beginning of the experiment, each user was informed that the robot would interact with them, but not about the behaviors or the task. Further, no particular instruction was given to the users about ‘how’ they should behave.

6.2.1 Users’ Responses for the “Give” Task

We set up different scenarios having different relative positions of the robot, the human and the objects. Broadly, the scenarios can be divided into two categories:

(i) The human is sitting away from the robot and there is some furniture between them, similar to Fig. 10(a).

(ii) The human is sitting relatively closer to the robot, with a different relative position, and there is no furniture between them, similar to Fig. 11(a).

The users were randomly assigned to sit in one or the other scenario.

There were two user groups for the give task: group I and group II, consisting of 10 users each. The main difference between the two groups was that they were exposed to robots of different appearances: JIDO and PR2. This was to compensate for any bias due to the robot’s appearance or kinematic structure while validating our hypotheses.

Each user was exposed to two different behaviors of the robot: NPB and PB. NPB (Non-Proactive Behavior): The robot just asks the user “Please give me the ≪object_name≫” and waits in its current state. PB (Proactive Behavior): The robot asks the same but also starts moving its arm along the trajectory obtained through the presented proactive planner. In the PB case, it also starts turning its head to look at the object, as an attempt to incorporate goal-object-directed gaze movement (head movement in our case), as discussed in Sect. 4.

During the entire experiment, the decision whether PB or NPB should be exhibited first to a particular user was random. After being shown both behaviors, each user was requested to fill in a questionnaire, with the first behavior referred to as B1 and the second behavior as B2. Note that for some of the users B1 was NPB and for some it was PB.

Below we will first analyze the part of the questionnaire common to group I and group II, to show that, independent of the appearance of the robot, the proactive reach behavior is preferred over the non-proactive behavior. Then we will present the analyses of the part of the questionnaire that is exclusive to group I and explore the nature of the confusion and the effect on the effort. (We excluded these questions for group II users for compactness, as they were required to answer about two additional behaviors. Those behaviors are related to the ‘when’ aspect of proactivity and are beyond the scope of this paper.)

Table 2 shows that in the case of the proactive reach-out behavior of the robot, the total number of users having at least one type of confusion is significantly reduced. This supports the hypothesis that proactively reaching out to take something reduces the confusion of the user.

Table 2 Type of users’ confusions for the give task

Note that the percentages in this and the following tables may not sum to 100, as the users were allowed to mark multiple options or none.

Table 3 shows the users’ confusions about how to perform the task, as reported by group I users. It shows the data for two different cases: (i) NPB-PB: when the non-proactive behavior (NPB) was shown first, followed by the proactive behavior (PB); (ii) PB-NPB: when PB was exhibited first, followed by NPB. The percentage (%) is calculated based on the total number of users belonging to a particular case, (i) or (ii). Note that in case (ii), in which PB was demonstrated first, users were found to be biased towards expecting similar behavior in the next demonstration, which was going to be NPB. The last column of Table 3 reflects this: more users expected the robot to show some activity when PB had been exhibited first. In such cases, user responses included: “I thought that the experiment has failed, since the robot didn’t move”, “I was waiting for the robot to take it from me.”

Table 3 Users’ responses about the confusion on ‘how’ to perform the give task in the NPB of the robot

Table 4 shows group I users’ responses about the change in their effort. It shows that 71 % of the users in the NPB-PB case explicitly mentioned that the second behavior, i.e. the PB, reduced their effort to give the object compared to the first behavior, i.e. the NPB. Further, 66 % of the users in the PB-NPB case explicitly mentioned that the second behavior, i.e. the NPB, demanded more effort to give the object compared to the first behavior, i.e. the PB. Combining both, a majority of the users, 70 % of the total users of group I, reported that the proactive reach-out behavior of the robot reduces their effort compared to the non-proactive behavior. Hence, it supports our hypothesis that the human-adapted reach-out also makes the users feel a reduction in their effort in the joint task. It also validates that the presented framework is indeed able to find a solution while maintaining the least feasible effort of the human partner.

Table 4 Users’ experience on change in effort for the give task

Table 5 (combining group I and group II responses) shows that a majority of the users reported the robots to be more ‘aware’ and ‘supportive’ of them and of the task in the cases where the robot behaved proactively. Table 5 also shows that 80 % of the users of group I explicitly mentioned that the proactive reach behavior guides them about where to perform the task, hence validating the perspective-taking capability of the robot.

Table 5 Users’ experience about awareness, supportiveness and the guiding nature of PB for the give task

6.2.2 Discussion on a Few Observations for the “Give” Task

Apart from the direct responses from the users, we observed the following situations, which point towards the need for further exploration:

(i) Without any proactive reaching behavior, the user in Fig. 16(a) is holding the object and waiting for the robot to take it, whereas, as shown in Fig. 16(b), in the presence of the proactive reaching behavior of the robot, the human is also putting in some effort to lean and give the object to the robot. This seems to corroborate the studies of human behavioral psychology showing that goal anticipation during action observation is influenced by synonymous action capabilities [21].

    Fig. 16

    Task of giving an object to the robot. (a) In the absence of any proactive behavior the user is holding the object and waiting for the robot to take. (b) With proactive reach behavior from the robot, the user is also putting some effort to give the object to the robot

  2. (ii)

    For the cases where the non-proactive behavior was shown first, a few users were found to spend some time ‘searching’ for the object to give when the table-top environment was somewhat cluttered. This suggests incorporating a pointing component, adopting a goal-directed approach to draw the user’s attention to the object of interest. In our experiments, this has been partially achieved by assigning higher weights to the places closer to the object (see the sketch at the end of this subsection). This seems to support the findings in [40] and [11] that the directing-to gesture helps in drawing the user’s focus of attention towards the object.

Further user studies are required to properly validate these observations and establish them as facts.
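
As noted in observation (ii), places closer to the object of interest were given higher weights. For illustration, a minimal sketch of one possible form of such proximity weighting is given below; the exponential fall-off and the length scale sigma are assumptions and do not correspond to the exact weighting used in the experiments.

```python
# A minimal sketch of distance-based weighting of candidate places:
# places closer to the object of interest receive higher weights.
# The exponential fall-off and the length scale sigma are illustrative
# assumptions, not the exact weighting used in the experiments.
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters


def proximity_weight(place: Point, object_position: Point, sigma: float = 0.3) -> float:
    """Weight in (0, 1] that decays with the Euclidean distance between
    a candidate place and the object of interest."""
    d = math.dist(place, object_position)
    return math.exp(-d / sigma)


# Example: a place 10 cm from the object is favoured over one 50 cm away.
print(proximity_weight((0.1, 0.0, 0.0), (0.0, 0.0, 0.0)))  # ~0.72
print(proximity_weight((0.5, 0.0, 0.0), (0.0, 0.0, 0.0)))  # ~0.19
```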

6.2.3 Users’ Responses for the “Make Accessible” Task

The robot requests the human partner to make an object accessible. We have deliberately built the scenario so that the least feasible effort for making an object accessible to the robot is to put it on top of a white box.

There were 10 users forming group III. For this task, instead of exposing the two behaviors to a user in random order, we decided to first show the non-proactive behavior (NPB) followed by the proactive behavior (PB). This is because, if the user were first exposed to the PB, he/she might be biased towards putting the object at the same place in the NPB case as well, as the scenario would be the same.

For the non-proactive behavior (NPB), the robot looks at the human and utters the scripted sentence:

“Hey, I need your help. Can you please make the ≪object_name≫ accessible to me.”

For the proactive behavior (PB), the robot says:

“Hey, can you make the ≪object_name≫ accessible to me, you can put it on the ≪support_name≫.”

In an attempt to incorporate goal-directed gaze movement (head movement in this case), the robot looks at the object while uttering the first part of the request and then starts turning its head towards the place where it suggests the human put the object.
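
For illustration, the sketch below shows one way such a speech/gaze sequence could be scripted; the RobotInterface stub and its say/look_at methods are hypothetical stand-ins and do not correspond to the actual robot API.

```python
# A minimal sketch of the speech/gaze sequencing for the proactive
# "make accessible" request. The RobotInterface stub and its methods
# (say, look_at) are hypothetical stand-ins for the real robot interface.

class RobotInterface:
    def say(self, text: str) -> None:
        print(f"[speech] {text}")

    def look_at(self, target: str, blocking: bool = True) -> None:
        print(f"[gaze]   turning head towards '{target}' (blocking={blocking})")


def proactive_make_accessible_request(robot: RobotInterface,
                                      object_name: str,
                                      support_name: str) -> None:
    # Gaze at the requested object while uttering the first part of the request.
    robot.look_at(object_name)
    robot.say(f"Hey, can you make the {object_name} accessible to me,")
    # Start turning the head towards the suggested support (non-blocking),
    # so the placement suggestion is accompanied by a congruent gaze cue.
    robot.look_at(support_name, blocking=False)
    robot.say(f"you can put it on the {support_name}.")


if __name__ == "__main__":
    proactive_make_accessible_request(RobotInterface(), "toy dog", "white box")
```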

As shown in Table 6, about 80 % of the users reported confusion about how and where to make the object accessible in the NPB case. This was reduced significantly, to 30 %, in the PB case.

Table 6 Nature of the users’ confusions for the make-accessible task

Table 7 shows the percentage of users who were doubtful about from ‘where’ the robot could take or see the object. Note that in the case of the proactive behavior, as the robot explicitly suggested, “… you could put it on the white box”, thereby restricting the search space for the user to perform the task, such suspicions were reduced significantly.

Table 7 Users’ suspicions about the robot’s capabilities for the make accessible task

These findings also seem to support the result of [40], which shows that the use of a location description increases accuracy in finding the target. In the current experiment, the location description was not for localizing the object but for the place to put it, thereby guiding the user towards efficient task realization.

As shown in Table 8, a majority of the users found the proactive suggestion by the robot more compelling. Table 9 shows that 60 % of the users found that the human-adapted proactive behavior reduced their effort.

Table 8 Users’ responses about the robot’s awareness through the PB for the make accessible task
Table 9 Users’ responses about their relative efforts in the make accessible task

6.2.4 Discussion on a Few Observations for the “Make Accessible” Task

  1. (i)

    One interesting observation was related to the human’s interpretation of how to perform the task of making an object accessible. As shown in Fig. 17(a), in the case of the non-proactive behavior, the user took the white box away to make the object (pointed to by the red arrow) accessible to the robot. Although he overestimated the robot’s reach, his justification was, “… I thought if I would move the obstructing box away, the robot would be able to take the object …” Fig. 18 shows another scenario, in which the user holds the object close to the robot and waits for it to take it. Such observations indicate the need for situation-based proactive suggestions also on the ‘how’ aspect of the task.

    Fig. 17
    figure 17

    Task of making an object (pointed to by the red arrow) accessible to the robot. In the absence of proactive behavior, the user took away the white box in an attempt to clear the obstruction for the robot to take the object (Color figure online)

    Fig. 18
    figure 18

    Task of making an object accessible to the robot. In the absence of proactive behavior, the user is holding the object and waiting for the robot to take it

  2. (ii)

    Figure 19 shows a user confused about ‘which’ object the robot has requested. Such confusion was reported by at least 3 users because of various factors, such as background noise, difficulty in grounding the object by its name, and unfamiliarity with the computer-synthesized voice. Moreover, such confusion was reported in both cases: non-proactive and proactive. In this particular case, the user tries to reach towards the objects on his left side based on his prediction of the robot’s attention, Fig. 19(b), while looking at the robot to get some additional information, Fig. 19(c).

    Fig. 19
    figure 19

    Task of making an object (marked by the red arrow) accessible to the robot. In the absence of further feedback from the robot, the human is confused about which object to make accessible, as he failed to ground the object referred to by the robot (Color figure online)

This suggests that, whenever required, an element of pointing should also be included in the robot’s behaviors. Another component suggested by Fig. 19(c) is a feedback mechanism from the robot: in a natural human-robot interaction scenario, not only does the robot require feedback from the human, but it should also provide feedback to the human. Work on such complementary issues of grounding references through interaction, such as ours [37, 46], could be adapted for this purpose of proactive behavior with feedback.

6.2.5 Overall Inter-task Observations

In this section, we combine the results of both tasks to draw some global conclusions. Table 10 (combining Table 2 and Table 6) shows an overall 66 % reduction in confusion in the case of proactive behavior. Table 11 shows that a majority of the users, 65 %, experienced that the human-adapted proactive behavior reduced their effort. Table 12 shows that a majority of the users, 85 %, reported that the proactive behavior better communicated the robot’s capabilities and was more supportive to the task and to them.

Table 10 Overall reduction in the users’ confusion because of the robot’s proactive behavior
Table 11 Overall reduction in users’ effort because of the robot’s proactive behavior
Table 12 Overall responses about supportiveness and communicativeness of the proactive behavior

7 Discussion and Potential Applications

The main reason the presented system has been preferred is that the robot not only requests the human to give or make an object accessible but also proactively provides a human-adapted solution to the user through different means. The same framework could be used in the case where, instead of the robot, the human announces the task, i.e. that he/she is about to give or make an object accessible to the robot. In that case, the robot could proactively find a solution and communicate it with similar behaviors, such as saying “OK, I will take …” while reaching out to take the object, or “OK, put it on the box, I can take it from there …”.

There could be a range of real-world, day-to-day interaction situations in which the robot cannot reach an object at all, or would need to put in significant effort and/or time to get the object by itself. In such situations, the robot could request the help of the human partner. In that case, the presented system could be used to improve the sociability and acceptance of the robot, allowing the robot to do more than just ask for help and to cooperate proactively. One such example scenario could be a domestic robot cleaning the dinner table. To perform the task efficiently (e.g. to minimize the time), a high-level planner such as [1] could plan some cooperative actions. For example, the robot could ask the human to give it, or make accessible, an object that is far from its reach, so that it is not required to take a tour around the table. However, to be polite and supportive, the robot could proactively stretch out its hand while asking for the human’s help. Another example could be to support an elderly person or someone who has a back or neck problem or reduced mobility. The presented framework could take into account such constraints by appropriately restricting the maximum allowed effort level of the human partner, e.g. restricting it to Arm_Torso_Effort, and then find a solution to behave in a proactive manner. Suppose the robot has served a glass of water for taking medicines and the person is about to finish taking them. Whether it is to take the glass back from the human or to suggest a place to put it so that the robot could take it later, the presented system could be used to reduce the person’s effort and confusion. Similarly, if a robot teammate is fixing a frame in an assembly unit and therefore cannot move away, it can ask the human partner for a tool while showing proactive reach-out behavior with its maximum feasible effort level in that situation.

Moreover, the robot will not always be expected to reduce the effort of the human partner. The presented framework can be adapted to find solutions that balance mutual effort or even reduce the robot’s effort. This could be achieved by regulating the effort levels of both agents in the iterations of the presented framework, as sketched below.
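
For concreteness, the sketch below illustrates one way such regulation could be realized, namely by weighting the effort levels of the two agents when ordering the candidate effort-level pairs to be tested. The enumeration of effort levels (other than Arm_Torso_Effort, which is used in the paper), the toy feasibility test and the weights are hypothetical placeholders for the framework’s actual effort taxonomy and geometric feasibility tests.

```python
# A minimal sketch of regulating both agents' effort levels in the iterations
# of the planner. The effort-level names (other than Arm_Torso_Effort) and the
# feasibility test are illustrative assumptions.
from itertools import product

# Ordered, increasing effort levels (hypothetical enumeration).
EFFORT_LEVELS = ["No_Effort", "Arm_Effort", "Arm_Torso_Effort", "Whole_Body_Effort"]


def find_placement(human_weight, robot_weight, feasible):
    """Return the first feasible (human_level, robot_level) pair when candidate
    pairs are visited in order of weighted combined effort."""
    def cost(pair):
        h, r = pair
        return human_weight * EFFORT_LEVELS.index(h) + robot_weight * EFFORT_LEVELS.index(r)

    for human_level, robot_level in sorted(product(EFFORT_LEVELS, repeat=2), key=cost):
        if feasible(human_level, robot_level):
            return human_level, robot_level
    return None


if __name__ == "__main__":
    # Toy feasibility test standing in for the real reachability/visibility checks.
    def toy_feasible(h, r):
        return h != "No_Effort" and r != "No_Effort"

    print(find_placement(1.0, 0.0, toy_feasible))  # favour least human effort
    print(find_placement(0.5, 0.5, toy_feasible))  # balance mutual effort
    print(find_placement(0.0, 1.0, toy_feasible))  # favour least robot effort
```

Under this sketch, weights of (1.0, 0.0) correspond to minimizing the human’s effort first, (0.5, 0.5) to balancing mutual effort, and (0.0, 1.0) to favouring the robot.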

In [42], it has been shown that the robot can express and communicate an intention of the type “I want to do this but I can’t …” to the human by repeating the simple actions corresponding to the failing task. The same framework could be used to find a solution for such communicative actions. For example, the robot could communicate that it wants to take an object but cannot reach it, by trying to reach out in a human-adapted manner.

The presented framework, which enables estimating ‘where’ the human can perform a task with different levels of effort, could also be used to estimate candidate places when providing solutions for other proactive behaviors in basic human-robot interactive tasks. We are in the process of hypothesizing and realizing such proactive object manipulation behaviors:

  1. (a)

    Proactively put an object closer to the human, for example, putting the sugar container closer to the human preparing the coffee, to reduce his effort.

  2. (b)

    Proactively put away an object that the human could obliviously hit while performing some task.

  3. (c)

    Proactively take an object away to make space for the human to put another object, to facilitate achieving a task.

  4. (d)

    Proactively take and show some object that the human might be looking for, etc.

The presented framework could also be adapted to behave proactively by estimating where the human could exhibit a competitive behavior. Figure 20 shows the candidate places where the human, with Arm_Torso_Effort, could hide some object from the robot; a sketch of such an estimation is given after the figure. In the context of multi-agent, hide-and-seek-like object games, the robot could use such information to proactively communicate to or hint its partner about the potential places to hide or seek an object. In fact, various types of higher-level proactive behaviors could be realized by exploiting the knowledge of ‘where’ an agent could perform a task, either to cooperate or to compete, with a particular effort level.

Fig. 20
figure 20

The robot’s estimation about the places where the human with Arm_Torso_Effort might be able to hide something from it
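
For illustration, the sketch below outlines how such candidate hiding places could be filtered by combining the two agents’ perspectives; the Agent structure, the reachability and visibility predicates, and the candidate sampling are hypothetical placeholders for the framework’s multi-state perspective taking and feasibility tests.

```python
# A minimal sketch of estimating candidate "hiding" places (cf. Fig. 20):
# keep the places the human can reach with a given effort level but the
# robot cannot see. The Agent fields are hypothetical placeholders for the
# framework's reachability and visibility estimations.
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Place = Tuple[float, float, float]  # (x, y, z) candidate placement


@dataclass
class Agent:
    reachable: Callable[[Place, str], bool]  # reachable(place, effort_level)
    visible: Callable[[Place], bool]         # visible from the agent's current viewpoint


def candidate_hide_places(human: Agent, robot: Agent,
                          candidates: Iterable[Place],
                          human_effort: str = "Arm_Torso_Effort") -> List[Place]:
    """Places the human can reach with the given effort but the robot cannot see."""
    return [p for p in candidates
            if human.reachable(p, human_effort) and not robot.visible(p)]
```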

8 Conclusion and Future Work

The contribution of this paper is two-fold: hypothesizing and validating proactive behaviors for two basic human-robot interactive manipulation tasks, and a generalized planner that can find solutions for such tasks while respecting environmental, postural and effort-oriented constraints. The focus is on utilizing the ‘where’ information based on multi-state visuo-spatial perspective taking. Our research is inspired by studies of human behavioral psychology, which suggest that the estimation of ‘where’ is an important aspect of coordination.

We have shown through the users’ responses that our hypotheses hold and that the users find the robot to be more ‘aware’ and ‘supportive’. Further, through experimental results and the users’ responses, we have shown that the presented proactive planner not only finds solutions for basic tasks for different robots in different scenarios, but also successfully achieves its goal of reducing the human’s ‘effort’ and ‘confusion’. In the presented framework, our robots observe the human, analyze situation-dependent capabilities, perform various task-dependent feasibility tests for motion planning, and find feasible solutions for different proactive behaviors. The weights used in finding the feasible solution and in various decision-making processes are parameters of the system and could be adapted/refined, even online, based on user activity, response and feedback.
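
For illustration, the sketch below indicates one possible online refinement of such weights from signed user feedback; the criterion names and the multiplicative update rule are hypothetical and are not the system’s actual adaptation scheme.

```python
# A minimal sketch of refining placement weights online from user feedback.
# The criterion names and the multiplicative update rule are illustrative
# assumptions, not the system's actual adaptation scheme.

def update_weights(weights: dict, feedback: dict, rate: float = 0.1) -> dict:
    """Nudge each weight up or down according to signed user feedback
    (+1: the criterion helped, -1: it did not), then renormalize."""
    updated = {name: max(1e-3, w * (1.0 + rate * feedback.get(name, 0)))
               for name, w in weights.items()}
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}


# Hypothetical criteria weighting candidate placements.
weights = {"human_effort": 0.4, "human_visibility": 0.3, "distance_to_object": 0.3}
# Example: the user indicated that visibility mattered more than expected.
weights = update_weights(weights, {"human_visibility": +1, "distance_to_object": -1})
print(weights)
```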

As the robot can perform the presented multi-state perspective taking for multiple humans/agents, an interesting direction for future work is to synthesize and instantiate proactive behaviors by taking into account the cooperation of more than two agents in a task. Another interesting work in progress is to use the information learnt/understood by the robot about what a task ‘means’ and ‘how’ it could be performed [9, 44, 45]. This will help to explicitly provide proactive suggestions about ‘how’ to perform a task as well as to autonomously synthesize a proactive behavior/action. It would also be interesting to analyze the effect: would the user feel too constrained by the robot, or would such suggestions better help the user to perform the cooperative task? What would be the optimal level of abstraction of such suggestions?

The proactive behaviors hypothesized in this paper are some of the essential building blocks of basic actions for complex socio-cognitive behavior, including expectation [39] and intention. We have performed a preliminary level of user studies and found the results encouraging, supporting our intuition and hypotheses. We feel the need for further user studies from the perspective of long-term human-robot interaction in the context of high-level tasks. In this regard, proactive gaze has been suggested as an important aspect to incorporate in developing methods to measure HRI through motor resonance [54]. This could be adapted to develop a measure of proactivity in HRI, based on how much the proactive action of the robot induces proactive gaze in the human partner, indicating the predictiveness of the proactive behavior. This will also help in identifying the necessary enhancements at different levels of planning and execution of such proactive behaviors.