1 Introduction

As robots move towards being cooperative and social, the challenges of incorporating the basic ingredients of such behaviors are becoming prominent. In this context, just being reactive or active is not sufficient. Behaving proactively in a human-centered environment is one of the desirable characteristics of social robots [13, 47].

Proactive behavior, i.e. taking the initiative whenever necessary to support the ongoing interaction/task, is a means to engage with the human and to satisfy internal social aims such as drives, emotions, etc. [14]. Proactive behavior can be tackled at different levels of abstraction and through various perspectives. We are essentially interested here in one particular topic: the synthesis of proactive motions with the aim of facilitating human-robot collaborative task achievement. In this line of work, one non-trivial aspect is to determine “where” (an important aspect of a joint task [55]) a joint action should preferably take place and what the robot can do to “propose/communicate” the choice it has made.

We have already developed a system that enables the robot to perform a set of basic human-robot interactive manipulation tasks such as give, show, hide, or make accessible an object to the human [43]. In this paper, we will focus on a complementary aspect of such interactive manipulation, in which the human will be required to perform the task or to contribute to a joint task in order to achieve some joint goal. We will consider two such tasks:

(i) Give: The human has to give some object to the robot by holding it somewhere, and

(ii) Make Accessible: The human has to make some object accessible to the robot by placing it somewhere.

As shown in Fig. 1(a), if the robot asks, “Please give me the toy dog …” and remains in the rest position, this might create confusion for the human about how and where to give: “Should I move and reach out to the robot to put the object in its hand?”, “Should I put it somewhere on the table for the robot to take it?”, etc. Now assume that the robot, along with its request to give the object, also shows proactive reach-out behavior by moving its hand to take the object, as shown in Fig. 1(b). This might significantly guide the human about how to perform the task and where to give. Moreover, if the robot reaches out to a place that is convenient from the human’s perspective to give the object, it will also reduce the human’s effort.

Fig. 1

The robot asks the human “give me the toy dog …” (a) by maintaining its rest position, (b) by proactively moving its hand to an appropriate place to guide the human’s action

Similarly, if the human has to make some object accessible to the robot, the robot could proactively advise the human about where to put the object so that the robot would be able to take it. This will reduce the human’s confusion. Again, if the robot takes into account the human’s effort to find a human-adapted proactive solution, it will further reduce the effort of the human.

Expressive behavior coupled with perspective-taking has been shown to be important for the socio-cognitive aspect of Human-Robot Interaction [3]. Hence, while finding a human-adapted solution for a task, the robot takes into account the visuo-spatial perspective of the human. Moreover, the robot performs such perspective-taking not only from the current state of the agent, but also from a set of different states achievable by the agent; hence the term multi-state perspective-taking.

Apart from reducing the ‘confusion’ and ‘effort’, with such proactive behaviors the robot might also better communicate its ‘abilities’ and its ‘understanding’ of the abilities of the human partner. In this paper, we will hypothesize such proactive behaviors; present a framework to instantiate a solution for such behaviors; and validate through user studies that proactive behaviors indeed reduce the ‘confusion’ of the human and that incorporating the human-adapted aspects further reduces the ‘effort’ of the human partner.

2 Related Work

Proactive behavior can occur at various levels of abstraction and can be exhibited in various ways, ranging from simple interaction [36] to proactive task selection [5, 35, 52, 53]. Our work has been applied to and instantiated in the domain of human-robot cooperative object manipulation, in which interactive pick-and-place [30] and handover tasks have been identified as essential capabilities for a robot assistant. In this setting, the pertinent questions are where to place the object and the robot’s gripper, in which direction to orient the robot’s head, etc., in order to provide maximum information and help. In [55], humans’ ability to estimate such ‘what’, ‘when’ and ‘where’ aspects of others’ actions, and the importance of these estimates in online action coordination, has been shown. Hence, the robot should also be equipped with such abilities for better coordination and cooperation, even when deciding to behave proactively.

In the context of proactive behavior, estimation and utilization of ‘what’ has been addressed by various researchers. In [52, 53], the robot estimates what the human wants and selects a task using a probability density function. In [23], a cost-based anticipatory action selection is performed by the robot to improve joint task coordination. ‘When’ has also been addressed in various ways. In [34], temporal Bayesian networks are used for proactive action selection to minimize wait time. In [7], a robot wheelchair takes control when a handicapped human needs it, and in [8], an activity-constraints-violation based scheduler is used to remind the human. Using the information about ‘how’ for behaving proactively has been shown in [16], where a switching hidden semi-Markov model is used to learn a house occupant’s daily activities and to alert the caregiver in case of abnormality. There are various other works showing the robot learning/understanding what a task ‘means’ and ‘how’ the task is performed [9, 44, 45], at a symbolic level of abstraction, which could be used for synthesizing and showing proactive behaviors. On the other hand, from the point of view of utilizing the information about ‘where’ the human can perform the task and showing proactive behavior accordingly, there has not been significant work in robotics. In [58], a robot wheelchair predicts the location the user is trying to go to, by using a POMDP model, and drives him/her there. In human-human interaction, the notion of proactive eye movement has been identified [18], and further, in [54], such proactive gaze has been proposed as a dimension to measure HRI. However, their notion of proactive gaze corresponds to predicting the goal of the action and then proactively shifting the gaze directly towards the goal. This notion of proactivity is complementary to the proactive behaviors within the scope of this paper, in the sense that, instead of shifting its gaze proactively based on the human’s action, the robot proactively finds a solution for the human action and suggests it through its proactive actions. However, such proactive actions might include proactive gaze as a component or might induce the human partner’s proactive gaze.

Estimating ‘where’ the human can perform a task is helpful for sharing attention with others [59] and to predict spatial characteristics of others’ actions [29]. These are essential for building the robot’s theory of mind [51] and consequently can help to guide the human’s behavior [33, 60] towards the robot.

Our interest is to devise proactive behaviors based on ‘where’ the task could be performed. Let us motivate the interest behind such proactive behaviors and at the same time identify some of their key ingredients. In [25], it has been shown that in general the participants appreciate the robot’s initiative of showing some movements as an engagement attempt. The experiment was in a receptionist-visitor scenario, and the type of initiative was gaze shifting, which is different from our HRI object manipulation scenario and the associated types of proactive behaviors. However, the finding that the participants rated the robot higher when it showed some movement in its gaze than when it was just still suggests that proactive initiatives of the robot could better engage the human and serve the purpose of interaction opening. Further, in [38], it has been shown that simple arm-head gestures increase the expressive power of social robots. Although such movements have not been directly studied in relation to proactivity, we will hypothesize proactive behaviors that incorporate arm and head movements as an attempt to be expressive. Moreover, a robot moving its arm to a location could induce a human goal-anticipatory response, as demonstrated in children [21]. Further, regarding object manipulation, which is the focus of our paper in the HRI context, we find that gazing plays an important role in pointing-based object-reference conversation [27]. Therefore, in our hypothesized proactive behaviors, we incorporate looking at the object and at the place of interest.

In this paper, our focus will be on estimating ‘where’ a task could be performed by the human and on hypothesizing different proactive behaviors in terms of verbal as well as non-verbal key communicative components (arm motion, gaze shifting), as identified above.

3 Contributions of the Paper

Below we summarize the contributions of the paper: conceiving hypotheses related to proactive behaviors; developing a framework to synthesize a solution for proactive actions based on ‘where’ the robot could support the task, while ensuring the least feasible human effort; and validating the hypotheses through a preliminary user study.

(i) Hypothesizing proactive behaviors and their expected effects in HRI: For interactive human-robot joint tasks, we hypothesize the following:

    (a) Proactive Reach out to Take from the Human: We postulate that, along with informing verbally, the robot should proactively reach out to take, in the case the human has to give something to it.

    (b) Proactively Suggesting ‘Where’ to Place: We postulate that the robot should proactively suggest (verbally and by gaze shifting) ‘where’ to place an object, in the case the human has to make the object accessible to the robot.

    We hypothesize that such proactive behaviors will be preferred over non-proactive behavior and will reduce the ‘confusion’ of the human partner.

(ii) Framework to instantiate human-adapted proactive actions for the least feasible human effort: We will present a generic framework to find a solution for different proactive behaviors. The framework estimates ‘where’ the human can perform the task, respecting environmental and postural constraints, and where the robot can support the task. Our framework will provide a human-adapted proactive solution by taking into account the human’s perspective and effort (Sect. 5.4).

    We further hypothesize that such human-adapted proactive behavior will be preferred over non-proactive behavior of the robot, as it will also reduce the ‘effort’ of the human, and that the robot will be perceived as being more ‘aware’ of the human’s capabilities and more ‘communicative’ about its own capabilities.

    We will show through experimental results the capability of the framework to provide a solution for different scenarios and for different robots, with a visible reduction in the human effort (Sect. 6.1).

(iii) User studies to validate the hypotheses: We will present the results of a set of user studies to validate the hypothesis that such proactive behaviors indeed reduce the human ‘confusion’ and that the human-adapted aspect further reduces the human ‘effort’ and makes the robot appear more ‘aware’ and ‘communicative’ (Sect. 6.2).

In the next section, we will formally explain the nature of the proactive behavior within the scope of the paper and give a general description of how such proactive behaviors are achieved. Then we will present the methodology to instantiate such proactive behaviors in Sect. 5. There, we will first briefly describe the multi-state visuo-spatial perspective-taking ability of the robot and the assignment of effort levels. Then, in Sect. 5.4, we will present the generic framework, which iteratively finds a feasible solution for proactive behavior. The approach is to find a place ‘where’ the human might perform a task with the least feasible effort, among those places ‘where’ the robot could support it, given its own constraints. In Sect. 5.5, we will illustrate the framework by instantiating solutions for the two proactive behaviors hypothesized above: (a) proactively reaching out to take for the give task by the human, and (b) proactively suggesting ‘where’ to put for the make-accessible task by the human. In the experimental results and analysis, Sect. 6, we will demonstrate two main aspects:

(i) Sect. 6.1 will demonstrate the capability of the framework to find a solution for different scenarios and for different types of robots. We will also demonstrate the reduction in the human effort in the case of human-adapted proactive behavior.

(ii) Sect. 6.2 will show supporting evidence for our hypotheses through user studies.

Finally, we will conclude the paper with a discussion of potential applications and pointers to future work.

4 Objective and Nature of the Proactive Behaviors of Interest

We define an environment En, which consists of entities (objects and agents) and a set of attributes At related to the physical states of all the objects and to the physical and mental states as well as the abilities of all the agents. The state of the environment consists of a set of tuples 〈entity,attribute,value〉. From the planning point of view, the target state of the environment for a task T is specified as a set of facts \(F^{T} = \{f_{i}^{T} \}\), where each fact \(f_{i}^{T}\) consists of a tuple 〈entity,attribute,constraint〉. Facts can be directly observable as well as inferred. For a particular task T, the constraint for a particular instance of the 〈entity,attribute〉 pair could be in different forms:

(i) A single value. For example, if an object O should be at (x,y,z), for the entity O and its attribute position, this constraint results in \(f_{i}^{T}=\langle O, \mathit{position}, (x,y,z) \rangle\).

(ii) A set of values because of some desired facts. For example, if an object O should be on the table, this results in \(f_{i}^{T}=\langle O, \mathit{position}, \{ p_{j}\} \rangle\), where \(\{p_{j}\}\) is the set of positions on the table top.

(iii) A set of values because of some forbidden facts. For example, if the constraint is that an object O should not be reachable by an agent Ag, this results in the fact \(f_{i}^{T}=\langle O, \mathit{position}, \{p_{k}:\mathit{non}\_\mathit{reachable}(O, Ag, p_{k})\} \rangle\).

Hence, for a particular task T, all the constraints for a particular instance 〈ent,at〉 of the 〈entity,attribute〉 pair will eventually result in a space \(s_{ent,at}^{T}\) in which its value can lie. Given \(F^{T}\), we say the pair 〈ent,at〉 is properly grounded if \(s_{ent,at}^{T}\) contains only one value, i.e. if \(s_{ent,at}^{T}\) is a point in the potential space for that 〈ent,at〉 pair. Let \(SP^{T}\) denote the space corresponding to all the properly grounded 〈entity,attribute〉 pairs. Given \(F^{T}\), we say an 〈ent,at〉 pair is loosely grounded for the task T if \(s_{ent,at}^{T}\) contains multiple potential values. Let \(SL^{T}\) denote the space corresponding to all such loosely grounded 〈entity,attribute〉 pairs. In one sense, \(SL^{T}\) provides the agent with latitude to perform the task differently, but on the other hand it can also be a source of confusion because of the burden of deciding among multiple possibilities. And this is where the type of proactive actions within the scope of this paper plays its role.
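To make this representation concrete, here is a minimal sketch (in Python, with hypothetical names and toy values) of how a target specification \(F^{T}\) could be stored and split into the properly grounded space \(SP^{T}\) and the loosely grounded space \(SL^{T}\):

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

# A fact constrains one <entity, attribute> pair to a space of admissible values.
@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    values: FrozenSet  # admissible value space s^T_{ent,at}

def split_grounding(facts: Set[Fact]) -> Tuple[Dict, Dict]:
    """Split F^T into SP^T (exactly one admissible value per <entity, attribute>)
    and SL^T (multiple admissible values)."""
    sp, sl = {}, {}
    for f in facts:
        key = (f.entity, f.attribute)
        (sp if len(f.values) == 1 else sl)[key] = f.values
    return sp, sl

# Toy example: object O must be on the table (several candidate positions,
# so <O, position> is loosely grounded).
table_top = frozenset({(0.4, 0.1, 0.75), (0.5, 0.2, 0.75), (0.6, 0.0, 0.75)})
F_T = {Fact("O", "position", table_top)}
SP, SL = split_grounding(F_T)   # SP == {}, SL == {("O", "position"): table_top}
```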

We assume that a proactive action is synthesized based on an initial specification of expected facts \(F^{T}\) about the task T, the sequence of already planned actions \(A^{T}\) to perform the task, the environment state \(\mathit{En}_{I}\) before executing \(A^{T}\), and the predicted state of the environment \(\mathit{En}_{F}\) if \(A^{T}\) were executed. For synthesizing proactive actions, the intention/motive behind such actions is important.

Proactive behaviors could be of various types and could be exhibited in various ways. The focus of this paper is on the type of proactive behavior that makes changes in the loosely grounded space \(SL^{T}\), as discussed above, by instantiating and/or better specifying its parameters and communicating this through the proactive action. The intentions behind such proactive behaviors are: to guide and facilitate the human partner to better perform the task, to achieve the joint goal by reducing the task-related confusions, and to reduce the effort of the human partner. For example, see Fig. 2, where an ellipse shows one state of the environment and an edge shows an action. In the absence of any proactive behavior, one of the states of the expected final environment will be \(\mathit{En}_{F}\). In the case of proactive behavior, an intermediate environment \(\mathit{En}_{\mathit{Intr}}\) is created by the robot’s proactive action \(R_{A}^{P}\), by instantiating or better specifying the expected state of the environment. Hence, the proactive behavior leads to a change in any of these: the physical state of the environment, the physical state of the human, the mental state and belief of the human. The human then adapts to it to achieve the task by action \(H_{A}^{M}\), by modifying the parameters of his/her planned action. This might result in a partially different final state of the environment, \(\mathit{En}_{F}^{\prime}\). In this paper, the hypothesized proactive behaviors do not eliminate the set of already planned actions \(A^{T}\), and they preserve the already properly specified space \(SP^{T}\) of the final environment. However, the parameters of \(A^{T}\) could be adapted. Hence, there is a common part in the final states of the environment, represented as \(\mathit{En}_{F}^{sp}\) in Fig. 2.

Fig. 2

A type of proactive behavior which results in an intermediate physical state of the world and/or a changed mental state of the human, mainly to reduce the confusion and effort of the human for smooth execution of the joint task

For the give task by the human, the set of specifications about the facts, \(F^{\mathit{Give}}\), includes: the object should be in the hand of the robot; and the set of actions \(A^{\mathit{Give}}\) includes: the human will pick and carry the object and the robot will grasp the object. For simplicity, the parameters of the task are omitted, such as the performing agent, the agent for whom the task will be performed, the object, etc. For the make accessible task by the human, \(F^{\mathit{Make}\_\mathit{Accessible}}\) includes the facts that the object will be on a support and that it will be graspable, reachable and visible to the robot, whereas \(A^{\mathit{Make}\_\mathit{Accessible}}\) includes: the human will pick, carry and place the object on a support. Based on these sets \(F^{T}\) and \(A^{T}\), the two complementary research challenges are the autonomous synthesis of an appropriate proactive action \(R_{A}^{P}\) and the autonomous instantiation of the corresponding intermediate state of the environment, \(\mathit{En}_{\mathit{Intr}}\). In this paper, we assume that \(R_{A}^{P}\) for a task is already known, as we have hypothesized it in Sect. 3. In the next section, we will present the framework to instantiate the \(\mathit{En}_{\mathit{Intr}}\) to be communicated through the hypothesized \(R_{A}^{P}\). In our current examples, this instantiation of the unspecified part involves reasoning about the ‘where’ aspect of the task, i.e. the placement (either to hold or put) of the object in the final environment.
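For concreteness, the two task specifications above could be encoded as follows; this is only a sketch, and the predicate and action names are illustrative rather than the system’s actual symbols:

```python
# Illustrative encoding of F^T and A^T for the two tasks (names are hypothetical).
GIVE = {
    "facts":   [("object", "in_hand_of", "robot")],
    "actions": ["human_pick", "human_carry", "robot_grasp"],
}

MAKE_ACCESSIBLE = {
    "facts":   [("object", "on", "support"),
                ("object", "graspable_by", "robot"),
                ("object", "reachable_by", "robot"),
                ("object", "visible_to", "robot")],
    "actions": ["human_pick", "human_carry", "human_place_on_support"],
}
```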

5 Methodology

5.1 Multi-state Visuo-Spatial Perspective Taking

Reasoning by modeling the agent and its behavior in simulation has been shown to be effective for perspective taking, teamwork and social behavior [31]. Visuo-spatial perspective-taking of the human has already been shown to be an important component in Human-Robot Interaction [2, 17]. We have enriched the robot’s ability of perspective taking not only from the current state of the agent but also from a set of different states the agent might attain, and presented the concept of Mightability in [43]. Mightability stands for “might be able to …” and facilitates multi-state visuo-spatial perspective taking. The idea is to analyze various abilities, \(A_{b}\), of an agent by applying an ordered list of virtual actions, \(A_{v}=[a_{1},a_{2},\ldots,a_{n}]\), while respecting the environmental and postural constraints. The robot estimates the abilities at the 3D grid level and at the object level, which we term Mightability Maps (MM) and Object-Oriented Mightabilities (OOM), respectively. In the current implementation:

(1)
(2)
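A minimal sketch of how such a Mightability Map could be stored, assuming one boolean 3D grid per (agent, ability, effort level) combination; the grid resolution and the indexing scheme are our own illustrative assumptions, not the implementation of [43]:

```python
import numpy as np

class MightabilityMap:
    """Boolean 3D grid marking the cells for which a given ability (e.g. see,
    reach) holds for an agent at a given effort level (illustrative sketch)."""
    def __init__(self, shape=(100, 100, 50), cell_size=0.05):
        self.grid = np.zeros(shape, dtype=bool)
        self.cell_size = cell_size  # metres per cell

    def mark(self, ix, iy, iz):
        self.grid[ix, iy, iz] = True

    def intersect(self, other):
        out = MightabilityMap(self.grid.shape, self.cell_size)
        out.grid = self.grid & other.grid
        return out

# The set of maps could then be indexed by agent, ability and effort level, e.g.
# MM[("human", "reach", "Arm_Torso_Effort")] -> MightabilityMap instance
```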

5.2 Symbolic Categorization of Efforts

We have categorized the effort to attain a state from the current state based on the joints involved. This is motivated by studies of human movement and behavioral psychology [10, 19], where different types of human reach actions have been identified and analyzed. These include reaches involving simple arm extension (arm-only reach), shoulder extension (arm-and-shoulder reach), leaning forward (arm-and-torso reach) and standing, as shown in Fig. 3. Such a categorization could be further enhanced based on studies of musculoskeletal kinematics and dynamics models [32, 48]. The associated effort levels, based on the joints involved in applying a particular \(A_{v}\), are:

(3)
(4)

Table 1 shows this categorization with the relative levels of effort. For example, assume an agent is currently sitting. If, to see a particular object, the agent is required to apply \(A_{v}=[\mathit{Make}\_\mathit{Standing},\mathit{Lean}\_\mathit{Torso}]\), the corresponding effort level to see the object will be Whole_Body_Effort, as it involves additional joints instead of just turning the head or the torso. Displacement_Effort corresponds to situations in which the agent has to move from the current position.

Fig. 3

Taxonomy of reach actions: (a) arm-shoulder reach, (b) arm-torso reach, (c) standing reach. (Adapted from [10, 19])

Table 1 Effort classes for visuo-spatial abilities
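Since the planner iterates over these categories from lower to higher effort (Sect. 5.4), they can be treated as an ordered scale. The sketch below assumes a single ordered list for reach-type efforts; the paper keeps separate lists for visual and reach abilities (expressions (3) and (4)), and the numeric values here are illustrative:

```python
from enum import IntEnum

class ReachEffort(IntEnum):
    """Ordered reach-effort levels; a higher value involves more joints (cf. Table 1)."""
    NO_EFFORT_REQUIRED  = 0   # no motion needed, e.g. the object is already in hand
    ARM_EFFORT          = 1   # arm-only reach
    ARM_TORSO_EFFORT    = 2   # lean forward / turn the torso while seated
    WHOLE_BODY_EFFORT   = 3   # stand up
    DISPLACEMENT_EFFORT = 4   # move from the current position

# The planner can then relax the allowed effort level step by step:
assert ReachEffort.ARM_TORSO_EFFORT < ReachEffort.WHOLE_BODY_EFFORT
```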

5.3 Grasp and Placement Analysis

For behaving proactively, it is important for the robot to compute where the task could be performed. To find the feasibility of a task, the framework not only reasons about the places to perform the task, but also performs rich geometric analyses on the possible grasp and placement orientations of the object. This is important because there might exist a reachable and visible place P to perform a task, yet placing the object at P is either not feasible for the human or does not allow the robot to grasp the object. This could be due to various factors, such as a cluttered environment, the difference in shape between the anthropomorphic hand and the robot’s gripper, etc. Hence, reasoning about the existence of collision-free grasps while picking, or while finding a position to place an object, is important. This becomes more prominent with real-life objects of complex shapes. Our system is capable of analyzing the feasibility of grasps of complex objects for the robot’s gripper and for an anthropomorphic hand, given an environment [50]. This further enables the robot to extract a set of simultaneous grasp pairs for the human and the robot, to facilitate tasks requiring object hand-over. Further, an object could be placed at a position P in different orientations, and depending upon the environment some or none of these placements might be stable and/or allow a grasp. Therefore, it is also important to test whether, from the task perspective, any feasible placement orientation exists at a particular position.

Therefore, to incorporate these feasibility tests, the robot computes and stores the sets of possible grasp and placement orientations assuming a collision-free environment. Further, the robot autonomously computes the support planes based on the vertical facets, which could be part of any object such as a table, a box, etc. This enables the robot to find a feasible placement of an object on any other object. During planning for a task, the planner filters out the non-feasible placements and grasps by simulating them in the 3D environment (Move3D [56]) reflecting the current state of the world.
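A sketch of this filtering step is given below; the world-model interface and function names are hypothetical, whereas in the actual system these checks are performed by simulation in Move3D:

```python
def filter_feasible_placements(position, placement_orientations, grasps, world):
    """Keep only (placement_orientation, grasp) pairs that are stable and
    collision free at `position` in the current 3D world model (illustrative)."""
    feasible = []
    for placement in placement_orientations:    # pre-computed placement orientations
        if not world.is_stable(position, placement):
            continue                            # e.g. the object would topple off the support
        for grasp in grasps:                    # pre-computed grasp set for the object
            if world.is_collision_free(position, placement, grasp):
                feasible.append((placement, grasp))
    return feasible
```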

5.4 Framework to Instantiate Proactive Solution

Figure 4 shows the reasoning process for extracting a solution to behave proactively. As shown in block ‘a’ of Fig. 4, the input consists of the current environment state \(\mathit{En}_{I}\) and the task T with its parameters: the performing agent, the target agent (for whom the task would be performed) and the name of the object. In addition, the current maximum allowed effort levels of the robot, \(R_{\mathit{MEL}}\), and of the human, \(H_{\mathit{MEL}}\), are provided.

Fig. 4

Proactive planner: Reasoning to find the human-adapted solution for proactive behavior, ensuring the least feasible effort of the human partner. Block ‘i’ is detailed in Fig. 5

5.4.1 Extracting Human-Adapted Candidate Places

The planner reasons about ‘where’ the human can perform the task. As an attempt to ensure the minimum feasible effort by the human, the maximum allowed effort level of the human, \(H_{EL}\), is initially set to level 0, i.e. No_Effort_Required (see Table 1), to see and reach the places to perform the task (block ‘b’ of Fig. 4). In block ‘c’ of Fig. 4, the planner finds the candidate places where the human can perform the task T with his current maximum effort level. Depending upon the task T, for a particular agent Ag, the planner is already provided with a set of constraints \(C_{T}^{Ag}\) to find the set of candidate places:

$$ C_{T}^{Ag} = \bigl\{c_{i}^{Ag}: i=1, \ldots, m\bigr\} $$
(5)

where m is the total number of constraints and \(c_{i}^{Ag}\) consists of the tuple:

$$ c_i^{Ag}= \bigl \langle \mathit{effort}\_\mathit{level}(E), \mathrm{ability}(A_b)= \mathit{true}|\mathit{false} \bigr\rangle $$
(6)

where effort_level is an element of expression (3) or (4), depending upon the ability, which is an element of expression (1). The desired values (true or false) are known for the task a priori. For example, if the task is to give an object, the planner knows that the abilities to see and reach the candidate places by the performing agent should be true for the desired effort level. If the human, Ag=H, has to give or make something accessible to the robot, then for the human’s current effort level, which is set as \(H_{EL}=0\), Eq. (5) results in the set of constraints \(C_{T}^{H} = \{c_{1}^{H},c_{2}^{H}\}\), where \(c_{1}^{H}= \langle0, \mathit{see}=\mathit{true} \rangle\) and \(c_{2}^{H}= \langle0, \mathit{reach}=\mathit{true} \rangle\).

Hence, the planner extracts the set of candidate places, \(P_{H_{EL}}^{T}\), which satisfy, for the human agent, Ag=H, the set of constraints \(C_{T}^{H}\):

(7)

For extracting \(P_{H_{EL}}^{T}\), based on the constraints, the planner performs the relevant set operations on the corresponding Mightability Maps (see Sect. 5.1) of the agent. However, there might be situations where there exists no common reachable cell in the Mightability Maps of the two agents, but where, because of the object’s size, there still exist some places to perform the task, such as a handing-over task. As the object is known to the planner, to ensure that such places are not lost, the Mightability Maps are expanded based on the dimensions of the object’s bounding box before \(P_{H_{EL}}^{T}\) is computed.
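A sketch of this constraint-based extraction as set operations on boolean grids, reusing the illustrative Mightability Map indexing from Sect. 5.1 and the constraint encoding of Eqs. (5)-(6):

```python
def candidate_places(constraints, mightability_maps, agent):
    """Intersect the Mightability Maps selected by each constraint
    <effort_level, ability=desired> for `agent` (illustrative sketch)."""
    result = None
    for effort_level, ability, desired in constraints:
        grid = mightability_maps[(agent, ability, effort_level)].grid
        mask = grid if desired else ~grid
        result = mask if result is None else (result & mask)
    return result  # boolean 3D grid of candidate cells, i.e. P^T_{H_EL}

# Example for the give task with H_EL = 0 (cf. C^H_T above):
# C_T_H = [(0, "see", True), (0, "reach", True)]
# P = candidate_places(C_T_H, MM, agent="human")
```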

Hence, block ‘d’ of Fig. 4 shows \(P_{H_{EL}}^{T}\), the set of places where the human might be able to perform the task T for the given effort level \(H_{EL}\). For effort level 0, in which the human is not expected to even move the arm, \(P_{H_{EL}}^{T}\) will be NOT NULL only if the object is already in the human’s hand. The human then has to just maintain his/her posture and the robot will be expected to take the object from his/her hand. In this case, \(P_{H_{EL}}^{T}\) will be a set of points corresponding to the object’s bounding box.

In the case of NOT NULL candidate positions (as tested in block ‘e’ of Fig. 4), the planner extracts, in block ‘f’, the set of candidate points where the robot can support the task. For this, from the robot’s perspective, Ag=R, another set of similar constraints, \(C_{T}^{R}\), is applied to the candidate places obtained for the human, \(P_{H_{EL}}^{T}\). This results in a refined set of candidate places, \(P_{H_{EL}}^{T,R_{EL}}\) (block ‘g’ of Fig. 4):

(8)

where \(n_{f}\) is the total number of final candidate places. If \(P_{H_{EL}}^{T,R_{EL}}\) is NOT NULL, as tested in block ‘h’ of Fig. 4, the planner calls a subroutine (block ‘i’ of Fig. 4) to perform various tests to find a feasible solution in the candidate space. Figure 5 gives an overview of the processing of block ‘i’, as will be explained in the next sub-section. If there exists a feasible solution, the test in block ‘j’ takes the control to block ‘m’, which returns the feasible solution.

Fig. 5

Extracting a feasible solution (block ‘i’ of Fig. 4): Iterating on 3 search spaces: candidate grasps, candidate placement orientations and candidate placement positions to satisfy various constraints of task, manipulation planning and human’s perspective to find a feasible solution

Note that, at any stage of planning, if the planner fails to find a candidate place or a feasible solution, and if there is still a possibility of increasing the effort level (block ‘k’), it increases the acceptable effort level of the human in block ‘l’ and begins the next iteration.
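Putting the blocks of Fig. 4 together, the outer loop of the proactive planner can be sketched as follows; the planner interface and method names are hypothetical, and the block labels in the comments refer to Fig. 4:

```python
def plan_proactive_solution(task, env, H_MEL, R_MEL, planner):
    """Sketch of the outer loop of Fig. 4: start from the least human effort
    level and relax it until a feasible solution is found or H_MEL is exceeded."""
    h_el = 0                                                          # block b: No_Effort_Required
    while h_el <= H_MEL:
        human_places = planner.places_for_human(task, env, h_el)      # blocks c, d
        if human_places:                                              # block e
            robot_places = planner.places_robot_can_support(          # blocks f, g
                task, env, human_places, R_MEL)
            if robot_places:                                          # block h
                solution = planner.extract_feasible_solution(         # block i (Fig. 5)
                    task, env, robot_places)
                if solution is not None:                              # block j
                    return solution                                   # block m
        h_el += 1                                                     # blocks k, l
    return None   # no feasible proactive solution within the allowed effort levels
```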

5.4.2 Extracting a Feasible Solution for a Particular Effort Level

Figure 5 shows an overview of the iterative process for extracting a feasible solution (block ‘i’ of Fig. 4) by incorporating the constraints from the perspectives of grasp, placement, visibility of the object and planning. The planner takes as input the candidate set of places obtained in block ‘g’ of Fig. 4 and ranks them. Currently we assign weights based on the closeness to the target-object position. This is based on the assumption that the human needs to put less effort into placing or holding the object if he/she has to carry the object over a shorter distance. Another motivation behind such a weight assignment is to exhibit goal-directed behavior. In [22], it has been suggested that to appear intentional the robot should exhibit goal-directed actions. Hence, this weight assignment also drives the solution to be directed towards the object, which might also convey the notion of pointing to the desired object.

The assignment of such weights to candidate places for proactive behavior and the studies and frameworks on preferable hand-over positions, such as [6, 15, 26, 57], could mutually benefit each other. In the latter case, the possibility of proactive behavior by the human, who would be the receiver, could be incorporated; whereas the former case, in which the robot is the receiver, could take into account further aspects related to preferable hand-over positions.

The planner further takes the initial sets of grasp configurations and placement orientations of the object, and filters them based on the task and the environment to extract reduced candidate sets. The candidate grasps are ranked based on their stability [49], and the placement orientations are ranked, for a particular position, based on the visibility percentage of the object from the target agent’s perspective [41]. Then the planner iterates over the three candidate lists to perform a series of feasibility tests, as shown in Fig. 5. The planner selects a tuple consisting of a placement position, a grasp and a placement orientation (based on their ranking, from highest to lowest) and uses dedicated modules for the different feasibility analyses: [41] for testing the percentage visibility of an object from an agent’s perspective, and [20] for planning a collision-free path. If any of the feasibility tests fails, another candidate tuple is selected. If all the feasibility tests for a particular task pass, then that particular placement position is retained for showing the proactive behavior. If the task demands it, a smooth trajectory is also generated for the execution, using [4].
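This inner search can be sketched as a ranked nested iteration; this is a simplified, self-contained illustration of Fig. 5, where the distance-based weighting stands in for the ranking schemes of [49] and [41] and the feasibility checks are abstracted as callables:

```python
from itertools import product

def extract_feasible_solution(places, grasps, placements, checks, obj_position):
    """Block 'i' of Fig. 4, sketched: rank candidate places by closeness to the
    target object, then iterate over (position, grasp, placement orientation)
    tuples until every feasibility test passes (names illustrative)."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, obj_position))
    ranked_places = sorted(places, key=sq_dist)   # weight: nearness to the object

    for position, grasp, placement in product(ranked_places, grasps, placements):
        if all(check(position, grasp, placement) for check in checks):
            return position, grasp, placement     # first fully feasible tuple
    return None                                   # backtrack (block 'k' of Fig. 4)
```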

5.5 Illustration of the Framework for Different Tasks

In this section, we will illustrate the presented proactive planner finding a human-adapted feasible solution to behave proactively. We will consider two different tasks performed by the human partner: to give an object to the robot and to make an object accessible to the robot.

5.5.1 For Proactively Reaching out

Figure 6 shows the initial scenario in which the human has to give the toy dog, placed on his right, to the robot. The robot will plan to proactively reach out to take it. For the current example, as it is a table-top cooperative manipulation scenario, to avoid more expensive motions of the robot, its maximum allowed effort is set as Arm_Effort. This restricts the robot from planning to turn or move its base; it can only move its arm to achieve the current sub-task of getting the object from the human. The human’s maximum effort level is provided as Whole_Body_Effort. However, these could be modified online by higher-level decision making or supervisor systems, such as ours [1, 12]. As already explained in Sect. 5.4.1, for the human, initially the least effort level, No_Effort_Required, will be tested for extracting the candidate places to give the object. The planner obtains the candidate set of places in block ‘d’ of Fig. 4 as NULL, as the object is not already in the human’s hand. Hence, the control reaches block ‘k’ and the human’s reach effort level is increased to Arm_Effort in block ‘l’ (we chose to maintain the corresponding effort level to see as No_Effort). This means that the planner estimates the places that are in the current field of view of the human and where the human could give the object to someone by only stretching out his arm. Figure 7(a) shows the candidate places for giving the object by the human at this level of effort, obtained in block ‘d’ of Fig. 4 in the next iteration. Green, red and yellow points show giving possibilities by the right hand, the left hand and both hands, respectively. In block ‘g’ of Fig. 4, with the maximum allowed effort level of the robot, the planner extracts the subset of the places where it can support the human. For our current example this turned out to be NULL, as there was no point reachable by the robot with its current effort level among the points of Fig. 7(a). Hence, the planner again reaches block ‘l’ of Fig. 4 to test the next effort level of the human, by setting Arm_Torso_Effort to reach and Head_Torso_Effort to see. Figure 7(b) shows the candidate points for giving the object by the human, who is now expected to lean forward and/or turn around while being seated. In this iteration, in block ‘g’ of Fig. 4, the planner finds a set of candidate places from where the robot could take an object from the human. Figure 7(c) shows these candidate places as a green point cloud. The resultant candidate places after the weight assignment explained in Sect. 5.4.2 are shown in Fig. 7(d). Blue points have the highest weight, in the sense that they will be preferred over the red points, which have the lowest weights. The feasible solution corresponds to the first highest-weight candidate point that passes the rest of the grasp, placement, object visibility and trajectory oriented feasibility tests. This feasible solution, obtained in block ‘m’ of Fig. 4, is indicated in Fig. 7(d). At the end, depending upon the task, the planner returns the appropriate data for exhibiting the proactive behavior. For the current task, it returns the winning feasible place obtained in Fig. 7(d), the corresponding levels of effort for the human and for the robot, the trajectory to reach the place, and the estimated end configuration of the robot, as shown in Fig. 8.

Fig. 6

Initial scenario in which the human has to give the toy dog to the robot. The robot has to plan to proactively reach out to take, instead of standing still and waiting for the human to act

Fig. 7

Candidate places for giving an object by the human (green: by right hand, red: by left hand, yellow: by both hands) (a) from his current position with Arm_Effort and (b) if the human will put effort to move his torso (lean forward or turn) while being seated, Arm_Torso_Effort. (c) Candidate points from where the robot can take the object for the effort (b) of the human. (d) Weighted candidate points based on the nearness to the target-object: the toy dog. The pointer indicates a highest-weighted feasible place obtained by the feasibility analyses of Fig. 5 (Color figure online)

Fig. 8

Computed configuration for the proactive reach out to take, in the case the human has to give the toy to the robot

5.5.2 For Proactively Suggesting ‘Where’ to Place

The robot finds ‘where’ the human can put the object for the robot to take, and proactively suggests that place to the human. As mentioned in Sect. 5.3, the robot is able to find candidate points on horizontal support surfaces for placing something on those supports. Hence, in block ‘d’ of Fig. 4, the planner also finds places on top of the box where the human can put the object, as shown in Fig. 9(a). Figure 9(b) shows the weighted candidate points to perform the task. Figure 9(c) shows the feasible estimated placement of the object, obtained in block ‘m’, from where the robot could take it. Apart from information similar to that of the proactive reach-out task, the proactive planner also provides the symbolic information that the placement is ‘on the box’, based on reasoning on the inter-object spatial relations. Incorporating other predicates such as left, right, next to, etc. could further enrich the location description while suggesting the place to put the object.

Fig. 9

Task of making an object accessible by the human to the robot. (a) Places on the support planes where the human can put something with least effort. (b) Weighted points where the robot can support the human by taking the object. (c) The planner found a possible placement of the object on the box from where it is feasible for the robot to take. Note that, because of the object-closeness based weight assignment, this placement also reduces the human’s effort to carry the object

5.6 Remark on Convergence Time

As a first step, the main focus of this paper is to incorporate the key elements of grasp, visibility, placement, feasibility of trajectory, etc. from the perspectives of both agents: the robot and the human. One part of our future work is to further optimize the iterative approach presented in Fig. 5. Therefore, we only provide an approximate idea of the convergence time.

The candidate search space based on Mightability Maps can be updated online [43], and the initial lists of grasps and placements are calculated only once for each new object and stored. Hence, the convergence time of the algorithm mainly depends upon the number of times it has to backtrack due to the failure of any of the tests in Fig. 5, and on the time taken by the path planner, which is presently an RRT-based planner [20]. The algorithm finds a feasible solution for the typical scenario shown in Fig. 6 in 1.6 seconds. The convergence times for the other scenarios and tasks presented throughout the paper vary between 1 and 4 seconds.

6 Experimental Results and Analysis

We have tested our system on two different robots: JIDO, a home-built mobile manipulator equipped with an LWR Kuka arm, and PR2 from Willow Garage. The robots use Move3D [56], an integrated planning and visualization platform. The robots, through various sensors, maintain and update the 3D world state in real time. For object identification and localization, a tag-based stereovision system is used. For localizing the human and tracking the whole body, data from a Kinect (Microsoft) sensor are used. The human’s gaze is simplified to his/her head orientation, estimated through markers tracked by a motion capture system in real time. We use the Kinect to track the whole body of the human, but this does not provide a precise head orientation. Therefore, we use a motion capture system to track the human head. We avoid tracking the whole body through the motion capture system, because it is a marker-based system and needs a precise model and frequent calibration for each individual. We have created a fixed model of goggles for the motion capture system, which can be worn by any human, to obtain his/her head orientation.

In all the experiments, the speech of the robot was scripted and only some of the parameters were synthesized, such as the name of the object and of the support (an object or a piece of furniture), provided by the proactive planner. The entire experiment runs on the real robots. However, throughout the paper, at appropriate places, we have included pictures of the corresponding 3D environment to show some intermediate steps of the planning.

In [24], it has been shown that simple changes in the robot’s gaze can convey the robot’s attention and intentions to the human partner. Further, in [28], it has been found that the robot’s eye contact and hand movements, together with situated dialog, help in achieving joint attention with the human partner. Therefore, we have also incorporated such attentional behaviors in a scripted manner while exposing the robot to interact with the users.

The experiments are controlled in the sense that the remote operator starts the script only when the user is comfortably seated. The experiment begins with the robot saying, “I need your help …”. If the human looks at the robot (detected by the robot through visual perspective taking of the human), it assumes that joint attention has been established; otherwise it continues to repeat the sentence. Once joint attention has been established, the robot looks at the desired object and then at the human, while exhibiting or not exhibiting the proactive behavior. The task and its parameters are already provided to the script.
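The scripted trial flow can be summarized by the following sketch; the robot API names and the timing are assumptions made for illustration:

```python
import time

def run_trial(robot, task, proactive):
    """Sketch of one scripted trial, as described above (names are illustrative)."""
    # Repeat the opening sentence until the user looks at the robot (joint attention).
    robot.say("I need your help ...")
    while not robot.human_is_looking_at_me():
        time.sleep(2.0)
        robot.say("I need your help ...")

    robot.look_at(task.object_name)      # attentional behavior: gaze at the object
    robot.look_at("human_head")          # then back at the human

    if proactive:
        solution = robot.proactive_planner.solve(task)
        robot.execute(solution)          # e.g. reach out, or gaze at the suggested support
    robot.say(task.request_sentence)     # scripted request with synthesized parameters
```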

6.1 Demonstration of the Proactive Planner

This sub-section is confined to demonstrating two aspects:

(i) The planner is generic and independent of the scenario and the robot.

(ii) The resultant solution visibly reduces the human effort in different situations, when analyzed through the perspective of the effort levels presented in Table 1.

In the next sub-section, Sect. 6.2, we will then show and analyze the results of the preliminary user studies, which provide supportive and encouraging evidence for the proactive behaviors hypothesized in this paper.

6.1.1 Proactively Reaching out

Figure 10(a) shows an initial scenario in which the robot requests the human to give the object indicated by the red arrow. Figure 10(b) shows the final scenario, where the human is giving an object to the robot in the case when the robot did not move its hand proactively. The human is standing and trying to give the object, hence putting in Whole_Body_Effort (see Table 1). In contrast, when the robot was allowed to behave proactively, the proactive planner successfully found a feasible place to take the object from the human, while ensuring the minimum feasible effort by the human. Figure 10(c) shows the case in which the robot is proactively reaching to the feasible place to take the object. This proactive behavior has reduced the human’s effort for the task, as the human is just leaning forward from the seated position to give the object. Hence, the effort is Arm_Torso_Effort instead of the Whole_Body_Effort of Fig. 10(b). Figure 11 shows another scenario, where the human and the robot are sitting in a different spatial arrangement. Figure 11(a) shows the initial scenario and the position of the object to be given by the human. Figure 11(b) shows the non-proactive case: the human is standing and giving the object to the robot. In contrast, as shown in Fig. 11(c), the proactive planner finds a suitable human-adapted reach-out place, which reduces the human effort to Arm_Effort.

Fig. 10

(a) Initial scenario for giving the object grey tape marked by red arrow. (b) The user is trying to give by standing up, Whole_Body_Effort, in the absence of proactive reach behavior by the robot. (c) The user is giving just by leaning forward, Arm_Torso_Effort in the case of proactive reach by the robot (Color figure online)

Fig. 11

(a) Another scenario for the task of giving an object to the robot. (b) In the absence of any proactive behavior, the user is standing up and reaching to the robot (Whole_Body_Effort) to give the object. (c) With proactive reach behavior of the robot, the user is giving the object by only Arm_Effort

We have further tested our system on another robot, PR2, to illustrate the portability of the system and the ability of the proactive planner to take into account robots with different kinematic structures.

Figure 12(a) shows the user giving the object without the robot’s proactive reach behavior, whereas in Fig. 12(b) the user is giving the object with less effort, as the robot has proactively moved its arm. Hence, the human effort has been reduced from Arm_Torso_Effort to Arm_Effort. Figures 12(c) and (d) show two different scenarios in which the planner is able to find a feasible reach-out solution for the PR2 robot. Both users are giving the object with Arm_Effort in the case of a proactive reach-out by the robot.

Fig. 12

Experiments with the PR2 robot for the give task by the users. (a) The user is putting more effort (Arm_Torso_Effort) in the absence of any proactive reach behavior by the robot. (b) The user is giving with less effort (Arm_Effort) when the robot is reaching out proactively. (c) and (d) The planner is successfully able to find a solution for proactive reach out in different scenarios and the user is putting only the Arm_Effort to give the object

6.1.2 Proactively Suggesting the Place to Put

In this section, we will show the results for the make-accessible task by the human.

Figure 13 shows the initial scenario and its real-time 3D representation. In the absence of any proactive suggestion, the human is putting the object close to the robot on the table to make it accessible, Fig. 14(a). This required Arm_Torso_Effort. In contrast, when the robot proactively suggested the human-adapted feasible placement, the human put the object on the box, which reduced the human effort to Arm_Effort. In Fig. 15, the human is sitting relatively far from the table compared to the scenario of Fig. 14. In this scenario, in the absence of a proactive suggestion from the robot, the human is standing up and leaning forward to make the object accessible to the robot, i.e. with Whole_Body_Effort, Fig. 15(a). However, as shown in Fig. 15(b), with the proactive suggestion of where to place, the human has to just lean forward, which is Arm_Torso_Effort. Note that in this case the robot, with its current allowed maximum effort level set as Arm_Effort, was not able to support the human at his Arm_Effort level, as was the case in the scenario of Fig. 14.

Fig. 13

(a) Initial scenario for the task of making an object accessible by the human to the robot. (b) Its real time 3D representation. The object, to be requested by the robot, is encircled in red in (a) and (b) (Color figure online)

Fig. 14

The human is making an object accessible to the robot for the initial scenario of Fig. 13. (a) Without proactive suggestion about where to place, the human is putting it close to the robot with Arm_Torso_Effort. (b) With the human adapted proactive suggestion by the robot, the human is now putting it on the white box as suggested by the robot. This has reduced the human’s effort to Arm_Effort

Fig. 15

Make accessible task: (a) Without proactive suggestion about where to place, the human is putting it close to the robot on table by standing and leaning forward with Whole_Body_Effort. (b) With the human adapted proactive suggestion by the robot to put it on the white box, the human is now required to put Arm_Torso_Effort only. Note that the planner could not find a feasible solution for Arm_Effort of the human, as was the case for Fig. 14. This is because the human was sitting relatively away from the table and the robot was not able to support the task for Arm_Effort of the human with its maximum allowed effort level, which was also set as Arm_Effort

Hence, the presented planner is not only able to find a feasible proactive solution for different scenarios for both tasks, but is also able to qualitatively reduce the effort of the human partner.

6.2 Validation of Hypotheses Through User Studies

We have performed a series of preliminary user studies to validate the hypotheses and to discover the effects of the proactive behaviors of the robot on the users, compared to the non-proactive behaviors. In fact, the figures shown in the previous sections are from these user studies. The two main aspects we want to validate are:

(i) Whether the users experience a reduction in confusion about the task because of the hypothesized expressive proactive behaviors.

(ii) The presented framework takes into account the human partner’s visuo-spatial perspective and effort, to find a solution that not only supports behaving proactively but also reduces the human effort. Therefore, we further want to validate whether the users experience a reduction in effort. We also want to know whether, in the case of such human-adapted proactive behaviors, the users find the robot to be ‘aware’ of and ‘supportive’ to their capabilities.

There was a total of 30 users divided into three groups of 10 users: two groups for the give task and one group for the make accessible task. Each user group was a mix of users with different exposure to real robots: no exposure, little exposure, and rich exposure. This was to compensate for any bias from experienced and non-experienced users of robots in general. At the beginning of the experiment, each user was informed that the robot would interact with them, but not about the behaviors or the task. Further, no particular instruction was given to the users about ‘how’ they should behave.

6.2.1 Users’ Responses for the “Give” Task

We set up different scenarios having different relative positions of the robot, the human and the objects. Broadly, the scenarios can be divided into two categories:

(i) The human is sitting away from the robot and there is some furniture between them, similar to Fig. 10(a).

(ii) The human is sitting relatively closer to the robot, with a different relative position, and there is no furniture between them, similar to Fig. 11(a).

The users were randomly assigned to sit in one or the other scenario.

There were two user groups for the give task: group I and group II, consisting of 10 users each. The main difference between the two groups was that they were exposed to robots of different appearances: JIDO and PR2. This was to compensate for any bias due to the robot’s appearance or kinematic structure while validating our hypotheses.

Each user was exposed to two different behaviors of the robot: NPB and PB. NPB (Non-Proactive Behavior): The robot just asks the user “Please give me the ≪object_name≫” and waits in its current state. PB (Proactive Behavior): The robot asks the same but also starts moving its arm along the trajectory obtained through the presented proactive planner. In the PB case, it also starts turning its head to look at the object, as an attempt to incorporate goal-object-directed gaze movement (head movement in our case), as discussed in Sect. 4.

During the entire experiment, the decision whether PB or NPB should be exhibited first to a particular user was random. After being shown both behaviors, each user was requested to fill in a questionnaire, with the first behavior referred to as B1 and the second behavior as B2. Note that for some of the users B1 was NPB and for some it was PB.

Below we will first analyze the part of the questionnaire common to group I and group II, to show that, independent of the appearance of the robot, the proactive reach behavior is preferred over the non-proactive behavior. Then we will present the analyses of the part of the questionnaire that is exclusive to group I and explore the nature of the confusion and the effect on the effort. (We excluded these questions for group II users for compactness, as they were required to answer about two additional behaviors. Those behaviors are related to the ‘when’ aspect of proactivity and are beyond the scope of this paper.)

Table 2 shows that in the case of the proactive reach-out behavior of the robot, the total number of users having at least one type of confusion is significantly reduced. This supports the hypothesis that proactively reaching out to take something reduces the confusion of the user.

Table 2 Type of users’ confusions for the give task

Note that the percentages in this and the following tables may not sum to 100, as the users were allowed to mark multiple options or none.

Table 3 shows the users’ confusions about how to perform the task, as reported by group I users. It shows the data for two different cases: (i) NPB-PB: when the non-proactive behavior (NPB) was shown first, followed by the proactive behavior (PB); (ii) PB-NPB: when PB was exhibited first, followed by NPB. The percentage (%) is calculated based on the total number of users belonging to a particular case, (i) or (ii). Note that in case (ii), in which PB was demonstrated first, users were found to be biased towards expecting similar behavior in the next demonstration, which was going to be NPB. The last column of Table 3 reflects this: more users expected the robot to show some activity when PB had been exhibited first. In such cases, user responses included: “I thought that the experiment has failed, since the robot didn’t move”, “I was waiting for the robot to take it from me.”

Table 3 Users’ responses about the confusion on ‘how’ to perform the give task in the NPB of the robot

Table 4 shows group I users’ responses about the change in their effort. It shows that 71 % of the users in the NPB-PB case explicitly mentioned that the second behavior, i.e. the PB, reduced their effort to give the object compared to the first behavior, i.e. the NPB. Further, 66 % of the users in the PB-NPB case explicitly mentioned that the second behavior, i.e. the NPB, demanded more effort to give the object compared to the first behavior, i.e. the PB. Combining both, a majority of the users, 70 % of the total users of group I, reported that the proactive reach-out behavior of the robot reduces their effort compared to the non-proactive behavior. Hence, it supports our hypothesis that the human-adapted reach-out also makes the users feel a reduction in their effort in the joint task. It also validates that the presented framework is indeed able to find a solution while maintaining the least feasible effort of the human partner.

Table 4 Users’ experience on change in effort for the give task

Table 5 (combining group I and group II responses) shows that a majority of the users reported the robots to be more ‘aware’ and ‘supportive’ of them and of the task in the cases where the robot behaved proactively. Table 5 also shows that 80 % of the users of group I explicitly mentioned that the proactive reach behavior guides them about where to perform the task, hence validating the perspective-taking capability of the robot.

Table 5 Users’ experience about awareness, supportiveness and the guiding nature of PB for the give task

6.2.2 Discussion on a Few Observations for the “Give” Task

Apart from the direct responses from the users, we observed the following situations, which point towards the need for further exploration:

(i) Without any proactive reaching behavior, the user in Fig. 16(a) is holding the object and waiting for the robot to take it, whereas, as shown in Fig. 16(b), in the presence of the proactive reaching behavior of the robot, the human is also putting in some effort to lean and give the object to the robot. This seems to corroborate the studies of human behavioral psychology showing that goal anticipation during action observation is influenced by synonymous action capabilities [21].

    Fig. 16

    Task of giving an object to the robot. (a) In the absence of any proactive behavior the user is holding the object and waiting for the robot to take. (b) With proactive reach behavior from the robot, the user is also putting some effort to give the object to the robot

  2. (ii)

    For the cases where the non-proactive behavior was shown first, a few users were found to spend some time ‘searching’ for the object to give when the table-top environment was somewhat cluttered. This suggests incorporating a pointing component, adopting a goal-directed approach to draw the user’s attention to the object of interest. In our experiments, this has been partially achieved by assigning higher weights to the places closer to the object (see the sketch at the end of this subsection). This seems to support the findings in [40] and [11] that the directing-to gesture helps in drawing the user’s focus of attention towards the object.

Further user studies are required to properly validate these observations and establish them as facts.
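
As noted in observation (ii), places closer to the object of interest were given higher weights. For illustration, a minimal sketch of one possible form of such proximity weighting is given below; the exponential fall-off and the length scale sigma are assumptions and do not correspond to the exact weighting used in the experiments.

```python
# A minimal sketch of distance-based weighting of candidate places:
# places closer to the object of interest receive higher weights.
# The exponential fall-off and the length scale sigma are illustrative
# assumptions, not the exact weighting used in the experiments.
import math
from typing import Tuple

Point = Tuple[float, float, float]  # (x, y, z) in meters


def proximity_weight(place: Point, object_position: Point, sigma: float = 0.3) -> float:
    """Weight in (0, 1] that decays with the Euclidean distance between
    a candidate place and the object of interest."""
    d = math.dist(place, object_position)
    return math.exp(-d / sigma)


# Example: a place 10 cm from the object is favoured over one 50 cm away.
print(proximity_weight((0.1, 0.0, 0.0), (0.0, 0.0, 0.0)))  # ~0.72
print(proximity_weight((0.5, 0.0, 0.0), (0.0, 0.0, 0.0)))  # ~0.19
```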

6.2.3 Users’ Responses for the “Make Accessible” Task

The robot requests the human partner to make an object accessible. We have deliberately built the scenario so that the least feasible effort for making an object accessible to the robot is to put it on top of a white box.

There were 10 users forming group III. For this task, instead of exposing the two behaviors to a user in random order, we decided to first show the non-proactive behavior (NPB) followed by the proactive behavior (PB). This is because, if the user were first exposed to the PB, he/she might be biased towards putting the object at the same place in the NPB case as well, as the scenario would be the same.

For the non-proactive behavior (NPB), the robot looks at the human and utters the scripted sentence:

“Hey, I need your help. Can you please make the ≪object_name≫ accessible to me.”

For the proactive behavior (PB), the robot says:

“Hey, can you make the ≪object_name≫ accessible to me, you can put it on the ≪support_name≫.”

In an attempt to incorporate goal-directed gaze movement (head movement in this case), the robot looks at the object while uttering the first part of the request and then starts turning its head towards the place where it suggests the human put the object.
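
For illustration, the sketch below shows one way such a speech/gaze sequence could be scripted; the RobotInterface stub and its say/look_at methods are hypothetical stand-ins and do not correspond to the actual robot API.

```python
# A minimal sketch of the speech/gaze sequencing for the proactive
# "make accessible" request. The RobotInterface stub and its methods
# (say, look_at) are hypothetical stand-ins for the real robot interface.

class RobotInterface:
    def say(self, text: str) -> None:
        print(f"[speech] {text}")

    def look_at(self, target: str, blocking: bool = True) -> None:
        print(f"[gaze]   turning head towards '{target}' (blocking={blocking})")


def proactive_make_accessible_request(robot: RobotInterface,
                                      object_name: str,
                                      support_name: str) -> None:
    # Gaze at the requested object while uttering the first part of the request.
    robot.look_at(object_name)
    robot.say(f"Hey, can you make the {object_name} accessible to me,")
    # Start turning the head towards the suggested support (non-blocking),
    # so the placement suggestion is accompanied by a congruent gaze cue.
    robot.look_at(support_name, blocking=False)
    robot.say(f"you can put it on the {support_name}.")


if __name__ == "__main__":
    proactive_make_accessible_request(RobotInterface(), "toy dog", "white box")
```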

As shown in Table 6, about 80 % of the users reported confusion about how and where to make the object accessible in the NPB case. This was reduced significantly, to 30 %, in the PB case.

Table 6 Nature of the users’ confusions for the make-accessible task

Table 7 shows the percentage of users who were doubtful about from ‘where’ the robot could take or see the object. Note that in the case of the proactive behavior, as the robot explicitly suggested, “… you could put it on the white box”, thereby restricting the search space for the user to perform the task, such suspicions were reduced significantly.

Table 7 Users’ suspicions about the robot’s capabilities for the make accessible task

These findings also seem to support the result of [40], which shows that the use of a location description increases accuracy in finding the target. In the current experiment, the location description was not for localizing the object but for the place to put it, thereby guiding the user towards efficient task realization.

As shown in Table 8, a majority of the users found the proactive suggestion by the robot more compelling. Table 9 shows that 60 % of the users found that the human-adapted proactive behavior reduced their effort.

Table 8 Users’ responses about the robot’s awareness through the PB for the make accessible task
Table 9 Users’ responses about their relative efforts in the make accessible task

6.2.4 Discussion on a Few Observations for the “Make Accessible” Task

  1. (i)

    One interesting observation was related to the human’s interpretation of how to perform the task of making an object accessible. As shown in Fig. 17(a), in the case of the non-proactive behavior, the user took the white box away to make the object (pointed to by the red arrow) accessible to the robot. Although he overestimated the robot’s reach, his justification was, “… I thought if I would move the obstructing box away, the robot would be able to take the object …” Fig. 18 shows another scenario, in which the user holds the object close to the robot and waits for it to take it. Such observations indicate the need for situation-based proactive suggestions also on the ‘how’ aspect of the task.

    Fig. 17
    figure 17

    Task of making an object (pointed to by the red arrow) accessible to the robot. In the absence of proactive behavior, the user took away the white box in an attempt to clear the obstruction for the robot to take the object (Color figure online)

    Fig. 18
    figure 18

    Task of making an object accessible to the robot. In the absence of proactive behavior, the user is holding the object and waiting for the robot to take it

  2. (ii)

    Figure 19 shows a user confused about ‘which’ object the robot has requested. Such confusion was reported by at least 3 users because of various factors, such as background noise, difficulty in grounding the object by its name, and unfamiliarity with the computer-synthesized voice. Moreover, such confusion was reported in both cases: non-proactive and proactive. In this particular case, the user tries to reach towards the objects on his left side based on his prediction of the robot’s attention, Fig. 19(b), while looking at the robot to get some additional information, Fig. 19(c).

    Fig. 19
    figure 19

    Task of making an object (marked by the red arrow) accessible to the robot. In the absence of further feedback from the robot, the human is confused about which object to make accessible, as he failed to ground the object referred to by the robot (Color figure online)

This suggests that, whenever required, an element of pointing should also be included in the robot’s behaviors. Another component suggested by Fig. 19(c) is a feedback mechanism from the robot: in a natural human-robot interaction scenario, not only does the robot require feedback from the human, but it should also provide feedback to the human. Work on such complementary issues of grounding references through interaction, such as ours [37, 46], could be adapted for this purpose of proactive behavior with feedback.

6.2.5 Overall Inter-task Observations

In this section, we combine the results of both tasks to draw some global conclusions. Table 10 (combining Table 2 and Table 6) shows an overall 66 % reduction in confusion in the case of proactive behavior. Table 11 shows that a majority of the users, 65 %, experienced that the human-adapted proactive behavior reduced their effort. Table 12 shows that a majority of the users, 85 %, reported that the proactive behavior better communicated the robot’s capabilities and was more supportive to the task and to them.

Table 10 Overall reduction in the users’ confusion because of the robot’s proactive behavior
Table 11 Overall reduction in users’ effort because of the robot’s proactive behavior
Table 12 Overall responses about supportiveness and communicativeness of the proactive behavior

7 Discussion and Potential Applications

The main reason the presented system has been preferred is that the robot not only requests the human to give or make an object accessible but also proactively provides a human-adapted solution to the user through different means. The same framework could be used in the case where, instead of the robot, the human announces the task, i.e. that he/she is about to give or make an object accessible to the robot. In that case, the robot could proactively find a solution and communicate it with similar behaviors, such as saying “OK, I will take …” while reaching out to take the object, or “OK, put it on the box, I can take it from there …”.

There could be a range of real-world, day-to-day interaction situations in which the robot cannot reach an object at all, or would need to put in significant effort and/or time to get the object by itself. In such situations, the robot could request the help of the human partner. In that case, the presented system could be used to improve the sociability and acceptance of the robot, allowing the robot to do more than just ask for help and to cooperate proactively. One such example scenario could be a domestic robot cleaning the dinner table. To perform the task efficiently (e.g. to minimize the time), a high-level planner such as [1] could plan some cooperative actions. For example, the robot could ask the human to give it, or make accessible, an object that is far from its reach, so that it is not required to take a tour around the table. However, to be polite and supportive, the robot could proactively stretch out its hand while asking for the human’s help. Another example could be to support an elderly person or someone who has a back or neck problem or reduced mobility. The presented framework could take into account such constraints by appropriately restricting the maximum allowed effort level of the human partner, e.g. restricting it to Arm_Torso_Effort, and then find a solution to behave in a proactive manner. Suppose the robot has served a glass of water for taking medicines and the person is about to finish taking them. Whether it is to take the glass back from the human or to suggest a place to put it so that the robot could take it later, the presented system could be used to reduce the person’s effort and confusion. Similarly, if a robot teammate is fixing a frame in an assembly unit and therefore cannot move away, it can ask the human partner for a tool while showing proactive reach-out behavior with its maximum feasible effort level in that situation.

Moreover, the robot will not always be expected to reduce the effort of the human partner. The presented framework can be adapted to find solutions that balance mutual effort or even reduce the robot’s effort. This could be achieved by regulating the effort levels of both agents in the iterations of the presented framework, as sketched below.
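
For concreteness, the sketch below illustrates one way such regulation could be realized, namely by weighting the effort levels of the two agents when ordering the candidate effort-level pairs to be tested. The enumeration of effort levels (other than Arm_Torso_Effort, which is used in the paper), the toy feasibility test and the weights are hypothetical placeholders for the framework’s actual effort taxonomy and geometric feasibility tests.

```python
# A minimal sketch of regulating both agents' effort levels in the iterations
# of the planner. The effort-level names (other than Arm_Torso_Effort) and the
# feasibility test are illustrative assumptions.
from itertools import product

# Ordered, increasing effort levels (hypothetical enumeration).
EFFORT_LEVELS = ["No_Effort", "Arm_Effort", "Arm_Torso_Effort", "Whole_Body_Effort"]


def find_placement(human_weight, robot_weight, feasible):
    """Return the first feasible (human_level, robot_level) pair when candidate
    pairs are visited in order of weighted combined effort."""
    def cost(pair):
        h, r = pair
        return human_weight * EFFORT_LEVELS.index(h) + robot_weight * EFFORT_LEVELS.index(r)

    for human_level, robot_level in sorted(product(EFFORT_LEVELS, repeat=2), key=cost):
        if feasible(human_level, robot_level):
            return human_level, robot_level
    return None


if __name__ == "__main__":
    # Toy feasibility test standing in for the real reachability/visibility checks.
    def toy_feasible(h, r):
        return h != "No_Effort" and r != "No_Effort"

    print(find_placement(1.0, 0.0, toy_feasible))  # favour least human effort
    print(find_placement(0.5, 0.5, toy_feasible))  # balance mutual effort
    print(find_placement(0.0, 1.0, toy_feasible))  # favour least robot effort
```

Under this sketch, weights of (1.0, 0.0) correspond to minimizing the human’s effort first, (0.5, 0.5) to balancing mutual effort, and (0.0, 1.0) to favouring the robot.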

In [42], it has been shown that the robot can express and communicate an intention of the type “I want to do this but I can’t …” to the human by repeating the simple actions corresponding to the failing task. The same framework could be used to find a solution for such communicative actions. For example, the robot could communicate that it wants to take an object but cannot reach it, by trying to reach out in a human-adapted manner.

The presented framework, which enables estimating ‘where’ the human can perform a task with different levels of effort, could also be used to estimate candidate places when providing solutions for other proactive behaviors in basic human-robot interactive tasks. We are in the process of hypothesizing and realizing such proactive object manipulation behaviors:

  1. (a)

    Proactively put an object closer to the human, for example, putting the sugar container closer to the human preparing the coffee, to reduce his effort.

  2. (b)

    Proactively put away an object that the human could obliviously hit while performing some task.

  3. (c)

    Proactively take an object away to make space for the human to put another object, to facilitate achieving a task.

  4. (d)

    Proactively take and show some object that the human might be looking for, etc.

The presented framework could also be adapted to behave proactively by estimating where the human could exhibit a competitive behavior. Figure 20 shows the candidate places where the human, with Arm_Torso_Effort, could hide some object from the robot; a sketch of such an estimation is given after the figure. In the context of multi-agent, hide-and-seek-like object games, the robot could use such information to proactively communicate to or hint its partner about the potential places to hide or seek an object. In fact, various types of higher-level proactive behaviors could be realized by exploiting the knowledge of ‘where’ an agent could perform a task, either to cooperate or to compete, with a particular effort level.

Fig. 20
figure 20

The robot’s estimation about the places where the human with Arm_Torso_Effort might be able to hide something from it
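
For illustration, the sketch below outlines how such candidate hiding places could be filtered by combining the two agents’ perspectives; the Agent structure, the reachability and visibility predicates, and the candidate sampling are hypothetical placeholders for the framework’s multi-state perspective taking and feasibility tests.

```python
# A minimal sketch of estimating candidate "hiding" places (cf. Fig. 20):
# keep the places the human can reach with a given effort level but the
# robot cannot see. The Agent fields are hypothetical placeholders for the
# framework's reachability and visibility estimations.
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Place = Tuple[float, float, float]  # (x, y, z) candidate placement


@dataclass
class Agent:
    reachable: Callable[[Place, str], bool]  # reachable(place, effort_level)
    visible: Callable[[Place], bool]         # visible from the agent's current viewpoint


def candidate_hide_places(human: Agent, robot: Agent,
                          candidates: Iterable[Place],
                          human_effort: str = "Arm_Torso_Effort") -> List[Place]:
    """Places the human can reach with the given effort but the robot cannot see."""
    return [p for p in candidates
            if human.reachable(p, human_effort) and not robot.visible(p)]
```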

8 Conclusion and Future Work

The contribution of this paper is two-fold: hypothesizing and validating proactive behaviors for two basic human-robot interactive manipulation tasks, and a generalized planner that can find solutions for such tasks while respecting environmental, postural and effort-oriented constraints. The focus is on utilizing the ‘where’ information based on multi-state visuo-spatial perspective taking. Our research is inspired by studies of human behavioral psychology, which suggest that the estimation of ‘where’ is an important aspect of coordination.

We have shown through the users’ responses that our hypotheses hold and that the users find the robot to be more ‘aware’ and ‘supportive’. Further, through experimental results and the users’ responses, we have shown that the presented proactive planner not only finds solutions for basic tasks for different robots in different scenarios, but also successfully achieves its goal of reducing the human’s ‘effort’ and ‘confusion’. In the presented framework, our robots observe the human, analyze situation-dependent capabilities, perform various task-dependent feasibility tests for motion planning, and find feasible solutions for different proactive behaviors. The weights used in finding the feasible solution and in various decision-making processes are parameters of the system and could be adapted/refined, even online, based on user activity, response and feedback.
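
For illustration, the sketch below indicates one possible online refinement of such weights from signed user feedback; the criterion names and the multiplicative update rule are hypothetical and are not the system’s actual adaptation scheme.

```python
# A minimal sketch of refining placement weights online from user feedback.
# The criterion names and the multiplicative update rule are illustrative
# assumptions, not the system's actual adaptation scheme.

def update_weights(weights: dict, feedback: dict, rate: float = 0.1) -> dict:
    """Nudge each weight up or down according to signed user feedback
    (+1: the criterion helped, -1: it did not), then renormalize."""
    updated = {name: max(1e-3, w * (1.0 + rate * feedback.get(name, 0)))
               for name, w in weights.items()}
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}


# Hypothetical criteria weighting candidate placements.
weights = {"human_effort": 0.4, "human_visibility": 0.3, "distance_to_object": 0.3}
# Example: the user indicated that visibility mattered more than expected.
weights = update_weights(weights, {"human_visibility": +1, "distance_to_object": -1})
print(weights)
```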

As the robot can perform the presented multi-state perspective taking for multiple humans/agents, an interesting direction for future work is to synthesize and instantiate proactive behaviors by taking into account the cooperation of more than two agents in a task. Another interesting work in progress is to use the information learnt/understood by the robot about what a task ‘means’ and ‘how’ it could be performed [9, 44, 45]. This will help to explicitly provide proactive suggestions about ‘how’ to perform a task as well as to autonomously synthesize a proactive behavior/action. It would also be interesting to analyze the effect: would the user feel too constrained by the robot, or would such suggestions better help the user to perform the cooperative task? What would be the optimal level of abstraction of such suggestions?

The proactive behaviors hypothesized in this paper are some of the essential building blocks of basic actions for complex socio-cognitive behavior, including expectation [39] and intention. We have performed a preliminary level of user studies and found the results encouraging, supporting our intuition and hypotheses. We feel the need for further user studies from the perspective of long-term human-robot interaction in the context of high-level tasks. In this regard, proactive gaze has been suggested as an important aspect to incorporate in developing methods to measure HRI through motor resonance [54]. This could be adapted to develop a measure of proactivity in HRI, based on how much the proactive action of the robot induces proactive gaze in the human partner, indicating the predictiveness of the proactive behavior. This will also help in identifying the necessary enhancements at different levels of planning and execution of such proactive behaviors.