
1 Introduction

An essential capability of social mobile robots is to perform face-to-face interactions with a person who initially stands some distance away from the robot. In addition to the content of the interaction, it is also crucial that the robot move near the human in a way that does not make the person feel uncomfortable. While the effects of relatively simple parameters such as the distance between agents [1,2,3] and the approach direction [4, 5] have been investigated extensively, more subtle factors such as trajectory shape have been explored less [6,7,8].

In addition to the trajectory shape, human perception also depends on the social norms of a particular culture, which are difficult to program manually. A possible method for improving social acceptance is therefore to model observed human trajectories using techniques such as inverse optimal control (IOC) [9, 10] and inverse reinforcement learning (IRL) [11, 12], where the weights of the cost or reward function terms are determined such that the output of the optimization or the learned policy becomes similar to human trajectories. Furthermore, human studies are needed to confirm that the generated trajectories are indeed socially acceptable, because the appearance and mode of locomotion of mobile robots differ from those of humans.

In this paper, we demonstrate through a user study that mobile robot trajectories generated from human-to-human approach trajectories are perceived as more socially acceptable in a face-to-face interaction scenario than those generated from human point-to-point trajectories. We first apply an IOC-based method [9] to human-to-human trajectories but find that the original cost function cannot reproduce the observed trajectories. We therefore introduce two new terms to the cost function: one inspired by the social force model [13] and another that penalizes the centrifugal force. The resulting trajectories are not only closer to human-to-human trajectories, but are also significantly preferred over those obtained with the original cost function [9].

Our results can inform the design of cost or reward functions for other optimization- or learning-based approaches. Furthermore, since we use a large mobile robot that can potentially be perceived as dangerous, similar or even better results are likely with smaller robots. The results are also independent of the content of the interaction, because the robot does not actively interact with the human in our experiments.

2 Inverse Optimal Control [9]

We represent the dynamics of a mobile robot by a differential equation:

$$\begin{aligned} \dot{\textbf{x}}(t) = \textbf{f}\left( \textbf{x}(t), \textbf{u}(t) \right) \end{aligned}$$
(1)

where \(\textbf{x}(t)\) is the current state and \(\textbf{u}(t)\) is the input.

Using the differential drive model with a velocity component orthogonal to the direction (Fig. 1), \(\textbf{x}\) and \(\textbf{u}\) are represented as

$$\begin{aligned} \textbf{x}(t)= & {} \left( x(t)\; y(t)\; \theta (t)\; v_f(t)\; v_o(t)\; \omega (t) \right) ^T \end{aligned}$$
(2)
$$\begin{aligned} \textbf{u}(t)= & {} \left( u_1(t)\; u_2(t)\; u_3(t) \right) ^T. \end{aligned}$$
(3)

Choosing the inputs as \(u_1(t) = \dot{v}_f(t)\), \(u_2(t) = \dot{v}_o(t)\) and \(u_3(t) = \dot{\omega }(t)\) yields

$$\begin{aligned} \textbf{f}\left( \textbf{x}(t), \textbf{u}(t)\right)= & {} \left( \dot{x}(t)\; \dot{y}(t)\; \omega (t)\; u_1(t)\; u_2(t)\; u_3(t) \right) ^T,\end{aligned}$$
(4)
$$\begin{aligned} \dot{x}(t)= & {} v_f(t) \cos \theta (t) - v_o(t) \sin \theta (t),\end{aligned}$$
(5)
$$\begin{aligned} \dot{y}(t)= & {} v_f(t) \sin \theta (t) + v_o(t) \cos \theta (t). \end{aligned}$$
(6)
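For readers who want to reproduce the model, the dynamics (4)–(6) can be written as a short function. The sketch below uses CasADi symbolics (the toolchain the paper reports using for optimization); the function and variable names are ours.

```python
import casadi as ca

def robot_dynamics(x, u):
    """xdot = f(x, u) for x = (x, y, theta, v_f, v_o, omega), u = (u1, u2, u3)."""
    theta, v_f, v_o, omega = x[2], x[3], x[4], x[5]
    return ca.vertcat(
        v_f * ca.cos(theta) - v_o * ca.sin(theta),  # Eq. (5)
        v_f * ca.sin(theta) + v_o * ca.cos(theta),  # Eq. (6)
        omega,                                      # thetadot
        u[0],                                       # dv_f/dt
        u[1],                                       # dv_o/dt
        u[2],                                       # domega/dt
    )
```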

We impose limits on the velocities and inputs:

$$\begin{aligned} 0 \le v_f(t) \le v_{fmax} \end{aligned}$$
(7)
$$\begin{aligned} -v_{omax} \le v_o(t) \le v_{omax} \end{aligned}$$
(8)
$$\begin{aligned} -\omega _{max} \le \omega (t) \le \omega _{max} \end{aligned}$$
(9)
$$\begin{aligned} -u_{imax} \le u_i(t) \le u_{imax}\;(i=1,2,3). \end{aligned}$$
(10)
Fig. 1. Mobile robot model

IOC involves two optimizations: an inner optimization that obtains the optimal trajectory minimizing a given cost function, and an outer optimization that obtains the optimal cost function. In practice, the outer optimization yields the optimal weights of the predefined cost function terms.

[9] defines the cost function for the inner optimization as

$$\begin{aligned}&\quad J(x_e, y_e, \theta _e, \textbf{w}, T, \textbf{x}(t), \textbf{u}(t)) = \nonumber \\&T + w_1\int _0^T u_1^2 dt + w_2 \int _0^T u_2^2 dt + w_3 \int _0^T u_3^2 dt + w_4 \int _0^T \left( \arctan \left( \frac{y_e-y}{x_e-x} \right) - \theta \right) ^2dt \end{aligned}$$
(11)

where \(\left[ x_e, y_e, \theta _e\right] \) is the goal pose represented in the robot’s local frame at its initial pose, \(\textbf{w} = [w_1, w_2, w_3, w_4]\) are weights that are kept constant during inner optimization, and T is the duration of the trajectory. The last term of (11) encourages the robot to face the goal.

We represent the trajectory by \(N+1\) discrete states \(\textbf{x}_k\; (k=0,1,\ldots ,N)\) and N inputs \(\textbf{u}_k\; (k=0,1,\ldots ,N-1)\). With \(\textbf{X} = \left[ \textbf{x}_0^T\; \textbf{x}_1^T\; \ldots \; \textbf{x}_N^T\right] ^T\) and \(\textbf{U} = \left[ \textbf{u}_0^T\; \textbf{u}_1^T\; \ldots \; \textbf{u}_{N-1}^T\right] ^T\), inner optimization is formulated as

$$\begin{aligned} T^*, \textbf{X}^*, \textbf{U}^* = \arg \min _{T, \textbf{X}, \textbf{U}} J(x_e, y_e, \theta _e, \textbf{w}, T, \textbf{X}, \textbf{U}) \end{aligned}$$
(12)

with inequality constraints (7)–(10) and equality constraints

$$\begin{aligned} \textbf{x}_0= & {} \left[ 0\; 0\; 0\; 0\; 0\; 0\right] ^T\\ \textbf{x}_N= & {} \left[ x_e\; y_e\; \theta _e\; 0\; 0\; 0\right] ^T\\ \textbf{x}_{k+1}= & {} \textbf{x}_k + \frac{T}{N} \dot{\textbf{x}}_k\\ \textbf{x}_k= & {} \left[ \begin{array}{cccccc} x[k]&y[k]&\theta [k]&v_f[k]&v_o[k]&\omega [k] \end{array} \right] ^T\\ \dot{\textbf{x}}_k= & {} \left[ \begin{array}{c} v_f[k]\cos \theta [k] - v_o[k]\sin \theta [k]\\ v_f[k]\sin \theta [k] + v_o[k]\cos \theta [k]\\ \omega [k]\\ \textbf{u}[k] \end{array} \right] . \end{aligned}$$
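A direct-transcription sketch of the inner optimization (12) is given below, using CasADi's Opti stack with IPOPT (the solver combination reported later in the paper) and the `robot_dynamics` function from the earlier sketch. The horizon length, velocity and input limits, and the initial guess for T are placeholders, not values from the paper.

```python
import casadi as ca

def inner_optimization(goal, w, N=50, v_fmax=1.2, v_omax=0.3,
                       omega_max=1.0, u_max=(1.0, 1.0, 1.0)):
    """Solve (12) by direct transcription; limits and N are placeholders."""
    x_e, y_e, th_e = goal
    w1, w2, w3, w4 = w
    opti = ca.Opti()
    X = opti.variable(6, N + 1)   # states x_0 .. x_N
    U = opti.variable(3, N)       # inputs u_0 .. u_{N-1}
    T = opti.variable()           # free trajectory duration
    dt = T / N

    cost = T
    for k in range(N):
        xk, uk = X[:, k], U[:, k]
        # Euler integration constraint x_{k+1} = x_k + (T/N) xdot_k
        opti.subject_to(X[:, k + 1] == xk + dt * robot_dynamics(xk, uk))
        # input-effort terms of (11)
        cost += dt * (w1 * uk[0]**2 + w2 * uk[1]**2 + w3 * uk[2]**2)
        # facing-the-goal term of (11); atan2 could be used to avoid the singularity
        cost += dt * w4 * (ca.atan((y_e - xk[1]) / (x_e - xk[0])) - xk[2])**2
        # velocity and input limits (7)-(10)
        opti.subject_to(opti.bounded(0, xk[3], v_fmax))
        opti.subject_to(opti.bounded(-v_omax, xk[4], v_omax))
        opti.subject_to(opti.bounded(-omega_max, xk[5], omega_max))
        for i in range(3):
            opti.subject_to(opti.bounded(-u_max[i], uk[i], u_max[i]))

    # boundary conditions: start at the origin at rest, stop at the goal pose
    opti.subject_to(X[:, 0] == 0)
    opti.subject_to(X[:, N] == ca.vertcat(x_e, y_e, th_e, 0, 0, 0))
    opti.subject_to(T >= 0.1)     # keep the duration positive

    opti.minimize(cost)
    opti.set_initial(T, 5.0)
    opti.solver('ipopt')
    sol = opti.solve()
    return sol.value(T), sol.value(X), sol.value(U)
```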

The outer optimization obtains the weights \(\textbf{w}\) such that the trajectories computed by inner optimization are similar to observed trajectories. Assume that we have M observations with different goal poses \([x_{em}, y_{em}, \theta _{em}]\;(m=1,2,\ldots ,M)\) and denote the observed poses in the m-th observation by \([\hat{x}_{mk}, \hat{y}_{mk}, \hat{\theta }_{mk}]\;(k=0,1,\ldots ,N)\). The error from the m-th observed trajectory is evaluated by

$$\begin{aligned} Z_m(\textbf{w}) = \sum _{k=0}^N (\hat{x}_{mk} - x_{m,\textbf{w}}^*[k])^2 + (\hat{y}_{mk} - y_{m,\textbf{w}}^*[k])^2 + c_{\theta }(\hat{\theta }_{mk} - \theta _{m,\textbf{w}}^*[k])^2 \end{aligned}$$
(13)

where \(c_{\theta }\) is a user-defined constant and \(x_{m,\textbf{w}}^*\), \(y_{m,\textbf{w}}^*\) and \(\theta _{m,\textbf{w}}^*\) are the solution of inner optimization with \(\textbf{w}\) as the weights and \([x_{em}, y_{em}, \theta _{em}]\) as the goal. Using (13), outer optimization is formally defined as

$$\begin{aligned} \textbf{w}^* = \arg \min _{\textbf{w}} \bar{Z}(\textbf{w}),\; \bar{Z}(\textbf{w}) = \frac{1}{M}\sum _{m=1}^{M} Z_m(\textbf{w}) \end{aligned}$$
(14)

subject to \(\textbf{0} \le \textbf{w} \le \textbf{w}_{max}\), where \(\textbf{w}_{max}\) is the vector of maximum weights. Let us also define \(\bar{Z}^* = \bar{Z}(\textbf{w}^*)\).
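Because each evaluation of \(\bar{Z}(\textbf{w})\) requires M inner solves, the outer optimization is conveniently treated as a black-box, bound-constrained problem. The paper does not state which outer optimizer is used; the sketch below assumes SciPy's Nelder-Mead with bounds (available in recent SciPy versions) and observed trajectories resampled to \(N+1\) points, and reuses `inner_optimization` from the sketch above.

```python
import numpy as np
from scipy.optimize import minimize

def mean_tracking_error(w, observations, c_theta=0.1):
    """Z_bar(w) of (14); observations is a list of (goal, x_hat, y_hat, th_hat)
    with observed poses resampled to N+1 samples; c_theta is user-defined."""
    errs = []
    for goal, x_hat, y_hat, th_hat in observations:
        _, X, _ = inner_optimization(goal, w)   # inner solve from the sketch above
        errs.append(np.sum((x_hat - X[0, :])**2 + (y_hat - X[1, :])**2
                           + c_theta * (th_hat - X[2, :])**2))      # Eq. (13)
    return np.mean(errs)

# w_init, w_max and train_obs are placeholders for the initial guess,
# the maximum weights, and the M training observations.
res = minimize(mean_tracking_error, w_init, args=(train_obs,),
               method='Nelder-Mead', bounds=[(0.0, wm) for wm in w_max])
w_star = res.x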

Fig. 2. Initial human and mannequin poses for human-to-human trajectory collection

3 Application to Human Approach

To emulate the human approach scenario, we place a mannequin in a motion capture area and have a human participant approach and eventually stand in front of the mannequin. As shown in Fig. 2, we place the mannequin at 2 different positions facing 8 different directions each. The human participant also starts from 3 different facing directions. The total number of observed trajectories is thus 48, and we use 10 of them for computing the optimal \(\textbf{w}\) (training) and 38 for testing.

The final pose of the human participant is determined from the mannequin’s pose \([x_m, y_m, \theta _m]\) as

$$\begin{aligned} \left[ \begin{array}{ccc} x_e,&y_e,&\theta _e \end{array} \right] = \left[ \begin{array}{ccc} x_m - d\cos \theta _m,&y_m - d\sin \theta _m,&-\theta _m \end{array} \right] \end{aligned}$$
(15)

where d is a constant distance between the final positions of the human and mannequin. We choose \(d=0.8\) m based on the observed trajectories.
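As a small helper, (15) maps the captured mannequin pose to the goal pose of the approaching agent; the sketch below is a direct transcription (with d defaulting to the 0.8 m chosen above).

```python
import numpy as np

def goal_from_mannequin(x_m, y_m, theta_m, d=0.8):
    """Goal pose in front of the mannequin, Eq. (15); d in meters."""
    return np.array([x_m - d * np.cos(theta_m),
                     y_m - d * np.sin(theta_m),
                     -theta_m])
```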

3.1 Curriculum Inverse Optimal Control

Since the outer optimization is nonlinear, we expect the problem to have a number of local minima. However, it is unrealistic to exhaustively search the 4-dimensional parameter space for the global optimum.

We therefore optimize the parameters in multiple stages by gradually introducing new terms to the cost function for inner optimization. The first stage uses a cost function consisting only of the input terms:

$$\begin{aligned} J_1(w_2, w_3, \textbf{u}(t)) = \int _0^{\hat{T}} u_1^2 dt + w_2 \int _0^{\hat{T}} u_2^2 dt + w_3 \int _0^{\hat{T}} u_3^2 dt \end{aligned}$$
(16)

where the fixed duration \(\hat{T}\) is computed by

$$\begin{aligned} \hat{T} = 1.5 \max \{\sqrt{x_e^2 + y_e^2}/v_{fmax},\; |\theta _e|/\omega _{max} \}. \end{aligned}$$
(17)

Note that we have set \(w_1\) to 1 because scaling \(J_1\) by a constant does not affect the result. In the first stage, we uniformly sample initial values for \(w_2\) and \(w_3\) and perform the outer optimization for each of them. We then use the \((w_2, w_3)\) that give the smallest \(\bar{Z}(\textbf{w})\) as the initial guess for the second stage, which includes the duration term:

$$\begin{aligned} J_2(w_0, w_2, w_3, T, \textbf{u}(t)) = J_1(*) + w_0 T \end{aligned}$$
(18)

where \(w_0\) is the weight for the duration. We uniformly sample the initial values for \(w_0\) and optimize \(w_0\), \(w_2\) and \(w_3\) by outer optimization. Finally, the best weights are used as the initial guess for the final stage, which uses a slightly modified version of Eq. (11):

$$\begin{aligned} J_3(x_e, y_e, \theta _e, w_0, w_2, w_3, w_4,&T, \textbf{x}(t), \textbf{u}(t)) \nonumber \\&= J_2(*) + w_4 \int _0^T \left( \arctan \left( \frac{y_e-y}{x_e-x} \right) - \theta \right) ^2dt. \end{aligned}$$
(19)
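The curriculum can be implemented as a sequence of outer optimizations over progressively richer cost functions, each warm-started from the previous stage. The sketch below is schematic: it assumes `mean_tracking_error` from the outer-optimization sketch in Sect. 2 is extended with a `cost_id` argument selecting the stage-specific cost, and the sampling grids, bounds, and velocity limits are placeholders.

```python
import numpy as np
from scipy.optimize import minimize

def fixed_duration(goal, v_fmax=1.2, omega_max=1.0):
    """T_hat of (17); the velocity limits here are placeholders."""
    x_e, y_e, th_e = goal
    return 1.5 * max(np.hypot(x_e, y_e) / v_fmax, abs(th_e) / omega_max)

def run_stage(cost_id, w_init, train_obs, bounds):
    """One outer optimization using the stage-specific cost (J1, J2 or J3)."""
    obj = lambda w: mean_tracking_error(w, train_obs, cost_id=cost_id)
    return minimize(obj, w_init, method='Nelder-Mead', bounds=bounds)

# Stage 1: cost J1 (16) with w1 = 1 and the duration fixed to T_hat from (17);
# initial (w2, w3) are sampled on a uniform grid (grid size is a placeholder).
stage1 = min((run_stage('J1', np.array([w2, w3]), train_obs, [(0, 10)] * 2)
              for w2 in np.linspace(0.1, 5.0, 5)
              for w3 in np.linspace(0.1, 5.0, 5)),
             key=lambda r: r.fun)

# Stage 2: cost J2 (18); sample w0 uniformly, warm-start (w2, w3) from stage 1.
stage2 = min((run_stage('J2', np.concatenate(([w0], stage1.x)), train_obs, [(0, 10)] * 3)
              for w0 in np.linspace(0.1, 5.0, 5)),
             key=lambda r: r.fun)

# Stage 3: cost J3 (19); add w4 and optimize all weights from the stage-2 result.
stage3 = run_stage('J3', np.concatenate((stage2.x, [1.0])), train_obs, [(0, 10)] * 4)
```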
Table 1. Weights obtained by IOC [9] and their errors

Table 1 summarizes the weights obtained by IOC and their errors. As the results indicate, adding the facing-direction term to the cost function does not improve the prediction accuracy. This is because the human tends to take a circular path around the mannequin, especially when approaching from behind, and therefore facing the goal position is not important near the mannequin.

Fig. 3. Optimized motion vs. ground truth. Left: from training dataset, right: from test dataset. Magenta: using (19) (facing direction), blue: using (20) (social force). The unit of the axes is meters. (Color figure online)

3.2 New Cost Function

Motivated by the initial results, we replace the last term of (19) with a new term, resulting in a new cost function:

$$\begin{aligned} J'_3(x_m, y_m, w_0, w_2, w_3, w'_4, T,&\textbf{x}(t), \textbf{u}(t)) = \nonumber \\&J_2(*)+w'_4 \int _0^T \frac{dt}{\epsilon + (x_m - x)^2 + (y_m - y)^2} \end{aligned}$$
(20)

where \(\epsilon \) is a small constant. The new term produces an effect similar to social force [13] by increasing the cost near the mannequin.

We then perform stage 3’, in which \(w_0\), \(w_2\), \(w_3\) and \(w'_4\) are optimized using the optimal weights of stage 2 and uniformly sampled \(w'_4\) as initial values. As shown in Table 2, stage 3’ improves the accuracy by 36% for the training data set and 8% for the test data set. Figure 3 compares the trajectories optimized using Eqs. (19) and (20) for trajectories used for training and testing.

Table 2. Optimized weights and resulting errors with the new cost function terms

In some cases, we observe that the optimized trajectory shows excessive detouring compared to the ground truth, which may be a side effect of the new term. To reduce this effect, we add another term that tends to reduce the centrifugal force. The cost function is now

$$\begin{aligned} J_4(x_m, y_m, w_0, w_2, w_3, w'_4, w_5, T, \textbf{x}(t), \textbf{u}(t)) = J'_3(*) + w_5\int _0^T v_f^2 \omega ^2 dt. \end{aligned}$$
(21)

Optimizing all weights (stage 4) results in the last row of Table 2. The centrifugal force term achieves a modest improvement of approximately 3% in both training and test data sets.
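In the discretized inner optimization, the two new terms of (20) and (21) amount to one extra summand each per time step. A minimal sketch, with variable names matching the inner-optimization sketch in Sect. 2 and an illustrative default for \(\epsilon\), is:

```python
def social_terms(xk, dt, x_m, y_m, w4p, w5, eps=1e-2):
    """Per-step contribution of the two new cost terms (sketch).

    xk = (x, y, theta, v_f, v_o, omega) is the state at step k; w4p and w5
    correspond to w'_4 and w_5; eps is a small user-chosen constant.
    """
    proximity = w4p / (eps + (x_m - xk[0])**2 + (y_m - xk[1])**2)  # Eq. (20)
    centrifugal = w5 * xk[3]**2 * xk[5]**2                         # Eq. (21)
    return dt * (proximity + centrifugal)
```

Inside the discretization loop, one would simply accumulate `cost += social_terms(X[:, k], dt, x_m, y_m, w4p, w5)`.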

We use the IPOPT solver together with the Python interface of CasADi [14]. Trajectory optimization using (19) or (21) typically takes 2–3 s.

4 User Study

4.1 Experimental Platform

Our experiment involves a humanoid robot with a mobile base [15] (dimensions: 96\(\times \)80\(\times \)170 cm, weight: 160 kg) moving in an area of about \(7.5\times 4\) m (Fig. 4). The mobile base is omnidirectional, and its command velocity is computed by proportional control with a gain of 2.0 plus a feedforward velocity. The command velocity is capped at 0.8 m/s for translation and 0.8 rad/s for yaw rotation.
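The tracking controller described above can be sketched as follows; the world-frame convention, function name, and yaw-error wrapping are our assumptions, while the gain and velocity caps are the values stated above.

```python
import numpy as np

K_P = 2.0      # proportional gain
V_MAX = 0.8    # translational cap [m/s]
W_MAX = 0.8    # yaw-rate cap [rad/s]

def command_velocity(pose_ref, pose_meas, vel_ff):
    """Command velocity for the omnidirectional base (sketch).

    pose_ref, pose_meas: (x, y, yaw); vel_ff: feedforward (vx, vy, wz) taken
    from the optimized trajectory; everything expressed in the world frame.
    """
    err = np.asarray(pose_ref, dtype=float) - np.asarray(pose_meas, dtype=float)
    err[2] = np.arctan2(np.sin(err[2]), np.cos(err[2]))   # wrap yaw error
    cmd = K_P * err + np.asarray(vel_ff, dtype=float)
    v_norm = np.hypot(cmd[0], cmd[1])
    if v_norm > V_MAX:                        # cap translation as a vector
        cmd[:2] *= V_MAX / v_norm
    cmd[2] = np.clip(cmd[2], -W_MAX, W_MAX)   # cap yaw rate
    return cmd
```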

Fig. 4. Top view of the experiment setup

We consider a full face-to-face interaction scenario where the robot approaches the human, stays in place for a few seconds, and returns to the original location. The same cost function is used for both approaching and returning trajectories. The robot does not make any gesture or sound throughout the motion so that we can measure the effect of the trajectory shape exclusively.

We vary two parameters of the trajectory: the direction from which the robot approaches (front, side, or back) and the moving speed (slow or fast). The direction is controlled by having the participants stand facing the directions shown in Fig. 4. The slow motions are generated by increasing the duration of the optimized trajectories by 30%. For each of the 6 combinations of direction and moving speed, a pair of trajectories is generated by the baseline [9] and proposed methods. The baseline method uses (19) with human point-to-point trajectories collected by the authors in a setting similar to Fig. 2 but without the mannequin, while the proposed method uses (21) with human-to-human trajectories. The order of the two trajectories within each pair is randomized and balanced across all participants.

4.2 Experimental Protocol

The experimental protocol was reviewed and approved by the institutional review board (IRB) of Honda R&D Co., Ltd.

Fig. 5. Snapshots from a user study session. Left: baseline, right: proposed.

Table 3. Demographics of the participants

We use a motion capture system [16] with 11 cameras to measure the participant's head pose and determine the robot's goal pose at the beginning of each trial. We also continuously measure the robot's position and orientation for trajectory tracking. Before a trial begins, the participant stands at the designated position facing one of the three directions as instructed. The operator warns the participant that the trial is about to start, and the participant hears an audible beep once the trial begins. At the beep, the robot begins to approach the participant from their left-hand side, stays in front of the participant for a few seconds, and returns to its original position. Due to the size of the robot base, we choose 1 m as the distance between the participant and the front end of the robot while it stays in front of the human. While the robot is in motion, the participant is allowed to turn their head and to change their position and orientation by, for example, stepping back or turning when they feel the robot may come uncomfortably close.

After the first trial, the second trial is conducted with the same direction and speed but with the trajectory generated by the other method. After both trials are completed, the participant is asked to fill out a questionnaire that asks them to choose which of the pair of trajectories better fits each of the following descriptions:

  • The motion is safer.

  • The motion is more polite.

  • The motion is more comforting.

  • The motion is more aggressive.

  • The motion is more awkward.

  • The robot is more competent.

  • The robot is more reliable.

In addition to the participants' responses to the questionnaire, their reactions to the robot motion, such as stepping back or turning to keep a comfortable distance, were also recorded by the experimenter.

5 Results

Table 4. Aggregated distribution of votes
Table 5. Number of occurrences of stepping back and turning
Fig. 6. Votes of trajectories split by each direction and speed

We collected valid responses from 18 participants. Figure 5 shows snapshots of the robot motions from the same direction (back) using the baseline and proposed methods. Table 3 summarizes the demographics of the participants as well as the proportion of those with prior experience interacting with robots. Since each participant experienced 6 pairs of trajectories, each question received \(18 \times 6 = 108\) votes in total. The aggregated distribution of the votes for each question is shown in Table 4. A proportion test against \(p = 0.5\) shows that, with statistical significance, the votes are not evenly split between the two trajectories. More participants prefer the trajectory generated by the proposed method, rating the motion as safer, more polite, and more comforting, and as less aggressive and less awkward. Furthermore, significantly more participants consider the robot more competent and reliable when its trajectory is generated by the proposed method.
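For reference, the per-question proportion test can be reproduced with SciPy's exact binomial test; the vote count below is a placeholder, not the actual data.

```python
from scipy.stats import binomtest

n_votes = 108                  # 18 participants x 6 trajectory pairs
votes_for_proposed = 80        # placeholder count, not the actual data
result = binomtest(votes_for_proposed, n_votes, p=0.5)
print(result.pvalue)           # small p-value: votes are not split 50/50
```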

A similar trend can be observed in the distributions of votes for each combination of direction and moving speed, as shown in Fig. 6: more participants prefer the trajectories generated by the proposed method in all aspects.

We apply a log-linear analysis to investigate the association and interaction patterns among approach direction, moving speed, and the participants' trajectory preference. The results show that the preference (for the baseline or proposed method) is affected by neither moving speed nor approach direction (all p-values above 0.1), which is interesting because speed would also be expected to affect the perception.

The numbers of occurrences of stepping back and turning are summarized in Table 5. Participants show a significantly higher number of occurrences when the robot approaches with the baseline trajectory (\(\chi ^2 = 13.55\), df = 1, \(p<0.01\)). Among the three approach directions, approaching from the back shows the largest difference in the stepping-back rate between the two trajectories. This result contradicts the questionnaire responses, which do not show a significant effect of the approach direction. Such a discrepancy may be explained by the difference between an involuntary reaction and a conscious response to questions.
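The reported statistic corresponds to a chi-square test on a 2\(\times\)2 contingency table of reaction counts by trajectory type; a sketch with placeholder counts is shown below (whether a continuity correction was applied in the paper is not stated, so `correction=False` is our assumption).

```python
from scipy.stats import chi2_contingency

# rows: baseline / proposed trajectory; columns: reaction / no reaction
# (placeholder counts, not the actual data)
table = [[30, 78],
         [10, 98]]
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(chi2, p)
```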

6 Conclusion

In this paper, we first presented an IOC approach for generating socially aware trajectories of a mobile robot moving to and from a human for face-to-face interactions. The main difference from similar prior work [9] is that we proposed new cost function terms, one of which emulates the effect of social force [13] while the other reduces the centrifugal force. To solve the complex nonlinear optimization problem in IOC, we employed curriculum inverse optimal control, where the results of simpler optimization problems with fewer parameters are used as the initial guess for more complex problems. Using data from human-to-human trajectories, we demonstrated that the proposed cost function reproduces the observed trajectories more accurately.

The second half of the paper described a user study comparing the social acceptance of the trajectories generated by the baseline [9] and proposed methods. The results showed that the trajectories generated by the proposed method are preferred over those generated by the baseline. We also demonstrated that the same cost function can be used for both approach and return trajectories, indicating the generality of the IOC-based method.

There are a few avenues for future work. While we focused only on the robot trajectory, social acceptance is also affected by factors such as appearance and joint motions. Trajectory optimization may become computationally expensive when the robot has to navigate around obstacles or other humans while approaching a person. In this case, IRL may be a better option because the robot can simply execute the learned policy while in motion. However, our results can provide guidance on how to choose reward function terms that better replicate human trajectories around another human.