1 Introduction

People feel safe and comfortable within the territory they keep from others. We should respect other people’s territory and learn to adapt to it when interacting with them. The interpersonal distance should therefore be estimated adaptively from the real-time responses of others, allowing one to adjust one’s position so as not to trespass on their private areas. In the near future, domestic robots are expected to share the environment with humans, and their perceptual and behavioral abilities must conform to our social norms. Domestic robots should therefore be able to learn the proper social interaction distance and private area. However, it is difficult for a robot to estimate the social interaction distance of each person, which may vary with social factors such as culture, personal traits, and acquaintanceship. Although various studies have been conducted on social models for mobile robot navigation [9, 19, 20], little attention has been paid to the dynamics of human social factors.

For mobile robot navigation in a human-populated environment, collision avoidance is one of the most important concerns. Another issue that needs increased attention is how to enable the robot to generate socially competent navigation behaviors that help people feel safe and comfortable. These are key challenges for human–robot symbiosis. The theory of Proxemics [2] and its related psychological concepts are frequently used for developing socially competent robot behaviors. The theory has been integrated into various research efforts, especially safe navigation that considers social effects [3, 25, 26]. However, formalizing this social science theory into a mathematical model for human-centered robot navigation remains a challenging problem.

Considering individuals’ social factors, our goal is to propose a dynamic social force model of human–robot social interaction. This enables the robot to adaptively estimate the human social interaction distance, especially their private area, in a public environment. This paper proposes a personalized social interaction model designed by a fuzzy inference system whose parameters are adjusted and optimized by a reinforcement learning method in an on-line manner. The estimated social force model is used as a cost map for the path planner to generate robot navigation paths to make people feel comfortable.

2 Related Work

In this section, we summarize the existing research on the interpersonal distance that mediates people’s interaction with others. First, we draw on social science studies to define privacy and Proxemics. Then, we describe studies that use the Proxemics theory to model human interaction areas and their applications. Finally, we identify the technical challenges of modeling human interaction areas that respond to individuals’ social factors.

2.1 Privacy and Proxemics in Social Science

The key to formalizing human–robot interaction is to understand and accommodate human behavior, so knowledge from social science is important. Privacy in human–robot interaction was defined by Ruben and Smart [24], who summarized it as the ability of an individual or group to separate themselves and thereby express themselves selectively. The boundaries and content of what is considered private differ among cultures and individuals. Westin [31] noted that most animals seek privacy, either as individuals or in small groups. This leads to the idea of territoriality, the defense of an area against intrusion by others. He reported three types of spacing observed among animals: personal distance between individuals, social distance between groups, and fight distance, at which an intruder causes conflict. At the same time, animals often gather in large groups; they seem to live in tension between privacy and sociality. Zeeger studied human privacy in childhood [32] and found that 58 of 100 three-, four-, and five-year-olds said they had a special place at the daycare center that belonged only to them. Newell found that adults usually seek privacy when they feel sad or tired, or need to concentrate [18]. These studies are closely related to the theory of Proxemics [2], which describes the different interpersonal distances that people keep from others. These distances depend on the type of interaction and the relationship between individuals. Human interaction areas can be defined by this theory as shown in Fig. 1a: the Public area is used to interact with strangers, the Social area with acquaintances, the Personal area with familiar people, and the Intimate area for intimate contact. People also use the concept of interpersonal space when approaching others. For example, we move closer to a close friend to improve the quality of interaction, but keep our distance from a stranger so that the person feels more comfortable. Moreover, protecting one’s privacy is an essential prerequisite for forming long-term, stable relationships and for developing socially competent robots. Safety is one of the criteria that make people feel comfortable interacting with a robot [24]. Therefore, the robot should respect the human’s private space to maintain both comfort and the quality of interaction. Empirical research indicates that spatial privacy rights are important in determining whether people accept interaction with robots [8, 16, 29].

2.2 Social Science in Human–Robot Interaction

Models of human private space can be grouped into geometric models and potential field models [11]. They are designed around four different shapes used to describe the personal space of a human, i.e., concentric circles, egg shapes, concentric ellipses, and asymmetric shapes [23]. In geometric models, the private or personal space is described by a geometric function, for example an ellipse or a semi-ellipse. These models have crisp boundaries and are therefore appropriate for expressing sharp transitions between personal space and free space. This group of models is suited to local path planning and obstacle avoidance; examples can be found in [10, 17, 19, 28]. However, the sharp transitions between spaces affect the robot’s movement when it operates in a populated environment, because the robot steers to avoid intruding into the personal area.

The second group of models describes the personal space of the human with the potential field method. These models are composed of continuous functions that assign values to locations around the human, reflecting the idea that human comfort decreases as an intruder approaches. Examples can be found in [3, 7, 9, 20, 25, 26]. This group of models is suited to optimal path planning frameworks, which seek to optimize a path cost derived from the human’s response.

Human social factors are incorporated into a high-level representation. Human’s pose, speech, and gesture cues are often used to evaluate social interaction area to guide a robot in a socially compliant manner [15]. For example, Butler and Agah studied what type of approach behaviors make humans uncomfortable [1]. In [27], they investigated human traits influencing proxemic behaviors. These works proposed methods to design robot behaviors not to violate people’s privacy. The social relationship and genders were used as the social factors to generate the social interaction area and robot collision avoidance paths in human environments [21]. Several robot behaviors have already been implemented with the private space in mind, such as, standing in line [17], following a person [4], and passing a person in a hall [12].

Referring to the above literature, the actual size of interaction area at any given instance varies depending on social factors of people and on the task being performed. Therefore, adaptive space of human–robot interaction was proposed to deal with uncertainties of robot perception [5]. The method was based on the non-stationary model as skew-normal probability density functions, allowing smooth adaptation in situation awareness of a robot within the common human–robot interaction. Luber and Spinello addressed the problem of social-aware navigation among humans that meet the objective criteria such as travel time or path length as well as subjective criteria like human comfort feeling [13]. The method adapts the social interaction area based on learning from a set of dynamic motions observed in a public hall. In [22], the authors performed computer simulations that the robot should be able to prevent itself from intruding onto the human private area, but place itself in a location allowing social interaction, maximizing the degree of visiting the acceptable area and minimizing the degree of trespassing on the private area.

To recapitulate, a major weakness of previous works is a lack of adaptability in social interaction without considering individuals’ characteristics. In contrast, our approach enables the robot to learn to estimate the human private area during the interaction. The robot can learn parameters to update the private area through the human feedback. This social model can be integrated into a path planner to simultaneously ensure the human safety as well as the quality of interaction without intruding onto the private area (Fig. 1b). As the sizes of the quality interaction area and the private area vary from person to person, this work proposes a reinforcement learning based path planning approach for social robots capable of navigating outside the private area at all times.

Fig. 1 Human interaction area: (a) human interaction area according to proxemics [2]; (b) our proposed interaction area considering the quality of interaction and human privacy

Fig. 2 Overall process: three main parts: (1) human social model designed by a fuzzy inference system (FIS), (2) reinforcement learning to update the human social model by optimizing the parameters of the FIS, and (3) social path planner to generate socially competent navigation using the social model

3 Personalized Social Interaction

3.1 Overall Process

We propose a navigation method that generates socially competent paths by taking the human state into account, as shown in Fig. 2. The method has three main parts: (1) a human social model, expressed as an asymmetric Gaussian function whose parameters are determined by a fuzzy inference system (FIS); (2) reinforcement learning, which updates the parameters of the FIS; and (3) a social path planner, which generates socially competent navigation paths from the human social model. During human–robot interaction, the robot detects the human state and social factors, such as the social relationship between the person and the robot, to form a preliminary estimate of the person’s private area. These social factors are the crisp inputs to the FIS. The crisp inputs are converted to fuzzy sets using fuzzy linguistic variables, fuzzy terms, and membership functions (MFs). Inference is then performed with a set of fuzzy rules. Finally, in the defuzzification step, the resulting fuzzy output is mapped to a crisp output using the output membership functions. The output of the FIS is the set of parameters of the Gaussian function that models the person’s private area. From this preliminary private area, the robot estimates a social map that includes each person’s private area and uses it to generate navigation paths for social interaction. While navigating with this preliminary social map, the robot receives a reward that combines the interaction degree and the unacceptable degree, and uses it to update the parameters of the input membership functions through a learning mechanism (R-Learning). The robot then continues to navigate around people using the newly estimated social map. Eventually, the robot follows paths generated from the estimated social map, interacting within the quality interaction area without intruding into the private area (Fig. 1b).

3.2 Human Social Model

Social factors describe the social cues of people, such as their relationships with other people, personality traits, culture, and emotional states. Using such information is important to ensure people’s privacy as well as their safety in social robot navigation planning. This section summarizes the mathematical model of our fuzzy social relationship [21]. The proposed human social model is designed according to two concepts. The first is the asymmetric personal space [23], which describes the personal or private space of a human with different sizes for the frontal and lateral areas. The second is a degree assigned to the surrounding environment, which can be used as the cost in a path planning algorithm. The proposed method considers the discomfort felt by humans, which has its maximum value at the human’s location and decreases at locations farther away. The asymmetric Gaussian function, a simple mathematical function, is therefore well suited both to modeling the asymmetric shape of personal space and to providing the degree of the surrounding environment.

3.2.1 Fuzzy Social Relationship Model

The human state and the social factors (e.g., relative positions between the robot and each person, social relationship between them, genders of each person, etc.) can be used to design the private area each person wants to secure and keep from others. The private area can be represented by a set of positions \((x,y)\) surrounding each person to which force values are assigned as follows:

$$\begin{aligned} F\left( x,y\right) = \sum \limits _{i = 1}^n {{f_i}\left( x,y\right) } \end{aligned}$$
(1)

where n is the total number of persons, \(f_i\) is the repulsive force originating from the ith person which can be expressed by the bivariate Gaussian distribution function. Let A be the magnitude of the repulsive force which can be determined by a person’s physique. Also let \(\beta _{fr}\) and \(\beta _{si}\) be the size of the private area in the frontal and lateral directions, respectively, with respect to the ith person, as shown in Fig. 3. The repulsive force generating from the ith person \(f_i\left( x,y\right) \) is designed by

$$\begin{aligned} {f_i \left( x,y\right) }= A*\exp \left( { - \left( {{\beta _{fr}} + {\beta _{si}}} \right) } \right) \end{aligned}$$
(2)

which presents the degree of discomfort of the i-th person. Its peak value is observed at his/her position which decreases as the distance from him/her increases. It is clear from Eq. 2 that the magnitude of the degree of discomfort depends not only on the amplitude A, but also on \(\beta _{fr}\) and \(\beta _{si}\). These terms can be updated by the human state and the social factors, respectively.

Fig. 3 Human’s private area: the private area of the human can be determined by two factors: the frontal side \(\beta _{fr}\), which depends on the human’s motion, and the lateral side \(\beta _{si}\), which can be determined by social signals

Table 1 Designing the social interaction area using fuzzy rules

Let us assume that the robot is able to perceive the human state, which consists of the person’s position, velocity, and orientation with respect to the inertial coordinate frame, denoted by \((x_i,y_i,\dot{x_i},\dot{y_i},\theta _{i})\). Let d be the distance between the i-th person’s position \((x_i,y_i)\) and any position \((x,y)\) in the surrounding environment, and let \(\theta _i\) be the orientation of the person’s facing direction vector. The magnitude of the velocity \(v_i\) can be computed by

$$\begin{aligned} {v_i} = \sqrt{\dot{x}_i^2 + \dot{y}_i^2} \end{aligned}$$
(3)

Considering the motion of people, \(\beta _{fr}\) can be defined as follows:

$$\begin{aligned} {\beta _{fr}} = \left\{ {\begin{array}{*{20}{l}} \frac{{{{\left( {d*\cos \left( {\theta - {\theta _i}} \right) } \right) }^2}}}{{2*\sigma _{f0}^2}}&{}\quad \text {if} \,\ {\cos (\theta - {\theta _i}) \le 0}\\ \frac{{{{\left( {d*\cos \left( {\theta - {\theta _i}} \right) } \right) }^2}}}{2*\left( \sigma _{f0}/\left( {1 + {\gamma _f}{v_i}} \right) \right) ^2} &{} \quad {{ \mathrm {otherwise}}} \end{array}}\right. \end{aligned}$$
(4)

where \({\sigma _{f0}}\) is chosen according to the interpersonal social distances defined in [2], \(\gamma _f\) is the normalization term, and \(\theta \) is the orientation of the vector that represents the position of any point in the environment with respect to the inertial coordinate system. As a result, the robot pays more attention to the area in front of people than to the area behind them.
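As an illustration, Eqs. (3) and (4) can be sketched in Python as follows. This is a minimal sketch rather than the implementation used in the experiments: the function names are hypothetical, and \(\theta \) is assumed to be the bearing of the query point as seen from the person, expressed in the inertial frame.

```python
import math


def speed(vx, vy):
    """Eq. (3): magnitude of the person's velocity."""
    return math.hypot(vx, vy)


def beta_fr(d, theta, theta_i, v_i, sigma_f0, gamma_f):
    """Eq. (4): frontal exponent of the private area.

    d        distance from the person to the query point
    theta    bearing of the query point (assumed: measured from the person)
    theta_i  facing direction of the person
    v_i      speed of the person, see speed()
    """
    c = math.cos(theta - theta_i)
    if c <= 0.0:
        # First branch of Eq. (4): base standard deviation.
        sigma = sigma_f0
    else:
        # Second branch of Eq. (4): speed-dependent standard deviation.
        sigma = sigma_f0 / (1.0 + gamma_f * v_i)
    return (d * c) ** 2 / (2.0 * sigma ** 2)
```

For a stationary person the two branches coincide, so the frontal exponent is symmetric about the person until the speed term becomes non-zero.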

This paper also reflects social factors of people in relation to the robot, e.g., the gender, the relative distance, and the relationship degree, to estimate the design parameters of the private area in the lateral direction \(\beta _{si}\). Since the social factors vary depending on various conditions, it is difficult to group them as a binary function. Therefore, a fuzzy logic approach is used to quantify these parameters [21].

Gender is one of the social factors that should be considered when modeling the private area. The input membership function (MF) of gender is defined as a binary function over male (M) and female (Fe), given by

$$\begin{aligned} \varGamma _1(g) = \left\{ {\begin{array}{*{20}{l}} {0,}&{}\quad {\text {if }g\text { is \textit{M}}}\\ {1,}&{}\quad {\text {if }g\text { is }{} \textit{Fe}} \end{array}} \right. \end{aligned}$$
(5)

where g is the gender input.

Our next social factor is the relative distance, which is divided into two sets, near (Near) and far (Far). It is represented by a sigmoid function. Let \(r_r\) be the input relative distance, \(a_r\) the steepness of the distribution of relative distance, and \(c_r\) the inflection point. Then the MFs of the relative distance are given as follows:

$$\begin{aligned} \varGamma _2(r_r;a_r,c_r) = 1/\left( 1+\exp \left( {-\,a_r*(r_r-c_r)}\right) \right) \end{aligned}$$
(6)
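The sigmoidal MF of Eq. (6) can be sketched as below. The helper names are hypothetical; the steepness and inflection values are those reported for the simulation in Sect. 4.1, and the unit of the relative distance is left unspecified because the text does not state it. The two-branch form is only a numerical safeguard against overflow and is mathematically identical to Eq. (6).

```python
import math


def sigmoid_mf(r, a, c):
    """Eq. (6): sigmoidal membership of relative distance r."""
    z = a * (r - c)
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for large negative z.
    e = math.exp(z)
    return e / (1.0 + e)


def near(r):
    """Membership in the Near set (a = -0.35, c = 300)."""
    return sigmoid_mf(r, a=-0.35, c=300.0)


def far(r):
    """Membership in the Far set (a = 0.35, c = 300)."""
    return sigmoid_mf(r, a=0.35, c=300.0)
```

With these parameters the Near and Far memberships are complementary: they cross at the inflection point and sum to one at every distance.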

Likewise, the relationship degree describes the personal knowledge or experience with the robot which can be set by three Gaussian functions, familiar (Fam), acquaintance (Acq), and stranger (Str). Let \(r_i\) be the relationship degree that the robot perceives from people. Therefore, the relationship degree MFs are given as follows:

$$\begin{aligned} {\varGamma _3}\left( r_i\right) =\left\{ \begin{matrix} {\mathscr {N}}\left( \mu _{Fam},s_{Fam}^2 \right) &{}\quad \textit{if} \,\ \textit{Fam} \\ {\mathscr {N}}\left( \mu _{Acq},s_{Acq}^2 \right) &{}\quad \textit{if} \,\ \textit{Acq}\\ {\mathscr {N}}\left( \mu _{Str},s_{Str}^2 \right) &{}\quad \textit{if} \,\ \textit{Str} \end{matrix} \right. \end{aligned}$$
(7)
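A minimal sketch of the relationship degree MFs of Eq. (7) follows. It assumes that each Gaussian is used as an unnormalized membership function with peak value 1, which is the usual fuzzy-logic convention but is not stated explicitly in the text; the dictionary holds the initial parameters given in Sect. 4.1, and all names are hypothetical.

```python
import math


def gaussian_mf(x, mu, s):
    """Unnormalized Gaussian membership (assumption: peak value 1)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))


# Initial (mean, standard deviation) of each set, from Sect. 4.1.
INITIAL_MFS = {
    "Fam": (0.0, 0.15),
    "Acq": (0.5, 0.15),
    "Str": (1.0, 0.15),
}


def relationship_memberships(r_i, mfs=INITIAL_MFS):
    """Degree to which relationship degree r_i belongs to each set."""
    return {name: gaussian_mf(r_i, mu, s) for name, (mu, s) in mfs.items()}
```

For example, `relationship_memberships(0.2)` ranks Fam above Acq above Str, because 0.2 lies closest to the Fam mean.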

For the output of the fuzzy logic, the human interaction area is divided into several ranges according to the theory of Proxemics [2]. These interpersonal distances inspire our estimate of the private area, so the parameters that determine the social model of each person are chosen in relation to these interpersonal spaces. In [21], we separated the personal area into two groups, the far personal area (FPA) and the near personal area (NPA). Each interaction area gives a different standard deviation \(\sigma _{si}\). Therefore, four Gaussian functions are used to represent the change of the standard deviation \(\sigma _{si}\) across the interaction areas, defined as

$$\begin{aligned} {\sigma _{si}}={\mathscr {N}}\left( \mu ,s^2\right) =\left\{ \begin{array}{ll} {\mathscr {N}}\left( \mu _{PA},s_{PA}^2 \right) &{}\quad \textit{if} \,\ \textit{PA} \\ {\mathscr {N}}\left( \mu _{SA},s_{SA}^2 \right) &{}\quad \textit{if} \,\ \textit{SA}\\ {\mathscr {N}}\left( \mu _{FPA},s_{FPA}^2 \right) &{}\quad \textit{if} \,\ \textit{FPA}\\ {\mathscr {N}}\left( \mu _{NPA},s_{NPA}^2 \right) &{}\quad \textit{if} \,\ \textit{NPA} \end{array} \right. \end{aligned}$$
(8)

Thus, a detailed description of the proposed fuzzy rule is shown in Table 1. Combining the above-mentioned social factors, \(\beta _{si}\) can be defined as follows:

$$\begin{aligned} {\beta _{si}} = \frac{{{{\left( {d*\sin \left( {\theta - {\theta _i}} \right) } \right) }^2}}}{{2*{\mathscr {N}}\left( \mu ,s^2\right) ^2}} \end{aligned}$$
(9)

This means that, to prevent the robot from intruding onto the human private area, the robot is required to delineate the dynamic boundary of interaction areas based on the human social factors.
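Putting Eqs. (1), (2), (4), and (9) together, the social cost at a point can be sketched as below. This is a sketch under stated assumptions: the two exponents are summed as in a bivariate Gaussian, \(\theta \) is taken as the bearing of the query point from the person, and the lateral standard deviation `sigma_si` is passed in directly instead of being produced by the fuzzy inference system. All function and parameter names are hypothetical.

```python
import math


def discomfort(x, y, state, sigma_f0, sigma_si, gamma_f=0.0, A=1.0):
    """Discomfort f_i at (x, y) caused by one person.

    state = (x_i, y_i, vx_i, vy_i, theta_i), the perceived human state.
    """
    xi, yi, vx, vy, theta_i = state
    d = math.hypot(x - xi, y - yi)
    theta = math.atan2(y - yi, x - xi)   # assumed definition of theta
    rel = theta - theta_i
    c = math.cos(rel)
    v = math.hypot(vx, vy)               # Eq. (3)
    # Eq. (4): base or speed-dependent frontal standard deviation.
    sigma_f = sigma_f0 if c <= 0.0 else sigma_f0 / (1.0 + gamma_f * v)
    beta_fr = (d * c) ** 2 / (2.0 * sigma_f ** 2)
    # Eq. (9): lateral exponent.
    beta_si = (d * math.sin(rel)) ** 2 / (2.0 * sigma_si ** 2)
    # Eq. (2), assuming the exponents add.
    return A * math.exp(-(beta_fr + beta_si))


def social_cost(x, y, people, sigma_f0, gamma_f=0.0, A=1.0):
    """Eq. (1): sum of f_i over people, given as (state, sigma_si) pairs."""
    return sum(
        discomfort(x, y, state, sigma_f0, sigma_si, gamma_f, A)
        for state, sigma_si in people
    )
```

Evaluating `social_cost` on every cell of the geometric map yields a cost map of the kind the path planner consumes.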

3.2.2 Learning Fuzzy Social Model

In this paper, the reinforcement learning method is used to learn from human feedback how to spot and respect the private area varying from one person to another. We integrate a reinforcement learning algorithm into fuzzy MFs. The MF, as the agent, learns to improve the private area in an attempt to increase the total amount of reward through human feedback. The action is then selected by the behavior policy in order to adjust the MFs to effectively update the social force (i.e., cost) map and to make a minimum cost path in the environment. This process is repeated until a maximum reward is reached in an iterative way.

Specifically, the R-Learning algorithm is used as the learner. Unlike many reinforcement learners, R-Learning does not rely on the discounted future reward: in the average reward setting, it neither discounts rewards nor divides experience into distinct episodes with finite returns [14]. This is well suited to social cost map generation, where sustaining long-term interaction requires every interaction experience to be taken into account equally.

The state transition depends on the action taken by the agent. In this paper, the state S consists of the parameters of each MF. We learn only the mean values \(\varvec{\mu }\) of the MFs; the state therefore consists of the three means of the Familiar, Acquaintance, and Stranger functions, \(\varvec{\mu }\) = [\(\mu _{Fam}, \mu _{Acq}, \mu _{Str}\)]. The action, \(a \in A\), specifies how each MF is adjusted. To select the action a, the \(\varepsilon \)-greedy method is used: with probability \(1-\varepsilon \) it selects the action with the maximum estimated state-action value Q, and otherwise it selects an action at random. The value of state S with action a is then updated as

$$\begin{aligned} Q(S,a) \leftarrow Q(S,a) + \alpha \left[ {R} - \bar{R} + \max _{a'} Q(S',a') - Q(S,a)\right] \end{aligned}$$
(10)

where \(S'\) is the next state, \(\alpha \) is a constant learning rate, R is the reward signal to be gained from the environment, and \(\bar{R}\) is the average reward value. In the real robot experiment, the robot can receive the reward in real time in the form of interaction and unacceptable degrees, respectively, from each person’s emotion or feeling. The interaction degree (ID) presents the degree of interaction quality or the degree of easiness of interaction, while unacceptable degree (UD) implies the degree of discomfort during human–robot interaction. The ID and UD are increasing and decreasing respectively when the robot gets closer to the human. Both degrees depend on the distance between the human and the robot. Therefore, the reward can be defined as

$$\begin{aligned} R = \frac{k_1 * ID}{k_2 * UD + c} \end{aligned}$$
(11)

where \(k_1\) and \(k_2\) are the weights of each degree, and a constant c is used to prevent zero division. For simulation, ID and UD are collected from the generated path through the predefined ground truth social map. Therefore, the interaction and unacceptable degrees can be determined as

$$\begin{aligned} ID = \left\{ \begin{matrix} \sum _{p}\sum _{i=1}^{n}-f_i(p)+1, &{}\quad p \text { within distance limit}\\ 0,&{}\quad \text {otherwise} \end{matrix}\right. \end{aligned}$$
(12)

$$\begin{aligned} UD = \left\{ \begin{matrix} \sum _{p}\sum _{i=1}^{n}f_i(p),&{}\quad p \text { within distance limit}\\ 0,&{}\quad \text {otherwise} \end{matrix}\right. \end{aligned}$$
(13)

where p is the set of navigation path coordinates in the predefined social cost map. The means \(\varvec{\mu }\) of the MFs are thus learned so as to maximize the reward, which is highest when ID is at its maximum and UD is at its minimum. The complete R-Learning algorithm is given in Algorithm 1.
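The reward computation of Eqs. (11)–(13) can be sketched as follows. Several details are assumptions rather than statements from the text: each in-range pair of waypoint and person is taken to contribute \(1 - f_i(p)\) to ID and \(f_i(p)\) to UD, "within distance limit" is interpreted as a fixed radius `d_limit` around each person, and `force` is a hypothetical callable that returns \(f_i\) from the ground truth social map.

```python
import math


def interaction_and_unacceptable(path, people, force, d_limit):
    """Accumulate ID (Eq. 12) and UD (Eq. 13) along a path.

    path     list of (x, y) waypoints
    people   list of (x_i, y_i) positions
    force    callable force(i, x, y) -> f_i(x, y), assumed to lie in [0, 1]
    d_limit  assumed interaction radius around each person
    """
    interaction = 0.0
    unacceptable = 0.0
    for x, y in path:
        for i, (xi, yi) in enumerate(people):
            if math.hypot(x - xi, y - yi) <= d_limit:
                f = force(i, x, y)
                interaction += 1.0 - f
                unacceptable += f
    return interaction, unacceptable


def reward(interaction, unacceptable, k1=1.0, k2=1.0, c=1e-6):
    """Eq. (11): weighted ratio of ID to UD; c prevents division by zero."""
    return (k1 * interaction) / (k2 * unacceptable + c)
```

Under these assumptions, a path that stays within range of people but in low-discomfort cells gains a high ID and a low UD, and hence a large reward.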

figure a
Fig. 4 Algorithm flow chart

3.3 Path Planner

We use the Transition-based Rapidly-Exploring Random Tree (T-RRT), which can choose an optimal navigation path in the social cost map and collect the reward [6]. T-RRT takes advantage of two approaches. First, the exploration strength of the RRT algorithm rapidly grows random trees toward unexplored areas. Second, as in stochastic optimization methods, transition tests are applied to accept or reject potential states. This planner produces paths that efficiently follow the low-cost areas and the saddle points of the cost map. Therefore, we use T-RRT for exploration and optimal path generation, allowing the robot to evaluate the navigation cost as the social map is updated. More specifically, we employ T-RRT to navigate the robot through the space that separates the private area from the low quality interaction area.
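The transition test mentioned above can be sketched as follows. The sketch follows the commonly described form of the T-RRT test (downhill moves are always accepted, uphill moves are accepted with a temperature-dependent probability, and the temperature adapts to successes and failures); it is not taken from this paper, and the class name, parameter names, and default values are illustrative assumptions.

```python
import math
import random


class TransitionTest:
    """Accept or reject a candidate tree extension based on cost change."""

    def __init__(self, temperature=1e-3, alpha=2.0, k=1.0,
                 n_fail_max=10, c_max=float("inf"), rng=None):
        self.temperature = temperature
        self.alpha = alpha            # temperature adaptation factor
        self.k = k                    # cost normalization constant
        self.n_fail_max = n_fail_max
        self.c_max = c_max            # hard upper bound on admissible cost
        self.n_fail = 0
        self.rng = rng or random.Random()

    def accept(self, c_parent, c_child, distance):
        """Return True if the move parent -> child passes (distance > 0)."""
        if c_child > self.c_max:
            return False
        if c_child <= c_parent:
            return True               # downhill or flat: always accepted
        slope = (c_child - c_parent) / distance
        p = math.exp(-slope / (self.k * self.temperature))
        if self.rng.random() < p:
            self.temperature /= self.alpha   # success: cool down
            self.n_fail = 0
            return True
        if self.n_fail > self.n_fail_max:
            self.temperature *= self.alpha   # repeated failure: heat up
            self.n_fail = 0
        else:
            self.n_fail += 1
        return False
```

In a planner loop, `accept` would be called with the social cost of the nearest tree node, the social cost of the candidate state, and the distance between them, so that the tree grows preferentially along low-cost valleys of the social map.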

4 Results and Analysis

Fig. 5 Social map: comparison of social maps. Ground truth social map (left), initial social map (middle), and estimated social map after the learning process (right)

Fig. 6 Private boundary: comparison of private area boundaries. Ground truth boundary (blue solid line), estimated boundary at the initial setting (green dashed line), and estimated boundary after the learning process (red dashed line). (Color figure online)

Fig. 7 Fuzzy input MFs: comparison of the parameters of the social relationship model (fuzzy membership functions). Ground truth values (top), initial parameters of the membership functions (middle), and trained parameters of the membership functions (bottom)

Fig. 8 Social map error: the error between the ground truth cost and the estimated cost on the social map. The fixed-parameter model (blue) always estimates the same social map cost, so its error stays constant. For the proposed method (red), the error is high at the beginning because of the incorrect parameters; as the learning process proceeds and the parameters are updated, the error converges to zero. (Color figure online)

Fig. 9 Interaction degree: the interaction degree represents the acceptable degree that the robot receives from people along the generated paths. A high interaction degree means that the robot approaches close enough to have quality interactions with people

Fig. 10 Unacceptable degree: the unacceptable degree represents the total discomfort that the robot receives from people along the generated path. The robot should plan the path without entering the human private area

This section shows simulation and real experiment results with a humanoid robot Pepper. Our goal is to enable the robot to plan paths to visit every person in the environment without trespassing on their private area, but to keep the distance from which people are able to have high quality interactions. Figure 4 shows the algorithmic process flowchart implemented in this paper. First, the robot explores the environment to generate a geometric map. It can then create a social map by computing and assigning the social cost to the geometric map. Using the social map, the robot can generate the path to visit any person in the environment. Specifically, a genetic algorithm is used to determine the order of visiting people. After that, T-RRT path planner generates the low-cost path following the order of visiting people. To update the social map, R-learning adjusts the MF parameters by receiving the reward while visiting people. The social map is being updated until the robot gains the maximum rewards which maximize the interaction degree and minimize the unacceptable degree evaluated by people. The simulation results show that our proposed method has the capability to adjust and update the social map to gain the maximum interaction degree and minimum unacceptable degree in various conditions. We also perform real robot experiments to show that our proposed method can navigate the robot to interact with people at the proper distance. The social factors of each person, i.e., the gender and relationship degree of people in relation to the robot, are given to the robot in both simulations and real robot experiments.

4.1 Simulation Results

In the simulation, we assume that a geometric map is given or created by the robot. Our proposed model is to generate the social map by computing and updating social cost assigned to the geometric map. This social map is used to plan the robot navigation path in the environment. To validate the proposed model, we need to receive the reward from people. Therefore, the concept of social relationship model in [21] is used to model the ground truth social map of people whose relationship degree MFs are set to three Gaussian functions as follows: \(s_{Fam}\) = 0.15, \(\mu _{Fam}\) = 0.1 to Fam set, \(s_{Acq}\) = 0.15, \(\mu _{Acq}\) = 0.3 to Acq and \(s_{Str}\) = 0.15, \(\mu _{Str}\) = 0.8 to Str set. The ground truth MFs are shown in Fig. 7 (Top).

To estimate the human private area, the initial parameters of the relationship degree MFs in Eq. (7) are designed as follows: \(s_{Fam} = 0.15\), \(\mu _{Fam} = 0\) to Fam set, \(s_{Acq} = 0.15\), \(\mu _{Acq} = 0.5\) to Acq set, and \(s_{Str} = 0.15\), \(\mu _{Str} = 1\) to Str set as shown in Fig. 7 (Middle). These parameters can be adjusted by the learning process. Likewise the relative distance MFs are designed as follows: \(a_{Near} = -\,0.35\), \(c_{Near} = 300\) to Near set and \(a_{Far} = 0.35\), \(c_{Far} = 300\) to Far set.

For the output function, the social interaction area is split into four Gaussian sets. The parameters of Eq. (8) are as follows: \(\mu _{PA} = 0.035\), \(s_{PA}= 0.005\), \(\mu _{SA} = 0.045\), \(s_{SA}= 0.005\), \(\mu _{FPA} = 0.0035\), \(s_{FPA}= 0.06\), \(\mu _{NPA} = 0.0035\), \(s_{NPA}=0.065\). These parameters are decided based on the human interaction area concept [16] which determined the range of an individual’s interpersonal space with different social factors when the robot approached the person. Reflecting their results, we can determine the parameters for the output membership functions.

For the reinforcement learning process, we set the discrete states which consist of three mean values of each relationship MF, i.e., \(\mu _{Fam}, \mu _{Acq}, \mu _{Str}\). The action set for each function is simply defined as stay, move right, or move left, i.e., 0, \(+\,0.1, -\,0.1\). The MFs can be adjusted through iterative learning processes until gaining a maximum reward signal.
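The learning loop described above can be sketched as follows, using the standard average-reward form of R-Learning in which the average reward estimate is subtracted from the immediate reward. The sketch assumes that one joint action adjusts all three means at once (27 combinations of the three steps), that the means are clipped to [0, 1], and that the average reward is tracked with its own step size `beta`; none of these details is specified in the text, and the class and function names are hypothetical.

```python
import itertools
import random
from collections import defaultdict

STEPS = (0.0, 0.1, -0.1)                              # stay, right, left
ACTIONS = list(itertools.product(STEPS, repeat=3))    # assumed joint actions


def apply_action(state, action, lo=0.0, hi=1.0):
    """Shift (mu_Fam, mu_Acq, mu_Str); clipping to [lo, hi] is an assumption."""
    return tuple(round(min(hi, max(lo, m + d)), 1)
                 for m, d in zip(state, action))


class RLearner:
    def __init__(self, alpha=0.1, beta=0.01, epsilon=0.1, rng=None):
        self.q = defaultdict(float)   # Q(S, a), initialised to zero
        self.rho = 0.0                # running average reward
        self.alpha, self.beta, self.epsilon = alpha, beta, epsilon
        self.rng = rng or random.Random()

    def best_value(self, state):
        return max(self.q[(state, a)] for a in ACTIONS)

    def choose(self, state):
        """Epsilon-greedy: random with probability epsilon, else greedy."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One R-Learning step for the transition (S, a, R, S')."""
        delta = (reward - self.rho + self.best_value(next_state)
                 - self.q[(state, action)])
        self.q[(state, action)] += self.alpha * delta
        # The average reward is refreshed only after greedy-valued actions.
        if self.q[(state, action)] == self.best_value(state):
            self.rho += self.beta * (reward - self.rho
                                     + self.best_value(next_state)
                                     - self.best_value(state))
```

In each iteration the robot would call `choose`, shift the means with `apply_action`, re-plan on the resulting social map, and pass the observed reward to `update`.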

The ground truth and estimated social maps are compared in Fig. 5. The results show that the social cost map estimated with the initial setting (Middle) differs from the ground truth map (Left). With the initial setting, the robot estimated a private area that did not suit the people, causing it to generate paths that reduced their comfort. The learning process enables the robot to adjust the system parameters and re-estimate the human private area by incorporating feedback from the human. As a result, the estimated social map after the learning process (Right) becomes similar to the ground truth map and can be used to generate paths that make people feel comfortable. To make this clearer, Fig. 6 shows that the private area boundary of the initial setting (green dashed line) is smaller than the ground truth (blue line). However, as the learning process proceeds, the estimated private boundary (red dashed line) becomes similar to the ground truth. The relationship degree MFs after the learning process can be seen in Fig. 7 (Bottom). The results of the proposed model can be compared with those of the fixed-parameter model, which always uses the same parameters to estimate the social map (Fig. 7). The errors of the estimated social maps for three, four, and five people, respectively, relative to the ground truth social maps, are shown in Fig. 8. The results show that, while navigating the initial and updated social cost maps, the robot was able to learn and adjust the MFs through the reward obtained from people, and the errors finally converged to a value near zero (red). For the fixed-parameter model (blue), however, the error of the social map is constant, which means that the estimated social map does not change and remains different from the ground truth.

In this paper, we define the quality interaction area and the private area. Figure 9 shows the interaction degree with three, four, and five subjects, respectively. The results show that our proposed method increases the interaction degree of the subjects during their interaction with the robot until it suits everyone. Figure 10 shows the results for the unacceptable degree. Our method reduces the unacceptable degree of the subjects until they feel comfortable interacting with the robot. These results show that our proposed model outperforms the fixed-parameter model in estimating the private area, and that the difference becomes clearer as the number of humans in the environment increases. The results are summarized in Table 2. We also perform the simulation with four subjects facing different directions. The results are consistent with those obtained from the simulations with different numbers of subjects: our proposed method increases the quality interaction degree and reduces the unacceptable degree of the subjects, as shown in Table 3.

Table 2 Results of learning social model with the number of people
Table 3 Results of learning social model with people facing different directions
Fig. 11 Humanoid robot experiment overall process

Fig. 12 Humanoid robot experiment: (Left) the real experiment with Pepper. (Right) the blue area visualizes the estimated private area. The green line is the quality interaction area boundary \(B_{i}\). The red line is the private area boundary \(B_{p}\)

4.2 Humanoid Robot Experiment

We perform the experiment with the humanoid robot Pepper developed by SoftBank Robotics Corp. Pepper's variety of sensors and its built-in perception capabilities make it suitable for human–robot social interaction. We navigate the robot through the environment while it interacts with as many people as possible therein. We test the proposed navigation method in the open-source Robot Operating System (ROS) environment. Specifically, Pepper needs prior knowledge of the geometric map of its environment, which is stored in the map server. Using its sensors, Pepper can localize itself as required for the navigation task. Pepper can also detect and receive the human state and social factors, from which it generates the social map that assigns the social cost to the geometric map. This social map imposes constraints on the robot path, enabling the robot to avoid or interact with people. The robot also receives a reward from people, which it uses to update the parameters of the MFs and then re-compute and update the social map. The overall process is illustrated in Fig. 11.

The Pepper robot visits everyone and keeps a distance that makes them feel comfortable around it. However, as many uncertainties exist, Pepper is likely to make only a rough initial estimate of the size of the private area, which may not be suitable for a person to interact with it comfortably. For instance, Fig. 12a shows that Pepper is outside the boundary of the quality interaction area \(B_{i}\). During the interaction with Pepper, people give a reward through their verbal answer to a question from the robot. This reward allows Pepper to evaluate its social distance to them: the reward is positive when Pepper is within the area where they feel comfortable interacting with it, and negative when the distance makes interaction difficult or uncomfortable (outside the quality interaction area boundary \(B_{i}\) or inside the private area boundary \(B_{p}\)). Learning people’s social interaction model helps Pepper to re-estimate the human private area until it obtains the maximum positive reward. Finally, Pepper can position itself within the interaction area and outside the private area, as shown in Fig. 12b.

To evaluate our proposed model, a total of five subjects participated in the experiment. Each person has a different quality interaction area, represented by the green line \(B_{i}\), and a different private area, represented by the red line \(B_{p}\). The results are shown in Figs. 13, 14, 15, 16 and 17. It was confirmed that the social map may not clearly designate the private area in the initial phase of interaction, which is unsuitable for the subjects. In the cases of Figs. 13, 14, 16, and 17, the robot is initially located away from the quality interaction area, so it receives a barely noticeable response from people. This response is treated as a negative reward and is used to update the parameters associated with the MF of the interaction degree. On the other hand, the robot receives a positive reward to update its parameters for the MF of the private area. In the case of Fig. 15, the robot is initially located inside the private area. Therefore, the robot receives a negative reward to decrease the unacceptable degree and a positive reward to update the parameters associated with the interaction degree. Finally, our proposed social distance learning model enabled the robot to interact with the subjects at a proper distance between the boundaries of the interaction and private areas, as shown in Figs. 13, 14, 15, 16 and 17.
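The reward rule described above can be written as a small sketch. It assumes that both boundaries are expressed as distances from the person with \(B_{p} \le B_{i}\), and that rewards take the values +1 and -1; the function name `social_rewards` and the reward magnitudes are hypothetical, since the text only specifies the sign of each reward.

```python
def social_rewards(distance, b_p, b_i):
    """Return (interaction_reward, private_reward) for one interaction.

    distance: current robot-to-person distance.
    b_p:      private area boundary B_p (inner boundary, assumed <= b_i).
    b_i:      quality interaction area boundary B_i (outer boundary).
    The +1 / -1 magnitudes are assumptions made for this sketch.
    """
    if b_p > b_i:
        raise ValueError("expected b_p <= b_i")
    if distance > b_i:
        # Too far: weak response, so the interaction-degree MF is penalized,
        # while the private area is not violated.
        return (-1, +1)
    if distance < b_p:
        # Too close: the private area is violated, so the unacceptable
        # degree must decrease, while interaction itself is possible.
        return (+1, -1)
    # Between B_p and B_i: comfortable interaction distance.
    return (+1, +1)
```

For example, `social_rewards(2.0, 0.5, 1.2)` corresponds to the situation in Figs. 13, 14, 16, and 17 (robot too far away), while `social_rewards(0.3, 0.5, 1.2)` corresponds to Fig. 15 (robot inside the private area); the numeric distances here are illustrative only.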

Fig. 13 Experimental result with the Pepper robot: the interaction distance (blue line) converges to the area between the quality interaction area boundary \(B_i\) and the private area boundary \(B_p\) of Person 1. (Color figure online)

Fig. 14 Experimental result with the Pepper robot: the interaction distance (blue line) converges to the area between the quality interaction area boundary \(B_i\) and the private area boundary \(B_p\) of Person 2. (Color figure online)

Fig. 15 Experimental result with the Pepper robot: the interaction distance (blue line) converges to the area between the quality interaction area boundary \(B_i\) and the private area boundary \(B_p\) of Person 3. (Color figure online)

Fig. 16 Experimental result with the Pepper robot: the interaction distance (blue line) converges to the area between the quality interaction area boundary \(B_i\) and the private area boundary \(B_p\) of Person 4. (Color figure online)

Fig. 17 Experimental result with the Pepper robot: the interaction distance (blue line) converges to the area between the quality interaction area boundary \(B_i\) and the private area boundary \(B_p\) of Person 5. (Color figure online)

5 Conclusion

In this paper, a new proxemics learning strategy was proposed for social mobile robots, aimed at realizing socially competent navigation behaviors by integrating a fuzzy inference system and a reinforcement learning method. The proposed method employed an individual’s state and social factor information to determine the size of the quality interaction area of each person in a shared environment. However, initial social maps may not yield an accurate interaction distance for each person. This problem may cause the robot to intrude into the human private area or remain away from the quality interaction area. The proposed method used the concept of learning from experience to update the interaction distance with people so that it reflects their feedback. This concept improves the accuracy of social navigation map generation, allowing the robot to avoid the human private area while keeping its path within the quality interaction area. The simulation and real robot experiments showed that our proposed method provides accurate social interaction cost maps through the reinforcement learning process, which can increase the interaction degree and reduce the unacceptable degree at the same time.

Some aspects of our proposed method should be improved and expanded by future research. First, the human private area was designed using a Gaussian model, and we then tried to determine good parameters for this model by using reinforcement learning as a kernel-based approximation scheme in human–robot interaction. Although we have focused on an empirical study on developing a new learning framework for socially competent robot exploration in human space, we will further consider a spectral learning scheme instead of this kernel-based approach, because a kernel-based approximation scheme needs a large amount of training data (human–robot interaction in our problem) [30]. Second, we will investigate the effect of different parameters of the reinforcement learning algorithm, i.e., the discounting factor, undiscounting factor, or reward function, and analyze them from an analytical point of view. Third, the proposed method has only been shown empirically to be able to learn and model the human’s private area. We will evaluate the solution obtained for each state and verify whether it is optimal, which could improve the proposed private area model. Fourth, we will extend the experiments to various dynamic environments populated with moving obstacles. Moreover, different social factors such as individual cultures and personality traits can be considered to design a more sophisticated social interaction map.