1 Introduction

A service robot with social interactive features is an autonomous robot that interacts and communicates with humans by following the social behaviors and norms expected by its users (Bartneck and Forlizzi 2004). Robots of this kind are developed for application domains such as health care (Edwards et al. 2018), education (Belpaeme et al. 2018), entertainment (Pérula-Martínez et al. 2017), and caretaking (Moyle et al. 2018). Most users in these application domains prefer human-like interaction abilities in human–robot interaction since such abilities provide a seamless bond between the robots and the users (Tapus et al. 2007; Yuan and Li 2017).

Service robots used in domestic applications often need to navigate toward users when executing the services they request. The approaching behavior of a social robot is therefore a crucial factor that determines the quality of interactions between the robot and its users (Gómez et al. 2013; De Graaf and Allouch 2013). To have a friendly interaction with users during an approach, a robot should be capable of upholding proper proxemics at the termination position of the approach since the comfort of users depends on the proper use of space during interactions (Rossi et al. 2017; Ruijten and Cuijpers 2017).

1.1 Factors influencing human–robot proxemics

Psychologists, sociologists, and anthropologists have studied the proxemic behavior of humans and animals since the early 1900s (Bocardus 1925; Firestone 1977; Kaplan et al. 1983; Hall 1966). According to these studies, proxemics during an interaction with peers depends on their current behavior and the context of the interaction. However, the outcomes of these studies are limited to conceptual modeling of proxemics, and the results have been applied solely in psychological and sociological domains. On the other hand, many user studies have been conducted to identify the proxemics preferences of humans during a robot’s approach toward users (Ruijten and Cuijpers 2017; Rossi et al. 2017; Karreman et al. 2014; Ball et al. 2014). These studies reveal preferences for human–robot approaching proxemics based on a variety of factors such as the robot’s behavior, user personality, and context. However, the robots used in the above-cited human–robot interaction studies were not capable of autonomously determining the appropriate proxemics; instead, they were manually operated to collect data for the analysis.

1.2 Human–robot approaching proxemics models non-adaptable to user activity or behavior

A human-friendly approaching mechanism that is capable of navigating toward a couple of users during an ongoing conversation has been proposed in Samarakoon et al. (2018a). However, the termination distance between the robot and a user is fixed. Satake et al. (2009) and Kanda et al. (2009) deployed approaching mechanisms that are capable of predicting the walking behavior of customers in shopping malls. However, these works attempted to improve human–robot interaction by effectively attracting the attention of people, and less attention is paid to adapting approaching proxemics. Using non-adaptive termination distances of this sort for approaches is not effective for a domestic service robot that interacts with users in different contexts.

In Mead and Matarić (2016), the authors developed a proxemics behavior model for a robot that could improve the sensory experience of users. The method mainly focuses on enhancing human–robot interaction by perceiving user instructions such as gestures and voice more accurately. The same authors extended this model to adapt the approaching proxemics of a service robot to improve the communication of social signals such as gesture and speech (Mead and Matarić 2017). However, the proxemics depends solely on the characteristics of communication modules, such as microphones, vision sensors, and speakers, and cannot be adapted based on physical user behavior or feedback. Henkel et al. (2014) introduced a proxemics scaling function for robots. Gao et al. (2018) compared the performance of different deep learning models in predicting comfortable human–robot proxemics. The authors showed that the long short-term memory-based model predicts comfortable human–robot proxemics better than the other examined models. The examined models are capable of adapting the proxemics per personal factors such as gender, age, and pet ownership of users. However, the methods of Henkel et al. (2014) and Gao et al. (2018) do not consider human physical behavior to adapt proxemics. When users are engaged in different domestic activities such as reading and dancing, their postural arrangements vary continuously with time. Therefore, proxemics models that do not consider user behavior are questionable for determining appropriate proxemics with a user who would probably engage in diverse domestic activities from time to time.

1.3 Adapting human–robot approaching proxemics based on user activity or behavior

Much work has been conducted in the area of human activity and behavior recognition (Gaglio et al. 2015; Jalal et al. 2015; Wu et al. 2014; Attal et al. 2015). However, the scope of the cited studies is limited to activity and behavior recognition, and human–robot proxemics is outside their scope. In this regard, a method has been introduced for a robot approaching a person that considers the current activity of the user (Vitiello et al. 2017). The proposed method is capable of adapting the termination distance between the person of interest and the robot based on the current activity of the person. A wearable device is used to classify user activities. Here, a set of predefined posture categories (i.e., sitting, lying, walking, and standing) are considered as user activities. Moreover, the system is capable of determining a specific distance for each of the defined posture categories. Nevertheless, assigning a fixed termination distance to a posture category is not effective since there can be diverse variations within the same posture. For example, consider two different scenarios of a standing person. In the first scenario, the person is standing while swinging his/her arms fully and fast. In the second scenario, the person is standing in the same posture while the arms are barely extended and do not swing. The system proposed in Vitiello et al. (2017) would determine the same termination proxemics on the two occasions since the posture category is the same, whereas the proxemics on these two occasions should differ. Furthermore, assigning termination distances for a large number of posture categories to cover all domestic activities is a very challenging task. Therefore, a robot should be capable of determining the termination distance based on the physical parameters of a user instead of the posture category. In addition, wearable devices might not be convenient for users and detract from typical day-to-day situations.

An approaching mechanism that can overcome the above-mentioned concerns has been proposed in Samarakoon et al. (2018b). The proposed approaching method is capable of determining the termination distance of a robot based on physical behavior of a user. The robot perceives skeletal parameters of a user retrieved from the RGB-D sensor attached to it for analyzing user behavior. Notably, the human–robot proxemics preferences vary from person to person depending on personal factors such as personality (Rossi et al. 2017) and cultural backgrounds (Khaliq et al. 2018; Shen et al. 2018). This sort of proxemics variation could be realized if the system was also capable of adapting for each user. However, the method proposed in Samarakoon et al. (2018b) lacks a way of adapting proxemics toward a user.

1.4 Adapting human–robot approaching proxemics based on feedback

On the other hand, Patompak et al. (2020) proposed a reinforcement learning approach that can adapt the boundary of the privacy area of a human, an area with which the robot should not interfere during navigation. This privacy space is adapted based on the feedback received while the robot navigates in an environment. However, the learnt privacy area for a human is fixed, and the proxemics is not adapted per physical user behavior such as the activity. Furthermore, the work does not address approaching proxemics; it mainly considers navigating a robot in human-populated environments while minimizing the discomfort due to interference. Mitsunaga et al. (2008) proposed a method to learn proxemics based on subconscious body signals. The comfortable distances for Hall’s (1966) proxemics zones (i.e., intimate, personal, and social zones) are learned by the system. The users were asked to move toward the robot for different interaction zones, and the resulting distances were utilized for learning. However, the proxemics distances for an interaction category (i.e., intimate, personal, and social) are fixed after the learning. Furthermore, the system is not developed to perceive user behavior through sensors and to use the sensory information in the proxemics model. In contrast, our work proposes a method to adapt proxemics based on physical user behavior perceived through skeletal information while learning the user preference to further adapt the proxemics. Similarly, the model proposed in Bhavnani and Rolf (2020) for a robot on a tabletop can learn the comfortable proxemics based on explicit vocal feedback. This model is also not capable of adapting the proxemics based on any other factor after the end of learning. Moreover, the proxemics distance for an individual user is fixed during the operation. According to Vitiello et al. (2017) and Rossi et al. (2017), the proxemics should be adapted to different contexts such as interaction type, posture category, and activity for user comfort. Therefore, the models cited above (which can adapt proxemics solely based on feedback) are not suitable for a service robot that interacts with users in different contexts.

1.5 Limitations of the state of the art and the contributions

The state-of-the-art proxemics evaluation models are not capable of adapting proxemics based on physical user behavior while also learning the user preference through feedback. The existing methods that can adapt proxemics solely based on user behavior lack the ability to adapt the perception toward users (see Sect. 1.3). Adapting the proxemics toward a user is essential for improving satisfaction since human proxemics preferences depend on personal factors (Rossi et al. 2017; Khaliq et al. 2018; Shen et al. 2018). On the other hand, the methods that can adapt proxemics solely based on feedback are only capable of adapting toward user preference and lack the ability to adapt the proxemics based on physical user behavior (see Sect. 1.4). The ability to adapt the proxemics based on user behavior is crucial for improving user comfort (Vitiello et al. 2017; Rossi et al. 2017). Adapting proxemics based on user behavior and adapting it based on feedback are thus equally essential for a service robot to improve user satisfaction. Therefore, a method that can adapt proxemics based on physical user behavior while learning user expectations would be a well-suited solution for improving user satisfaction with human–robot proxemics.

This paper proposes a novel method that is capable of adapting the approaching proxemics based on current physical user behavior and the user preference expressed through feedback. The major improvement of the proposed method over the existing systems is that the proposed system is capable of determining approaching proxemics in accordance with current dynamic physical user behavior while learning the preference of a user through feedback. An outline of the proposed mechanism is given in Sect. 2. Section 3 explains the proposed approaching mechanism. Particulars on the experimental validation are given in Sect. 4. Section 5 provides concluding remarks.

2 Functional overview

An outline of the proposed mechanism is depicted in Fig. 1. The system considers both physical user behavior and previous experience of interactions with a user to determine the most appropriate termination position, \(P_\mathrm{T}\). Moreover, the robotic system can decide \(d_\mathrm{T}\) (i.e., the interpersonal distance at \(P_\mathrm{T}\)) and \(\phi \) (i.e., the direction of \(P_\mathrm{T}\) with respect to the user) based on the joint movements of the user and the experience acquired from user feedback given through vocal cues. A user is perceived by the robot as skeletal information retrieved from a Kinect sensor. The Skeletal Information Extraction Unit (SIEU) evaluates the 3D coordinates of the skeletal joints of the user as feature points (with the support of the Kinect SDK) based on the knowledge of the Skeletal Information database. The SIEU observes a user for a time period \(T_{\mathrm{ob}}\), which was experimentally set to 10 s by observing the behavior of the SIEU. After the completion of the analysis, key information is fed into the Fuzzy Proxemic Evaluation Model (FPEM). The FPEM is implemented with a fuzzy neural network. It considers parameters related to the movements and positioning of the body joints of a user as inputs to determine the output. The output of this module is the interpersonal distance at the termination position (i.e., \(d_\mathrm{T}\)) and the direction of the termination position with respect to the user (i.e., \(\phi \)). After determining \(d_\mathrm{T}\) and \(\phi \), the Action Manager (AM) coordinates with the Navigation Controller (NC) to move the robot toward the required termination position (i.e., \(P_\mathrm{T}\)).

Fig. 1 System overview

The NC oversees the primitive navigation functionalities of the robot such as localization and collision-free path planning within a navigation map stored in the robot’s memory. After completion of an approach toward a user, the user may give voice feedback about the interpersonal distance at the termination position (i.e., \(d_\mathrm{T}\)) determined by the robot. The feedback given as a vocal cue is parsed by the Voice Recognition and Understanding module with the aid of the Language Memory. If valid feedback is given, then the AM redirects it to the Proxemic Modifier (PM). The PM modifies the parameters of the FPEM accordingly. This facilitates the learning of the fuzzy neural network of the FPEM based on user feedback.

3 Approaching mechanism

3.1 Evaluation of physical user behavior

Physical user behavior is perceived by the robot in a manner similar to the approach used in Samarakoon et al. (2018b). The maximum displacement of the body joints from the center of a user and the maximum speed of the body joints are considered the representative parameters of physical user behavior. The motivation behind using the maximum deflection of body joints from the central plane and the maximum joint velocity as representative parameters can be explained with the aid of the example scenarios depicted in Fig. 2. When the user extends his/her body joints widely (as in Fig. 2a), the robot should maintain a larger termination distance from the user to avoid invading the user’s space than in a situation where the extension of the body joints is smaller (as in Fig. 2b). Furthermore, when the joints are moving fast, humans feel more comfortable when the surrounding space is free. In other words, the termination proxemics should also depend on the moving speed of the body joints.

Fig. 2 The motivation behind the selection of representative parameters to adapt the termination distance according to physical user behavior. a Large termination distance is required to avoid the invasion of the space, b small termination distance is sufficient to avoid the invasion of the space

Fig. 3 The joints used to perceive user behavior by the robot are annotated here with red dots. \(D_j\) is explained with the aid of elbow joint. \(D_{\mathrm{elbow}}\) is the distance between elbow joint and the vertical plane. It should be noted that both left and right joints are considered by the system

The coordinates of the body joints perceived from the skeletal information retrieved through Kinect sensor are used to perceive physical behavior of a user. The SIEU analyzes trajectories of the set of skeletal joints annotated in Fig. 3. The distance to jth joint at time t from the vertical plane that passes through the spine base joint is defined as \(D_j(t)\) where \(j\in \){head, \(\textit{spine}\_\textit{base}\), \(\textit{shoulder}\_\textit{right}\), \(\textit{shoulder}\_\textit{left}\), \(\textit{elbow}\_\textit{right}\), \(\textit{elbow}\_\textit{left}\), \(\textit{wrist}\_\textit{right}\), \(\textit{wrist}\_\textit{left}\), \(\textit{knee}\_\textit{right}\), \(\textit{knee}\_\textit{left}\), \(\textit{foot}\_\textit{right}\), \(\textit{foot}\_\textit{left}\)}.

The movement speed of the jth joint at time t, \(\dot{\theta }_{j}(t)\), is obtained as in (1), where \(\Delta t\) is the time step of the SIEU. The maximum joint speed \(\dot{\theta }\) is obtained from (2). The duration \(T_{\mathrm{ob}}\) was configured experimentally so that it is long enough for reliable and stable perception of physical user behavior. The time step \(\Delta t\) was also determined experimentally, such that it is sufficient for observing the velocity of body joints in typical activities.

$$\begin{aligned} \dot{\theta }_{j}(t)= & {} \frac{{D_{j}(t)}-{D_{j}(t-\Delta t)}}{\Delta t} \end{aligned}$$
(1)
$$\begin{aligned} \dot{\theta }= & {} \text {max}\{\dot{\theta }_{j}(t) ~|~ \forall ~j;\forall ~t=0:\Delta t:T_{\mathrm{ob}}\} \end{aligned}$$
(2)

The distance to the farthest joint from the vertical plane that passes through the \(\textit{spine}\_\textit{base}\) joint is obtained by (3).

$$\begin{aligned} D = \text {max}\{D_{j}(t) ~|~ \forall ~j;\forall ~t=0:\Delta t:T_{\mathrm{ob}}\} \end{aligned}$$
(3)

The parameters D and \(\dot{\theta }\) are fed to the FPEM as inputs that adapt the interpersonal distance at the termination position of an approach toward a user.
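As an illustration of (1) to (3), the following minimal Python sketch computes D and \(\dot{\theta }\) from sampled joint distances. The function name `behavior_parameters` and the array layout (one row per time step, one column per joint) are assumptions made for this example and are not taken from the implementation on the robot. Since \(D_j(t-\Delta t)\) is unavailable for the first sample, the sketch evaluates (1) from the second sample onward.

```python
import numpy as np


def behavior_parameters(distances, dt):
    """Return (D, theta_dot) for one observation window.

    distances: array-like of shape (num_steps, num_joints) holding D_j(t),
               the distance of joint j from the vertical plane through the
               spine base, sampled every dt seconds (hypothetical layout).
    dt:        time step of the sampling in seconds.
    """
    distances = np.asarray(distances, dtype=float)
    if distances.ndim != 2 or distances.shape[0] < 2:
        raise ValueError("need at least two time steps of joint distances")
    # Eq. (1): backward difference of D_j between consecutive samples.
    speeds = np.diff(distances, axis=0) / dt
    # Eq. (2): maximum joint speed over all joints and time steps.
    theta_dot = float(speeds.max())
    # Eq. (3): distance to the farthest joint over all joints and time steps.
    d_max = float(distances.max())
    return d_max, theta_dot


# Toy window with two joints and three samples (made-up numbers).
D, theta_dot = behavior_parameters([[10, 20], [15, 40], [12, 30]], dt=1.0)
```

With \(T_{\mathrm{ob}} = 10\) s and \(\Delta t = 1\) s, such a window would hold 11 samples per joint.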

3.2 Determination of termination position

The process of determining the proxemics based on physical user behavior and user feedback cannot be mathematically modeled. On the other hand, fuzzy logic has the ability to model a process without knowledge of the exact underlying dynamics (Ma et al. 2020; Samarakoon et al. 2021; Ibarra and Webb 2016). The required behavior of the proxemics evaluation criteria can be expressed through linguistic rules based on expert knowledge. Fuzzy logic allows the modeling of complex processes through the use of linguistic rules (Nguyen et al. 2018; Zadeh 2008). Furthermore, the sensor information perceived for evaluating physical user behavior is imprecise, and fuzzy logic has the ability to cope with imprecise sensory information during decision making (Phan et al. 2020; Ibarra and Webb 2016). In addition, fuzzy logic is often used in human-centric fields due to its high power of cointensive precisiation (Zadeh 2008). Thus, these abilities of fuzzy logic are often exploited in the literature on the development of proxemics models (Vitiello et al. 2017; Kosiński et al. 2016). Nevertheless, fuzzy logic alone does not possess a learning ability. A fuzzy neural network is a hybrid technique that combines the features of fuzzy logic discussed earlier with a learning ability. This learning ability can facilitate the online learning of the robot through user feedback without requiring an explicit data set. This sort of online learning ability is crucial for a robot that learns while performing its services. A fuzzy neural network creates an explainable model, and the parameters can easily be constrained to ensure essential requirements such as safety. For example, since the robot is operated in human-populated environments, the universe of discourse of the inputs and outputs can be manually configured to avoid safety concerns such as the robot moving dangerously close to users. Therefore, a fuzzy neural network has been proposed to evaluate the proxemics based on physical user behavior while learning the user preference through feedback.

The Fuzzy Proxemic Evaluation Model (FPEM) is used to determine the interpersonal distance (i.e., \(d_\mathrm{T}\)) and direction (i.e., \(\phi \)) at the termination position of an approach toward a user. The direction of an approach is determined in the same way as in the method proposed in Samarakoon et al. (2018b). The fuzzy neural network evaluates user behavior through joint displacements and movement speeds to determine a comfortable termination distance. At the same time, it can learn the preference of users by means of user feedback. The architecture of the proposed fuzzy neural network is depicted in Fig. 4.

Fig. 4 Architecture of the fuzzy neural network used in the FPEM. It has five layers. The distance to the farthest joint from the vertical plane that passes through the spine base (i.e., D) and maximum joint speed (i.e., \(\dot{\theta }\)) are the inputs of the network. The output is the interpersonal distance at the termination position of an approach, \(d_\mathrm{T}\)

The input layer is labeled as layer I. The inputs, the distance to the farthest joint (i.e., D) and the maximum joint speed (i.e., \(\dot{\theta }\)), are acquired by the two nodes in this layer. These inputs are used to perceive the physical behavior of a user who is to be approached by the robot, similar to the proxemics determination method proposed in Samarakoon et al. (2018b). The acquired inputs are transferred to the fuzzification layer labeled as layer II. The fuzzy sets used to fuzzify the inputs are represented by the nodes in this layer. Moreover, these nodes denote the antecedents of the fuzzy rules represented in the fuzzy rule layer (labeled as layer III). Each fuzzy rule of the inferencing system is represented by a single neuron. The algebraic product T-norm fuzzy operator is used to evaluate the output based on the incoming signals from the antecedents of the corresponding fuzzy rule. Layer IV represents the output fuzzy sets used in the consequents of the fuzzy rules. A neuron in this layer combines its inputs using the fuzzy union operator as the T-conorm. Triangular membership functions with center \(a_{i}\in [(a_{i})_{L},(a_{i})_{U}]\) and width \(b_{i}\in [(b_{i})_{L},(b_{i})_{U}]\) are represented by any node \(C_{i}^{d}\) in this layer.
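The flow through layers II to IV described above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not stated in the text: the input fuzzy sets are taken to be triangular with the width denoting the full base, the fuzzy union is taken to be the maximum operator, and the mapping from rules to output fuzzy sets (`rule_table`) is purely hypothetical.

```python
def triangular(x, center, width):
    """Triangular membership: 1 at the center, 0 beyond half the width
    (the width is assumed to be the full base of the triangle)."""
    half = width / 2.0
    return max(0.0, 1.0 - abs(x - center) / half)


def rule_strengths(mu_d, mu_speed):
    """Layer III: one neuron per rule, combining the two antecedent
    memberships with the algebraic product T-norm."""
    return [[md * ms for ms in mu_speed] for md in mu_d]


def aggregate(strengths, rule_table, num_outputs):
    """Layer IV: each output fuzzy set takes the union (here: max) of the
    strengths of all rules that have it as their consequent."""
    out = [0.0] * num_outputs
    for i, row in enumerate(strengths):
        for j, s in enumerate(row):
            k = rule_table[i][j]
            out[k] = max(out[k], s)
    return out


# Hypothetical consequent index (0..4) for each of the 3 x 3 rules.
rule_table = [[0, 1, 2],
              [1, 2, 3],
              [2, 3, 4]]
mu = aggregate(rule_strengths([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]), rule_table, 5)
```

The resulting list `mu` plays the role of the firing strengths \(\mu _{i}\) of the output fuzzy sets.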

Layer V is the defuzzification layer. The sum-product composition method (Jang et al. 1997) is used to obtain the defuzzified output. Therefore, the output, \(d_{T}\) (i.e., the interpersonal distance at the termination position) can be obtained from (4), where \(\mu _{i}\) is the firing strength of the ith output fuzzy set.

$$\begin{aligned} d_{T} = \frac{\sum _{i=1}^{5}a_{i}b_{i}\mu _{i}}{\sum _{i=1}^{5}b_{i}\mu _{i}} \end{aligned}$$
(4)
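The defuzzification in (4) can be written as a few lines of Python. The function name `defuzzify` and the numeric values in the usage lines are hypothetical and serve only as an illustration; the actual initial centers and widths are those shown in Fig. 5.

```python
def defuzzify(centers, widths, strengths):
    """Eq. (4): weighted average of the centers a_i, where each output
    fuzzy set is weighted by its width b_i times its firing strength mu_i."""
    numerator = sum(a * b * mu for a, b, mu in zip(centers, widths, strengths))
    denominator = sum(b * mu for b, mu in zip(widths, strengths))
    if denominator == 0:
        raise ValueError("no output fuzzy set has a nonzero firing strength")
    return numerator / denominator


# Made-up centers and widths for five output fuzzy sets.
d_t = defuzzify([30, 60, 90, 120, 150], [30] * 5, [0.0, 0.0, 0.5, 0.5, 0.0])
```

When two neighboring sets with equal widths fire equally, as in this example, the output lies midway between their centers.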

Initial membership functions of the fuzzy neural network have been defined similarly to the membership functions of the fuzzy inference system proposed in Samarakoon et al. (2018b) for determining the interpersonal distance at the termination position based on physical user behavior. The initial membership functions of the inputs and the output are shown in Fig. 5.

Fig. 5 a shows the input membership functions for the maximum joint displacement (i.e., D). It has three fuzzy sets labeled as \(C_{1}^{D}\), \(C_{2}^{D}\) and \(C_{3}^{D}\). b shows the input membership functions for the maximum joint speed (i.e., \(\dot{\theta }\)). It has three fuzzy sets labeled as \(C_{1}^{\dot{\theta }}\), \(C_{2}^{\dot{\theta }}\) and \(C_{3}^{\dot{\theta }}\). c shows the initial membership functions for the output, \(d_\mathrm{T}\). The ranges of initial membership functions are defined similar to that of the system proposed in Samarakoon et al. (2018b) based on heuristic knowledge

Initial connection weights of layer V are decided by the corresponding initial fuzzy sets of the output membership function. The connection weights are adapted to learn user preferences in relation to interpersonal distances at the termination positions of approaches. The backpropagation algorithm is used for adapting the connection weights in this regard. The error between the interpersonal distance determined by the robot and the preference of the user at a particular instance (defined as e) is identified based on user feedback, which is given just after the robot approaches the user. To adapt the proxemics determination of the robot, a user can give the vocal feedback “too close” if the robot moved closer to the user than he/she prefers, or “too far” if the stopping distance is farther away than preferred. Then, the modifications of the parameters of the output fuzzy sets for a particular instance n are given by (5) and (6), where \(n+1\) is the next instance. Here \(\eta \) is the learning rate, and the scalar constants \(\delta _{a}\) and \(\delta _{b}\) are used to maintain the variations of the parameters within the desirable ranges during the learning. A learning rate that is too high makes the learning model unstable since the adaptation is too aggressive. If the learning rate is too small, the system takes a long time to adapt (i.e., it requires a higher number of feedback instances). The learning rate (i.e., \(\eta \)) was chosen by trial and error by observing the variation of the learning parameters of the fuzzy neural network (i.e., \(a_i\) and \(b_i\)). For example, if the learning parameters of the fuzzy neural network reached their lower or upper bounds after merely a few feedback instances, the learning rate was reduced. Reasonable values for \(\delta _a\) and \(\delta _b\) were determined in the same way as \(\eta \). Here, the center of a fuzzy set (i.e., \(a_i\)) can make a higher impact in an adaptation step than the width of the corresponding fuzzy set (\(b_i\)). Therefore, a slightly higher value is preferred for \(\delta _a\) than for \(\delta _b\). These parameters were selected before the experimental evaluation (during the development and pre-testing).

$$\begin{aligned} a_{i}(n+1)= & {} {\left\{ \begin{array}{ll} a_{i}(n)+\eta \delta _{a}e\mu _{i} &{} \text { if } a_{i}(n+1) \in [(a_{i})_{L},(a_{i})_{U}]\\ a_{i}(n)&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(5)
$$\begin{aligned} b_{i}(n+1)= & {} {\left\{ \begin{array}{ll} b_{i}(n)+\eta \delta _{b}e\mu _{i} &{} \text { if } b_{i}(n+1)\in [(b_{i})_{L},(b_{i})_{U}]\\ b_{i}(n)&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(6)

The lower and upper bounds of the centers (\((a_{i})_{L},(a_{i})_{U}\)) and widths (\((b_{i})_{L},(b_{i})_{U}\)) of the output membership functions are defined as in (7), (8), (9) and (10) respectively to preserve the meaning of the inference rules of the system proposed in Samarakoon et al. (2018b).

$$\begin{aligned} (a_{i})_{L}= & {} {\left\{ \begin{array}{ll} 0 &{} \text { if } i = 1 \\ a_{i-1}(0) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(7)
$$\begin{aligned} (a_{i})_{U}= & {} {\left\{ \begin{array}{ll} a_{i+1}(0) &{} \text { if } i = 1,2,3,4 \\ 180 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(8)
$$\begin{aligned} (b_{i})_{L}= & {} \frac{b_{i}(0)}{2} \end{aligned}$$
(9)
$$\begin{aligned} (b_{i})_{U}= & {} \frac{3b_{i}(0)}{2} \end{aligned}$$
(10)
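The bounded update in (5) to (10) can be sketched in Python as below. The helper names `make_bounds` and `bounded_update` and the numeric values in the usage lines are assumptions for illustration only. The condition in (5) and (6) is interpreted here as a test on the candidate value: the update is accepted only if the candidate stays within its bounds, and the parameter is left unchanged otherwise.

```python
def make_bounds(a0, b0, a_min=0.0, a_max=180.0):
    """Eqs. (7)-(10): bounds derived from the initial centers a_i(0) and
    widths b_i(0) of the output fuzzy sets."""
    n = len(a0)
    a_lo = [a_min if i == 0 else a0[i - 1] for i in range(n)]      # Eq. (7)
    a_hi = [a0[i + 1] if i < n - 1 else a_max for i in range(n)]   # Eq. (8)
    b_lo = [b / 2.0 for b in b0]                                   # Eq. (9)
    b_hi = [3.0 * b / 2.0 for b in b0]                             # Eq. (10)
    return a_lo, a_hi, b_lo, b_hi


def bounded_update(params, lows, highs, strengths, e, eta, delta):
    """Eqs. (5)/(6): move each parameter by eta * delta * e * mu_i and
    keep the old value if the candidate leaves [low, high]."""
    updated = []
    for p, lo, hi, mu in zip(params, lows, highs, strengths):
        candidate = p + eta * delta * e * mu
        updated.append(candidate if lo <= candidate <= hi else p)
    return updated


# Made-up initial output fuzzy sets and firing strengths.
a0, b0 = [30.0, 60.0, 90.0, 120.0, 150.0], [40.0] * 5
a_lo, a_hi, b_lo, b_hi = make_bounds(a0, b0)
mu = [0.0, 0.0, 1.0, 0.5, 0.0]
a1 = bounded_update(a0, a_lo, a_hi, mu, e=10.0, eta=0.1, delta=5.0)
b1 = bounded_update(b0, b_lo, b_hi, mu, e=10.0, eta=0.1, delta=3.0)
```

Because each center is bounded by the initial centers of its neighbors, the ordering of the output fuzzy sets relative to the initial configuration is preserved during learning.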

The error between the interpersonal distance determined by the robot and that preferred by a user (i.e., e) is evaluated based on user feedback given just after the robot approaches the user. However, user feedback is given as a vocal cue, and it is often given qualitatively rather than numerically. The error given as a linguistic term is converted to a numerical value by the Proxemic Modifier (PM). It is assumed that the numerical meaning of a feedback linguistic term depends on the current observation of the user. Therefore, the interpersonal distance between the robot and the user when feedback is given (i.e., \(d_\mathrm{T}(n)\)) is used to determine the error indicated by user feedback. Two feedback vocal cues are considered for indicating the intention of the user to correct the robot. “Too close” is used to indicate a positive error (the user expects a larger interpersonal distance) and “too far” is used to indicate a negative error (the user expects a smaller interpersonal distance) for \(d_\mathrm{T}(n)\) determined by the robot. The assumed distance errors for the valid user feedback are given in (11), where \(\delta _e\) is an experimentally decided scalar constant such that \(\delta _e\in [0,1]\). If no feedback is explicitly given, it is considered as “ok”.

$$\begin{aligned} e = {\left\{ \begin{array}{ll} +\delta _ed_T(n) &{} \text {if user feedback = ``too close''} \\ -\delta _ed_T(n) &{} \text {if user feedback = ``too far''} \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(11)
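The mapping in (11) from a vocal cue to a signed distance error can be sketched as follows. The function name `feedback_error` and the string normalization are assumptions for this example; the default \(\delta _e = 0.33\) is the value reported for the experiments.

```python
def feedback_error(feedback, d_t, delta_e=0.33):
    """Eq. (11): convert linguistic feedback into a distance error.

    feedback: recognized phrase, or None if the user said nothing.
    d_t:      interpersonal distance d_T(n) at which feedback was given.
    delta_e:  scalar constant in [0, 1].
    """
    phrase = feedback.strip().lower() if feedback else ""
    if phrase == "too close":
        return delta_e * d_t      # user expects a larger distance
    if phrase == "too far":
        return -delta_e * d_t     # user expects a smaller distance
    return 0.0                    # "ok" or no explicit feedback


e = feedback_error("too close", 100.0)
```

Scaling the error by \(d_\mathrm{T}(n)\) reflects the stated assumption that the numerical meaning of a feedback term depends on the current distance.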

4 Experimental validation

4.1 Experiment design

Experiments have been conducted using MIRob (Muthugala and Jayasekara 2016) by way of a user study to evaluate the value added by adapting a robot’s approaching proxemics toward users. Moreover, the conducted experiments compare the performance of the learning mechanism against a similar system with no learning ability [i.e., the system proposed in Samarakoon et al. (2018b)]. The parameters related to the perceiving of user behavior and learning have been heuristically chosen as \(T_{\mathrm{ob}} = 10~\text {s}\), \(\Delta t = 1~\text {s}\), \(\delta _e = 0.33\), \(\eta = 0.1\), \(\delta _{a} = 5\), and \(\delta _{b} = 3\).

User satisfaction, a parameter that has been used in Muthugala and Jayasekara (2017) to evaluate the performance of robots based on a series of user feedback for the actions of the robot, is used to evaluate the performance of the proposed system. User satisfaction (\(\mathrm{US}_{N_S}\)) can be computed as in (12), where \(N_{\mathrm{OK}}\) is the number of feedback received as “OK” within \(N_S\) number of previous steps considered.

$$\begin{aligned} \mathrm{US}_{N_S} = \frac{N_{\mathrm{OK}}}{N_{S}} \end{aligned}$$
(12)
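The metric in (12) amounts to the fraction of “ok” responses within a sliding window. A minimal Python sketch is given below; the function name `user_satisfaction`, the list-based feedback history, and the requirement that at least \(N_S\) steps are available are assumptions of this example.

```python
def user_satisfaction(feedback_history, n_s):
    """Eq. (12): share of "ok" feedback within the last n_s steps."""
    if n_s <= 0:
        raise ValueError("n_s must be positive")
    if len(feedback_history) < n_s:
        raise ValueError("fewer than n_s feedback steps are available")
    window = feedback_history[-n_s:]
    n_ok = sum(1 for f in window if f.strip().lower() == "ok")
    return n_ok / n_s


us = user_satisfaction(["too close", "ok", "ok", "too far", "ok"], n_s=4)
```

Here the window covers the last four responses, three of which are “ok”.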

Fig. 6 MIRob during experiments

MIRob during a few experimental scenarios is shown in Fig. 6. The suggestions given in Bethel and Murphy (2010) for conducting user trials to assess human–robot interaction have been followed to curtail the subjectivity of the outcomes. Specifically, the following suggestions were considered: counterbalancing to avoid ordering effects, determination of a good sample size through power analysis, selection of subjects for the user study and the way of reporting their details, and identification of the statistical tests required for generalization and for avoiding subjectivity of the outcomes. Eighteen subjects, whose ages were between 24 and 52 (M = 30.4, SD = 7.3), participated in the experiment. The participants were either students or staff of the university. All the participants were healthy and had South Asian cultural backgrounds. Four frequently performed domestic activity types were selected for the experiment: reading a book, making a phone call, exercising, and working on a laptop. The selected four activities have heterogeneous characteristics such as different postures and speeds, and they represent frequent contexts in which a service robot could be utilized for service tasks such as delivering something. In addition, the selected activities can be performed in many ways. For example, exercise can involve the arms or other limbs and can be done while standing or sitting. Thus, the chosen four activities cover a wide range of the characteristics found in typical indoor activities. For each activity type, the subjects were requested to perform the activity in their own way. An environment representing a domestic setting was used for the experiment. The following specific instructions were given for each activity type.

  • Working on a laptop: A table and a chair placed in the environment were used here. The laptop was placed on the table, and the participant was asked to use it. As the work to be done, the participant was asked to type a few sentences in word-processing software about a movie or a novel that the participant had recently watched or read. Assigning this sort of work keeps the subject concentrated on the task and helps maintain natural behavior.

  • Reading a book: A participant was given a short storybook and asked to read a short story while sitting on a chair placed in the environment. The stories were brief enough for the participant to read a complete story within a short time, which helps maintain the participant’s concentration on the activity.

  • Exercise: A participant was asked to go to a free area within the environment and perform a short exercise. The participant was given the freedom to perform any exercise in his/her own way. However, the participant was asked to stay within a marked area while performing the exercise.

  • Making a phone call: The participant was given a mobile phone and asked to call a given number. From the other end of the call, the participant was asked about hobbies to keep the participant concentrated on the phone call.

The participants were informed that, during the task, the robot would approach them to request a service task. The participants were requested to give feedback on the termination distance determined by the robot for the approach. Furthermore, the participants were instructed to give their feedback verbally as “too far”, “ok”, or “too close”.

The subjects were divided equally into two groups, and the experiment was conducted in four phases. In the first phase, the system proposed in this paper (i.e., the system with learning ability) was implemented on the robot. Each subject of the first group was invited individually for the first phase and asked to perform one randomly assigned activity per instance. A random number generator was used to select the activity type. The subject was given the freedom to do the activity in his/her own way. Then, the robot was triggered to approach the subject based on the termination distance (i.e., \(d_\mathrm{T}\)) determined by the FPEM. The subject was asked to give feedback after each approaching instance. Then the activity was switched to a new randomly selected activity type. Likewise, the process was repeated for 20 instances per subject, with the activity type switched randomly each time. In the second phase, the subjects of the second group were invited to the experiment. The ability of the robot to learn the user preference (i.e., the ability to modify the internal parameters of the FPEM based on user feedback) was disabled in the second phase. Here, the robot was fixed to the initial parameters of the fuzzy membership functions (i.e., \(a_i\) and \(b_i\) at \(n = 0\)) since the robot could not adjust the parameters based on feedback. This fixed fuzzy network is similar to the fuzzy logic model proposed in Samarakoon et al. (2018b), which can adapt proxemics solely with physical user behavior. Thus, the abilities of the robot with learning ability and the robot with no learning ability were the same except for the availability of the feedback-based user preference learning ability. In the third phase, the subjects of the first group were invited for the experiment. The robot with no learning ability was used in the third phase. 
In the fourth phase, the subjects of the second group were invited for the experiment while the learning ability was enabled in the robot. The order of conducting the experiment, subdivided into four phases with the two groups of subjects, is depicted in the schema given in Fig. 7. This strategy was chosen to minimize the subjectivity arising from familiarity with the robot, since human–robot proxemic preferences may depend on that familiarity. Moreover, this counterbalancing controls order effects in the within-subject experiment conducted for the validation. Furthermore, it was ensured that the subjects were not aware of which system they were interacting with (with learning ability or with no learning ability) during the experiment.
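The four-phase protocol described above can be summarized in a short Python sketch. The names `split_into_groups` and `activity_sequence` are hypothetical, random assignment of subjects to groups is an assumption (the text only states that the groups were equal in size), and reading “a new randomly selected activity type” as “different from the previous one” is likewise an assumption.

```python
import random

ACTIVITIES = ["reading a book", "making a phone call",
              "exercising", "working on a laptop"]

# Phase order described in the text: (group number, learning enabled?)
PHASES = [(1, True), (2, False), (1, False), (2, True)]


def split_into_groups(subject_ids, rng):
    """Divide subjects equally into two groups (random assignment is an assumption)."""
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {1: ids[:half], 2: ids[half:]}


def activity_sequence(rng, instances=20):
    """Draw one activity per instance, never repeating the previous one (assumption)."""
    sequence = []
    for _ in range(instances):
        options = [a for a in ACTIVITIES if not sequence or a != sequence[-1]]
        sequence.append(rng.choice(options))
    return sequence


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
groups = split_into_groups(range(1, 19), rng)
for group, learning_enabled in PHASES:
    for subject in groups[group]:
        # Each entry of `plan` would be followed by one approach and one feedback.
        plan = activity_sequence(rng)
```

Each group meets both systems, but in opposite orders, which is what lets the design balance familiarity with the robot across the two conditions.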

Fig. 7 The order of conducting the experiment, subdivided into four phases

In addition to the evaluation through user satisfaction during the experimental runs, a questionnaire was given to each subject after the completion of each phase of the experiment. The questionnaire was composed of four question statements that the participants rated on 5-point Likert scales. The question statements can be found in Table 2. The questionnaire was mainly directed toward assessing the overall performance and behavior of the proposed method in determining the approaching proxemics of a robot. This secondary evaluation criterion was used to increase the reliability of the validation.

Table 1 Sample results of the system that adapts toward users
Table 2 Questionnaire and the ratings received

4.2 Results

The parameters related to the FPEM for the approaching instances of a randomly selected subject interacting with the system proposed in this paper are given in Table 1. Here, the initial parameters of the system are given in the row \(n = 0\). In the first instance, the parameters related to the user’s physical behavior, D and \(\dot{\theta }\), were 38 cm and 21 cm s\(^{\text {-1}}\), respectively. Based on the parameters of the FPEM and the input parameters related to physical user behavior, the output of the FPEM (i.e., \(d_\mathrm{T}\)) was 55 cm. Then, the robot approached the user, keeping a distance of 55 cm. However, the termination distance decided by the robot was greater than the distance preferred by the user, and the user gave the feedback “too far”, suggesting a correction. When the robot receives such corrective feedback, the AM redirects it to the Proxemic Modifier (PM). Then, the PM determined the quantitative error value for this instance (i.e., e) as −18 cm. Subsequently, the parameters of the FPEM (\(a_i\) and \(b_i\)) were modified as shown in the rest of the columns.

In the next instance (i.e., \(n = 2\)), the user was asked to do a different activity, and the robot was commanded to approach the user. The user behavior perceived by the robot as D and \(\dot{\theta }\) was 57 cm and 65 cm s\(^{\text {-1}}\), respectively. The \(d_\mathrm{T}\) determined by the FPEM was 95 cm (based on the current \(a_i\) and \(b_i\) and the inputs D and \(\dot{\theta }\)). Therefore, the robot approached the user and maintained a termination distance of 95 cm. The user was satisfied with the termination distance decided by the robot. Hence, no corrective feedback was given, and the internal parameters of the FPEM (\(a_i\) and \(b_i\)) were not adapted in the 2\(^{\text {nd}}\) instance (i.e., \(n = 2\)). Similarly, 20 approaching instances were conducted for this subject. The User Satisfaction (US) was computed by considering the previous 10 instances (i.e., \(N_S = 10\)). Therefore, \(\mathrm{US}_{10}\) is computed from the 10th instance onward. The variation of the internal parameters of the FPEM verifies that the system proposed in this paper can adapt the proxemics based on user preferences conveyed through feedback. In contrast, the system with no learning ability is not capable of modifying its internal parameters based on user feedback. Moreover, proxemics determined by the existing methods cannot be altered by corrective feedback from a user.

Fig. 8 Variation of mean user satisfaction with the number of approaching instances for both systems. Error bars represent the standard error of the mean

Similarly, the experiment was conducted for all the subjects with both systems (i.e., with learning ability and with no learning ability). The mean values of \(\mathrm{US}_{10}\) over all the subjects were calculated for each instance (\(n = 10\) upward) for both systems. The variation of the mean values of \(\mathrm{US}_{10}\) with the number of instances for both systems is plotted in Fig. 8 along with error bars. In the 10th instance, \(\mathrm{US}_{10}\) of the system proposed in this work is 0.394 and \(\mathrm{US}_{10}\) of the system with disabled learning functionality is 0.400. At this stage, the mean user satisfaction is low for both systems. Furthermore, the difference is not statistically significant (\(t_{(17)} = -0.160\), \(p = 0.877\)). This remained the case until the 14th instance; when \(n = 11\): \(t_{(17)} = 0.49\), \(p = 0.631\); when \(n = 12\): \(t_{(17)} = 1.49\), \(p = 0.146\); when \(n = 13\): \(t_{(17)} = 1.53\), \(p = 0.136\). This verifies that there was no initial bias in user satisfaction for either system. It can be observed that the mean of \(\mathrm{US}_{10}\) gradually increased with the number of instances for the system proposed in this paper. In the 20th instance, \(\mathrm{US}_{10}\) was 0.71. In contrast, the mean of \(\mathrm{US}_{10}\) of the system with no learning ability did not show a comparable increase. The difference between the mean user satisfaction of the two systems is statistically significant from the 14th instance onward; when \(n = 14\): \(t_{(17)} = 2.55\), \(p = 0.016\); when \(n = 15\): \(t_{(17)} = 3.86\), \(p = 0.001\); when \(n = 16\): \(t_{(17)} = 3.27\), \(p = 0.003\); when \(n = 17\): \(t_{(17)} = 5.03\), \(p < 0.001\); when \(n = 18\): \(t_{(17)} = 5.57\), \(p < 0.001\); when \(n = 19\): \(t_{(17)} = 5.07\), \(p < 0.001\); when \(n = 20\): \(t_{(17)} = 11.85\), \(p < 0.001\). 
Furthermore, the difference is large according to the effect size calculated with Cohen’s d (Cohen’s \(d = 0.85\), 1.29, 1.09, 1.68, 1.86, 1.69, and 3.95 for \(n = 14\)–20, respectively). Therefore, these statistical outcomes confirm that the system proposed in this paper notably improves user satisfaction with the approaching proxemics determined by a robot compared to the system with no learning ability. Moreover, the capability to adapt the termination distance of approaches based on previous experience with a user increases a robot’s ability to determine proper proxemics, which improves user satisfaction.
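For readers who wish to reproduce this kind of per-instance comparison on their own data, a minimal Python sketch is given below. Seventeen degrees of freedom with 18 subjects is consistent with a paired test, so the sketch uses a paired t statistic; the exact variant of Cohen’s d used for the reported values is not stated, and the pooled-standard-deviation form here is an assumption. The function names are hypothetical, and p-values are not computed.

```python
import math
import statistics


def paired_t(a, b):
    """Paired-samples t statistic for two equal-length lists (df = len(a) - 1).

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two equal-sized samples."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd
```

Here `a` and `b` would hold the per-subject \(\mathrm{US}_{10}\) values of the two systems at a given instance n, listed in the same subject order.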

The ratings received for the questionnaire for both systems are given in Table 2 along with the question statements. The internal consistency of the questionnaire was acceptable for the evaluation since Cronbach’s alpha was greater than 0.7 (0.847 for the system with the proposed learning ability and 0.704 for the system with no learning ability). The responses to the questionnaire were taken on 5-point Likert scales. Non-parametric statistical tests are preferred for analyzing Likert data (Kaptein et al. 2010). Therefore, the Wilcoxon test, a non-parametric statistical test, was used. In non-parametric statistical analysis, group medians are preferred over group means for comparison. Therefore, group medians are considered here. Even though the question statements are directed at measuring a single objective (with reliable internal consistency), each question statement provides different insights about the system that are helpful in evaluating its behavior and performance. Therefore, each question is discussed separately.
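The internal-consistency check mentioned above follows the standard Cronbach’s alpha formula, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\), where k is the number of question statements, \(s_i^2\) is the variance of the ratings for question i, and \(s_T^2\) is the variance of the per-respondent total scores. A minimal Python sketch follows; the function name and the list-of-lists input layout are assumptions made for illustration.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k question statements.

    `items` is a list of k lists; items[i][j] is respondent j's rating
    for question i (hypothetical layout). Requires k >= 2 and a non-zero
    variance of the total scores.
    """
    k = len(items)
    item_variance_sum = sum(statistics.variance(q) for q in items)
    totals = [sum(ratings) for ratings in zip(*items)]  # one total per respondent
    return (k / (k - 1)) * (1 - item_variance_sum / statistics.variance(totals))
```

For the questionnaire described here, `items` would contain four lists (Q1 to Q4) of 18 ratings each, computed separately for each system.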

The first question statement of the questionnaire (i.e., Q1) examines whether the proxemics determined by the robot varies with the activity type being performed by a user. The median ratings for the system with learning ability (5.0) and the system with no learning ability (4.5) are not significantly different (Wilcoxon test, W = 351, p = 0.580). According to the received ratings, the subjects agreed that the robot’s proxemics varies with the activity type for both systems. Therefore, this validates that both systems are capable of altering a robot’s approaching proxemics based on the activity type performed by a user.

The second question statement (i.e., Q2) enquires about the variation of the proxemics determined by the robot within an activity type, based on the way the activity is performed by a user. For example, it queries whether the systems can determine different proxemics for an exercise done with different amounts of stretch, moving speeds, or body joints. According to the received ratings, the subjects agreed that both systems can adapt proxemics in accordance with the behavior within an activity, regardless of the activity type (median rating for the system with learning ability = 5.0 and for the system with no learning ability = 4.0). Furthermore, the ratings do not favor either system in this particular behavior (the difference in median ratings is not statistically significant according to the Wilcoxon statistic, W = 379, p = 0.146). This validates that users can observe that both systems are capable of altering the proxemics based on the behavior within an activity instead of depending merely on the activity type.

The third question statement (i.e., Q3) enquires whether the subjects could observe any effect of the feedback they gave during prior interactions on the proxemics determined by the robot. Moreover, this examines whether the interpersonal distance decided by the systems is noticeably adapted from user feedback. According to the results, the subjects agreed that the system proposed in this paper (i.e., the system with learning ability based on user feedback) can noticeably adapt proxemics based on user feedback given in prior interactions. In contrast, the subjects disagreed with the same claim for the system with no learning ability. The median ratings for the system with learning ability (5.0) and the system with no learning ability (2.0) are significantly different (W = 495, p < 0.01). Therefore, this validates that the proposed system is capable of adapting the proxemics perception of a service robot through user feedback, in contrast to a system with no such learning ability. Moreover, user feedback has a noticeable effect on adapting the interpersonal distance determined by the system proposed in this paper.

The fourth question statement (i.e., Q4) examines whether the subjects could sense an improvement over time in the proxemics determined by the robot. Moreover, this evaluates whether the proxemics determination of the robot noticeably improved with the robot’s experience. According to the results, the subjects noticed an improvement over time in the robot’s proxemics decisions for the system with the ability to adapt toward users (i.e., the system proposed in this paper). In contrast, the subjects could not notice an improvement in proxemics determination over time for the system with no adaptation ability toward users (i.e., with no learning ability). The median ratings for the system with learning ability (5.0) and the system with no learning ability (2.0) are significantly different (W = 495, p < 0.01). This implies that the system proposed in this paper is capable of improving a service robot’s determination of approaching proxemics with experience, in contrast to a system with no such learning ability.
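The rank-based comparison behind the reported W values can be illustrated with a small Python sketch. The text does not state which Wilcoxon variant was used; the sketch below computes a rank-sum statistic with midranks for ties, which is an assumption, and the function names are hypothetical. For two groups of 18 ratings, the largest value this statistic can take is 495, which coincides with the W reported for Q3 and Q4; this is consistent with, but does not establish, a rank-sum formulation.

```python
def midranks(values):
    """Return 1-based ranks of `values`, averaging the ranks of tied entries."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum(group_a, group_b):
    """Sum of the pooled midranks that belong to `group_a`."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)])
```

Under the null hypothesis of no difference, the expected rank sum for one of two groups of 18 is \(18 \times 37 / 2 = 333\), so values near 333 indicate similar ratings and values near 495 indicate almost complete separation.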

The following salient features of the proposed system can be discussed based on the overall behavior and the performances identified during the experimental evaluation.

In the work proposed in Vitiello et al. (2017), the termination distance of a robot is adjusted for a posture category defined for different activity types such as standing, sitting, and lying. In that work, a Neuro-Fuzzy-Bayesian network is trained to assign different proxemics to each of the defined activity types, classified by analyzing a user. Therefore, the work proposed in Vitiello et al. (2017) is capable of adapting the proxemics based on the activity type. Nevertheless, the cited work is not capable of adapting the proxemics within the same activity in accordance with dynamic aspects of the particular activity, such as speeds and the amount of joint movement. The method proposed in this paper is capable of adapting the approaching proxemics of a service robot based on the physical behavior of a user, such as movement speeds and joint positions, instead of merely adapting proxemics for a set of defined activity types as done in Vitiello et al. (2017). Furthermore, the system proposed in this paper can handle many typical activities without being limited to a set of defined activity types, since it considers the current physical behavior of a user without classifying it into different activities. Therefore, the method proposed in this paper would complement a system that can adapt proxemics based only on activity type rather than on the behavior within an activity.

The method proposed in Samarakoon et al. (2018b) is capable of adapting approaching proxemics in accordance with the physical behavior of a user, perceived by analyzing joint locations and movement speeds. Therefore, the cited work can adaptively determine approaching proxemics for many typical activity types without being limited to a set of activities. According to Shen et al. (2018) and Rossi et al. (2017), the proxemics preferences of humans vary from person to person depending on several factors. However, the work proposed in Samarakoon et al. (2018b) is not capable of adapting approaching proxemics for different users based on their preferences. Therefore, the system proposed in Samarakoon et al. (2018b) lacks the ability to adapt toward a user and cannot match approaching proxemics to the preferences of the user even if the system receives feedback from the user. This is the major drawback that degrades user satisfaction with a system that cannot adapt proxemics toward a user. The system proposed in this paper is capable of adapting proxemics based on physical user behavior while learning user preference. The learning of user preference through the experience of prior interactions facilitates the adaptation of proxemics toward a user. Therefore, the method proposed in this paper can improve user satisfaction with the approaching proxemics determined by a service robot compared to a system that is not capable of learning user preference, as the experimental results confirm.

Furthermore, according to Syrdal et al. (2008) and Walters (2008), human–robot proxemics preferences depend on physical attributes of a service robot such as type and height. This preference can vary from person to person. This problem can be reduced to the problem of users having different proxemics preferences, discussed earlier. Therefore, if a robot can adapt its approaching proxemics to meet user preference, such a system can cope with the problem of human–robot proxemics preferences being reliant on the attributes of a service robot. Moreover, a service robot would adapt its approaching proxemics to improve user satisfaction by learning the user’s proxemics preference for its attributes. This diminishes the need to manually tune proxemics for different robots that have different physical attributes, since learning user preference facilitates self-adaptation. Therefore, the system proposed in this paper is capable of coping, to a certain level, with the variation of human–robot proxemics due to the physical attributes of a service robot. However, it should be noted that an experimental validation in this regard has not been conducted using different service robots within the scope of the work presented in this paper; this claim is deduced from the facts above.

4.3 Discussion

The user study for evaluating the system was carried out with 18 participants, and the sample size was decided based on the prior literature. Similar studies have been carried out in prior work with similar sample sizes (e.g., Mitsunaga et al. 2008: 15 participants, Walters et al. 2011: 7 participants, Mead et al. 2013: 18 participants, Bhavnani and Rolf 2020: 15 participants, and Patompak et al. 2020: 5 participants). In addition, we conducted a power analysis to check the validity of the sample size in our specific case. Power values greater than 0.8 were observed (according to Cohen’s four-to-one weighting of beta-to-alpha risk criterion, a power value greater than or equal to 0.8 can be considered good (Ellis 2010)). Therefore, the number of participants is sufficient for demonstrating generalizability and reliability.

The questionnaire could have caused an order effect since it was given between the phases of the experiment, and the participants could have focused on the aspects in the questionnaire during their second phase. Therefore, the variation of user satisfaction of group 1 and group 2 with the system with learning ability and with the system with no learning ability was compared individually to check whether there was an ordering effect. However, a significant difference between the two groups could not be found, suggesting no significant ordering effect. Furthermore, the participants were divided into two groups to counterbalance the ordering effect. Therefore, the questionnaire has not biased the overall subjective evaluation of the participants. The experimental design, which provided a sufficient time gap between the interactions to let a participant’s focus on the experiment and questionnaire fade, might have helped overcome this possible bias.

The proposed system adapts its proxemics based on physical user behavior and user feedback. The approaching proxemics determined by the system is not fixed for a particular activity category since the system considers dynamic parameters of body joints, such as locations and speeds, instead of the activity category. Therefore, the proxemics for a particular activity like reading a book depends on the behavior of the user’s body joints, such as movement and extension. For example, at \(n = 1\), the system determined \(d_\mathrm{T}\) as 55 cm, which was rated as “too far” by the user. At \(n = 11\), the user rated a \(d_\mathrm{T}\) of 56 cm as “too close”. However, in the latter instance, the body joints of the user were extended considerably more than in the earlier instance (\(D = 54\) cm at \(n = 11\) and \(D = 38\) cm at \(n = 1\)). This greater extension of the body joints from the center of the user might have made the user feel that the robot came too close even though the termination distance was greater than that at \(n = 1\). For example, at \(n = 11\), the user may have extended his legs from the chair while sitting in a leaning posture, so that the robot came too close to the extended leg. Please note that \(d_\mathrm{T}\) is measured from the center of a user and not from extended body parts. Therefore, this sort of feedback variation can be expected.
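A back-of-the-envelope reading of the two instances makes this concrete. Assuming, purely for illustration, that D approximates how far the body extends from the user’s center toward the robot (the actual direction of the extension is not reported), the free gap between the robot and the nearest body part would be roughly the termination distance minus D:

$$\begin{aligned} \text{gap}_{n=1} &\approx 55 - 38 = 17\ \text{cm}, \\ \text{gap}_{n=11} &\approx 56 - 54 = 2\ \text{cm}. \end{aligned}$$

Under this assumption, the robot would stop much nearer to the extended body part at \(n = 11\) although the termination distance measured from the center is slightly larger, which is in line with the “too close” rating.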

Furthermore, feedback given for a particular activity is not localized to that specific activity, and all the triggered fuzzy sets of the fuzzy neural network are modified. These modifications can result in adapting the proxemics related to any other activity where the inputs (i.e., D and \(\dot{\theta }\)) are within more or less the same ranges. In addition, the same activity could be performed in widely varied ways. Therefore, if the system had been developed to stop the adaptation after receiving an ‘ok’ once, the system would not have adapted appropriately. Moreover, feedback reinforces the adaptation, which relies on multiple feedback responses on heterogeneous physical user behavior. The proxemics adaptation might be jeopardized by inconsistent feedback, in which case the model might not converge or might require a substantial number of interactions to converge. As future work, methods to identify inconsistent user feedback are expected to be explored to resolve the issues arising from it. For example, developing a method that requests confirmation from a user in an instance of unreliable feedback would be interesting future work. In addition, the current system considers explicit vocal cues as feedback. Nevertheless, the requirement of explicit voice feedback causes overhead for users. In the future, the system is expected to be extended to use subconscious body signals of users, such as facial expression changes, as feedback instead of explicit vocal cues.

The inability of the Kinect sensor to accurately perceive a user in cases of occlusion is another limitation of the proposed system. Inaccurate perception of a user might lead to the determination of incorrect proxemics and an incorrect reference user location. Incorporating a sensor fusion method to perceive users would help resolve this issue. Furthermore, users might experience uncomfortable proxemics during the learning process (until the robot adapts its proxemics based on feedback received in interactions). The learning process based on feedback is used to fine-tune the proxemics per user. Therefore, the proxemics determined by the robot in the initial stage would not cause much distress to users. Ways of improving the learning rate of the method should be explored to minimize this discomfort.

5 Conclusions

A novel method has been proposed to adapt the interpersonal distance at the termination position of an approach toward a user based on current physical user behavior and user feedback. The major improvement of the proposed work over the current state of the art is that the system can determine the approaching proxemics based on dynamic body movements while learning from the corrective user feedback.

The Fuzzy Proxemics Evaluation Module (FPEM) has been implemented with a fuzzy neural network that is capable of learning preferences of users based on user feedback for prior approaches. The fuzzy neural network perceives user behavior by means of dynamic parameters of skeletal joints to determine the appropriate termination position of an approach. This determination can be adapted toward the preferences of a user by modifying the internal parameters of the neural network based on user feedback.

Experiments have been conducted to compare the performance of the proposed system against a system that is not capable of adapting the proxemics based on prior experience. According to the outcomes of the experiments, the system that is capable of learning the preferences of users improves the satisfaction of users regarding proxemics evaluation.