1 Introduction

Service robots are typically pictured as machines able to coexist with people to provide cooperation and assistance appropriately [27]. However, in the last decade, when dealing with possible social effects of robots’ behaviours, in terms of their impact on privacy, acceptance, trust and social bonds, mainstream research has focused on the development and evaluation of robots’ capabilities in the context of direct, face-to-face interactions with human beings [94]. This is particularly relevant in the case of social robots, which are directly intended as entities capable of engaging users in meaningful interactions but also of co-existing and performing tasks in the presence of other members of the community in a natural and socially acceptable way [104].

Fig. 1 Examples of non-interactive tasks: (a) vacuum cleaning; (b) remote monitoring; (c) pick and place of objects/delivery; (d) navigation in public spaces

In this context, the key domains are environments and possible application areas that can be defined as “social” [87], such as private homes, shops, public places, hospitals, care homes, and working environments. Nevertheless, within these environments a service robot’s task may not necessarily require direct interaction with people.

We can easily imagine scenarios in which a robot is involved in some tasks while the user is busy with his/her own activities. For example, a vacuum cleaner robot operating in a private house while the user is watching TV, or a logistics robot moving packages in a human-populated working environment (see Fig. 1 for some examples). Such robots have occasional encounters with people, although this is not their primary task. To give another example, a home companion robot assisting an older person will socially interact with the person to provide cognitive, physical, social, or health-related assistance, but might also have other tasks, such as monitoring the environment. Even in situations when the robot has no particular task to perform at a given time, it will share the space with people. Thus, its behaviour during those periods may influence people’s perception of and attitude towards the robot. In particular, sharing human personal spaces raises new issues concerning privacy and how it might be violated, not only through the possible sharing of sensitive information but also through intrusion into people’s personal life and private space. Indeed, a violation of privacy could be informational (e.g., personal information), physical (e.g., personal space, modesty), psychological (e.g., thoughts, values), or social (e.g., intimacy) [102]. To perform their tasks, service robots might need to acquire a wide range of information about the people they share the environment with [25, 109]. They might also need to store information about the home spaces and housekeeping styles to move autonomously in the house [107]. However, people’s awareness that robots have sensing capabilities (e.g. cameras and microphones) able to create an accurate picture of their private environment and life negatively affects their acceptance of robots [101].

The ability of a robot to adapt its behaviour according to social expectations [30], specific cultural norms [14], and possible individual preferences [57] will determine the success and large-scale use of such service robotics applications [93], as well as of social robots.

Moreover, in complex collaborative environments (whether humans and robots share tasks, or just the same space and resources), indirect multi-modal communication has a key role in enhancing the effectiveness of human–robot interaction (HRI). Indeed, the possibility of different communication modes might differently affect people’s perception of a robot and their cognitive workload [1]. In the past few years, these aspects have been mainly considered while planning robots’ trajectories and paths. However, any action and behaviour of a robot, whether in an interactive or non-interactive situation, should always be perceived as socially acceptable by human users or other non-users inhabiting the same environment. According to the definition of a robot companion provided in [27], a robot should not just perform a task correctly and efficiently. Since people and robots share the same physical environment, the robot should perform such tasks in a manner that is believable, legible, and acceptable to people. This is also true in the case of non-interactive tasks that will constitute the majority of a service robot’s actions in the case of a fully day-to-day human–robot coexistence. Therefore, we argue that the social robotics community has to provide a relevant contribution to model and design robots’ socially acceptable behaviours even in situations when a robot is not directly interacting with people. This will allow the development of service robotics applications that can be fully deployed in everyday life.

This article does not intend to provide a systematic state-of-the-art review of approaches regarding the implementation of socially acceptable behaviours. Instead, it aims to discuss the main research objectives related to the effective development of robots’ social abilities in tasks that do not involve direct interaction with a person. Any roboticist who wants to develop and test social applications that go beyond a remote-controlled, Wizard of Oz (WoZ) mode is aware that these require capabilities to perceive more complex situations. HRI and Social Robotics investigations are strongly influenced by different disciplines such as machine learning, distributed sensing, and software engineering. However, the role that social robotics, as a discipline, will play in the near future towards the effective development of service robots is yet to be defined. In this work, we briefly introduce what has been accomplished in the past ten years to effectively support the design of socially acceptable behaviours during non-interactive tasks. We highlight some of the related challenges to be addressed in the next ten years, and how the social robotics community can play a leading role.

Firstly, either in working environments or in private houses, robots have to be aware of their inhabitants and of the social context of the environments they are acting in. Some human inhabitants might be ‘users’, but the robot may also have encounters with other people who might only be present during short periods while just sharing the same environment. Such context awareness is typically defined in terms of cultural norms [20], social signals [113], and individual preferences of its inhabitants [94]. The presence of human beings makes the context extremely dynamic and mainly shaped by human activities. In Sect. 2, we introduce the current perspectives concerning the necessity to obtain the proper Human and Context-Awareness. Moreover, since socially enhanced robot behaviours require us to consider the human’s preferences and to incorporate situation assessment into the robot’s decision-making process [38], in Sect. 3, Socially Acceptable Decision-Making is discussed. Such decision-making activity is presented in terms of a planning process. It focuses on the proper selection of actions required to achieve a goal efficiently but also on the characteristics of plans that are perceived as socially acceptable. Finally, in Sect. 4, we discuss issues related to indirect Information Exchange, since, even in the case of a non-interactive task, the robot’s behaviours may cause a sort of indirect communication towards people. This has an impact on the legibility and predictability of such behaviours and on the acceptance of a robot itself. In Sect. 5, we summarise the key aspects to design social robot’s behaviours and point out the remaining challenges that are posed for the social robotics community.

2 Human and Context Awareness

To effectively exploit autonomous capabilities that are socially enhanced, a robot is required to sense its environment but also to understand what happens within it [28]. Situation or context awareness is an established concept linked to research in Ubiquitous Computing (UB). In UB, devices are distributed in the environment to sense and interpret the current activities of its inhabitants. Many service robot applications have been developed relying on such data [17, 21, 35]. As an example, in public spaces robots may need to know the humans’ positions to track or to avoid them [42]. Hence, a robot should have the ability to sense and process information regarding contexts to decide what to do, to predict future situations, and to adapt its behaviour. Koay et al. [61] identified different levels of situation awareness for a robot as mediated by the ubiquitous devices: Physical Context, related to physical properties of the environment that can be directly measured from sensors; User Context, in terms of activities, locations, and preferences; and Robot Context, related to the activities of the robot itself. In the case of non-interactive tasks, physical and robot context are commonly required for the robot to properly plan and act to achieve its goals. User context, in terms of human and situational awareness, is required to interact with a person, but also to socially enhance the robot’s behaviour while not interacting.
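The three context levels identified by Koay et al. [61] can be pictured as a simple data structure. The following Python sketch is purely illustrative: the class and field names (`readings`, `activity`, `current_task`, and so on) and the helper `shares_space_with_user` are assumptions introduced here, not definitions taken from [61].

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PhysicalContext:
    # Properties of the environment measured directly by sensors
    # (keys such as "temperature_c" are hypothetical examples).
    readings: Dict[str, float] = field(default_factory=dict)


@dataclass
class UserContext:
    # Activity, location, and preferences of a person (hypothetical fields).
    activity: Optional[str] = None
    location: Optional[str] = None
    preferences: Dict[str, str] = field(default_factory=dict)


@dataclass
class RobotContext:
    # What the robot itself is currently doing, and where.
    current_task: Optional[str] = None
    location: Optional[str] = None


@dataclass
class SituationAwareness:
    physical: PhysicalContext
    user: UserContext
    robot: RobotContext

    def shares_space_with_user(self) -> bool:
        """True only when the user's location is known and equals the robot's."""
        return (
            self.user.location is not None
            and self.user.location == self.robot.location
        )
```

Keeping the user context separate from the other two makes explicit that it may be unknown (here, `None`) during non-interactive tasks, which is the case the following subsections focus on.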

2.1 Human Awareness

Robots that operate in human environments require the ability to sense people and recognise their activities during both interactive and non-interactive tasks. In applications of human–robot interaction, the robot’s perception capabilities depend on the proximity of the interactions. In remote interactions, a user is typically expected to remotely control a robot (even in shared autonomy configurations). Hence, they potentially share a common awareness of the external environment. In a proximate interaction, close human–robot distances help in providing reliable images and data to be analysed by the robot to obtain situational awareness. However, the human’s and the robot’s points of view can differ significantly during non-interactive tasks. The human may not be in the field of view of the robot. The robot may be acting in a different context from the person’s. Robots and humans might not even be in the same room. Indeed, while datasets used in typical HRI studies are recorded assuming a face-to-face interaction [120], the robot may not be aware of the person’s location in non-interactive tasks. This poses significant challenges for obtaining proper situation awareness, especially in the absence of external sensors that could potentially enhance a robot’s perceptual abilities (e.g., in a smart home context).

People detection and tracking are well-covered research areas. However, the most common approach is to install a variety of sensors in the home environment to track the person [42]. In the literature, few approaches are presented for searching for the human relying only on a robotic device with onboard sensors [82]. Algorithms for human tracking are deployed to detect and track people that are already in the proximity of the robot. They do not consider that relative movements and a limited field of view can easily cause the person to be lost [99].

Finally, human awareness concerns not only the acknowledgement of the person’s position and pose within the environment but also the opportunity to understand the activity that the person is currently performing. For example, a robotic personal assistant may be required to recognise the user’s Activities of Daily Living (eating, drinking, cooking, watching TV, using a mobile phone, etc.) or emergencies, such as falls [43]. In this case too, approaches in the literature either consider the availability of external sensors [100] or assume that the robot is positioned close to the user [63].

2.2 Social and Non-verbal Signals

Service robot applications are typically deployed to execute autonomous behaviours. They are controlled by AI without needing instruction or human help [9]. They always behave in the same way, regardless of humans’ reactions [53]. By contrast, the importance of interpreting and recognising social and non-verbal signals during the interaction is generally well recognised within the social robotics community [40, 77]. These signals also play a fundamental role during non-interactive tasks. For example, the interpretation of non-verbal cues, such as gaze, posture, and back-channels, can be used in the recognition of the person’s engagement during an interaction; the same cues could be used to evaluate the person’s discomfort or disengagement from the current activity caused by the robot’s behaviour in the shared environment [96]. A robot not interacting with a person, but performing other tasks autonomously, or just charging its batteries in a dedicated charging area, might influence people’s activities and their attitudes towards the robot: e.g., people might find the robot’s behaviours ‘distracting’, ‘annoying’, or ‘boring’.

Non-verbal cues may also be used to identify possible conflicting goals between a person and a robot, e.g., the human may be busy and may not be willing to interact while the robot may be seeking interaction, e.g. to offer assistance, and vice versa [46, 105]. It is important to detect if a person is willing to interact with the robot or not [47]. Moreover, the proper evaluation of the person’s “interruptability” affects the person’s performance in his/her current activity, as well as the performance of the robot’s task itself (for example in terms of time spent waiting for the person to be ready to interact), and it will influence the robot’s acceptance [6]. In this last case, one may assume a closer and frontal position of the person with respect to the robot, e.g. the person is within a “participation” or interaction zone [111]. For example, in the case of a bartender robot engaging with people to sell drinks [47], the problem is to infer the person’s disposition to interact from social cues, typically gaze and pose. In other cases, the difficulties of getting data on such cues can be significant. While robot and person may be moving around, distances may prohibit the correct identification of such features.

Moreover, the proper recognition of a social human–human interaction setting is also a key property in shaping the behaviour of a robot. People that are already in a social state may be more willing to be interrupted by a robot [46].

2.3 Intention Recognition and Situational Assessment

The recognition of situational context relies on the perception of the physical and the social environment in terms of observable behaviours and social events [87]. The simple recognition of whether or not a person is willing to interact with a robot is a type of intention recognition [85]. Indeed, the recognition of complex activities and intentions requires the ability to track a temporal pattern of actions and infer the final goal. That is typically achieved by deploying probabilistic models [79] or symbolic reasoning [81]. Therefore, the intent and situational recognition rely strongly on the availability of distributed and complex perception abilities.
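As a concrete illustration of tracking a temporal pattern of actions to infer a final goal, the sketch below applies recursive Bayesian updating over a fixed set of candidate goals. It is a minimal example under strong assumptions, not the method of [79]: actions are treated as conditionally independent given the goal, and the function name `infer_goal`, the goal and action labels, and the `unseen` smoothing value are all hypothetical.

```python
from typing import Dict, List


def infer_goal(
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
    observed_actions: List[str],
    unseen: float = 1e-3,
) -> Dict[str, float]:
    """Return P(goal | observed actions) by recursive Bayesian updating.

    prior[goal] is P(goal); likelihood[goal][action] is P(action | goal).
    Actions missing from a goal's table get the small probability `unseen`
    (an assumed smoothing value) so that one surprise does not zero a goal.
    """
    belief = dict(prior)
    for action in observed_actions:
        # Bayes rule: multiply by the likelihood of the new observation ...
        for goal in belief:
            belief[goal] *= likelihood[goal].get(action, unseen)
        # ... then renormalise so the beliefs sum to one.
        total = sum(belief.values())
        belief = {goal: p / total for goal, p in belief.items()}
    return belief


# Hypothetical household example with made-up probabilities.
prior = {"make_tea": 0.5, "watch_tv": 0.5}
likelihood = {
    "make_tea": {"walk_to_kitchen": 0.6, "fill_kettle": 0.3, "sit_on_sofa": 0.1},
    "watch_tv": {"walk_to_kitchen": 0.2, "fill_kettle": 0.05, "sit_on_sofa": 0.7},
}
posterior = infer_goal(prior, likelihood, ["walk_to_kitchen", "fill_kettle"])
```

With these invented numbers the belief shifts strongly towards `make_tea` after two observations. In practice the likelihood tables would have to be learned or hand-specified, and they depend on the distributed perception abilities discussed above.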

The correct recognition of a person’s activity is also fundamental for anticipation [62] and to achieve proactive behaviour. The ability to infer the person’s goal from the observation of his/her actions allows the robot to act proactively to help (if needed) or to avoid intruding. Beyond pro-activity, a correct interpretation of the person’s intentions also makes it possible to constrain the robot’s future actions and helps to define the proper context for decision-making.

2.4 Open Challenges

For social robots to be fully integrated into human environments, unique perceptual challenges arise. The development of new sensors, as well as the reliability of powerful machine learning techniques to analyse data, has facilitated substantial advances in machine perception in the last decade. However, the automatic interpretation of human non-verbal cues, characterising the person’s state, is still very challenging since it involves the simultaneous analysis of different elements from human observation [94]. Indeed, non-verbal cues are the most used channel of communication. They can also be used by a robot to infer intentions and to assess the acceptability, comfort, and likeability of the human–robot co-existence. To be able to interpret such signals, models are necessary of how such cues relate to each other and contribute to the multi-modal recognition of social signals, such as the willingness of a person to interact with the robot. Moreover, what matters is not only the ability to recognise such cues but, more importantly in the case of non-interactive tasks, the ability to recognise their absence. For example, the absence of a specific cue could represent a lack of interest of the person toward the robot [46].

When tackling challenging problems in robot perception, considering the static properties of the context can constitute an advantage by reducing the current space of possibilities [20]. For example, in the absence of robust detectors for a person’s state, context awareness may play a fundamental role [6]. Moreover, the concept of ubiquitous robots may help overcome the current limitation in standalone robot perception. It can provide the possibility of integration of the robot data with the information provided by different services running in the smart environment [21]. The Internet of Robotic Things is in the direction of the seamless integration of a robotic device within ambient intelligence. However, it requires a substantial effort in abstracting the robot functionalities, and in designing and providing them as plug and play applications for interoperability [35] while relying only on widely used common middleware, e.g. the Robot Operating System (see footnote 1).

3 Socially Acceptable Decision-Making

A robot’s goals should be achieved by following models of socially acceptable behaviours. For example, a robot that wants to get into an elevator where there are already people inside should first wait for people to exit the elevator. Once the people have left the elevator, the robot should proceed to enter [13]. As already discussed in Sect. 2, the robot should be able to infer the activities and goals of people in addition to merely detecting their presence. In the same way, humans should also be able to understand robots’ actions and to infer their goals (see Sect. 4). Studies showed that robot actions are perceived similarly to human actions [124], which affects the way such actions should be performed and planned.

Hence, when planning a sequence of actions in a shared environment, the robot should consider that its own goal could interfere or be in conflict with the humans’ goals [13]. For example, the person may not want to be interrupted and distracted while involved in cognitively demanding tasks, or may not want to hear the noise of the robot’s movement while relaxing. A socially intelligent vacuum cleaner should effectively plan its cleaning tasks taking into account the person’s habits [36] in order not to cause discomfort. It is important to take into account human activities even when such activities are not related to the robot’s own.

3.1 Socially-Aware Navigation

In the literature, different approaches have dealt with human-aware adaptation in planning by mainly considering the robot’s trajectories and paths. In a human-populated environment, a navigation task should not only be achieved considering the efficiency, but rather reaching a trade-off between efficiency or performance and human acceptance [78, 91]. This has a strong impact when planning trajectories in such shared spaces. For example, in a user monitoring task, the robot should keep the proper distances and approach direction to have good recognition performance. Moreover, it has to pay attention not to disturb people who are involved in other activities [91]. Hence, people’s comfort, in terms of distances, speed, direction, and social rules, has to be considered while planning the trajectory to accomplish a goal [66, 93].

In this field, approaches typically differentiate between path planning in the case of people freely walking around and a robot navigating in a space that is statically populated by people. In the former case, a typical example would be navigation in a public space where there can be conflicts between the robot and people moving [10]. In this case, reactive strategies are deployed to modify the robot’s trajectory with respect to the people, so that the moving people are not treated simply as moving obstacles, but as human and social beings. This requires a robot able to plan socially acceptable behaviours so as to maintain social distances [110]. Prediction strategies are also integrated into the planning activity, allowing the robot to perceive the intention and to predict the behaviour of people moving around [10, 68]. Moreover, social rules, such as walking on the right/left side of a corridor, are considered in the robot path generation [58]. In the case of people with a static pose, the robot path planning is conducted by considering sophisticated cost functions or potential fields that take into account the person’s comfort [19]. In these cases, strategies to minimise the probability of encounters with people are developed using affordance maps. These model the temporal behaviour of the people in the environment, again by using a cost model [36].
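The trade-off between efficiency and comfort can be made concrete with a cost function that adds a proximity penalty to path length. The sketch below is a deliberately simple stand-in for the more sophisticated cost functions and potential fields cited above: it assumes an isotropic Gaussian penalty around each static person, and the names `social_cost`, `weight`, and `sigma`, as well as their default values, are illustrative assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def path_length(path: List[Point]) -> float:
    """Sum of Euclidean distances between consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def discomfort(point: Point, people: List[Point], sigma: float) -> float:
    """Isotropic Gaussian penalty: close to 1 next to a person, near 0 far away."""
    return sum(
        math.exp(-math.dist(point, person) ** 2 / (2.0 * sigma ** 2))
        for person in people
    )


def social_cost(
    path: List[Point],
    people: List[Point],
    weight: float = 2.0,   # assumed trade-off between efficiency and comfort
    sigma: float = 0.8,    # assumed spread of the personal space, in metres
) -> float:
    """Path length plus a weighted discomfort term summed over waypoints."""
    penalty = sum(discomfort(p, people, sigma) for p in path)
    return path_length(path) + weight * penalty


# A person standing just beside the straight line from start to goal.
people = [(2.0, 0.2)]
direct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.5), (3.0, 1.0), (4.0, 0.0)]
```

With `weight=0` the cost reduces to path length and the direct path is cheaper; with the assumed default weight the longer detour becomes the cheaper option. A real planner would also account for orientation, speed, and approach direction, which this sketch ignores.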

3.2 Human Aware Task Planning

Beyond path planning, socially-aware action and task planners require that the actions and the activities of other cohabitants (humans or robots) are taken into account [23]. This also involves reasoning carefully about human-aware capabilities and about the mental and physical states of the human partners [69].

Human-aware task planning is mainly developed in the context of planning a course of action for a robot that has to collaborate with a human being [34], taking into account the characteristics of the human’s mental state, such as possible beliefs and plans, and inattention states [33]. The capability of inferring and recognising the individuals’ intentions, desires, and beliefs, as well as their internal states, personality, and emotions, is often referred to as Theory of Mind (ToM). ToM refers to being able to acknowledge and understand the mental states of other people. Moreover, it requires using the judgement of their mental state to predict their behaviour and to adapt one’s own decision-making process accordingly. In the context of perspective-taking approaches [84], the robot’s reasoning process focuses on recognising what the human partner can or cannot perceive. This is then used to construct a world model for the planning process from the point of view of the user, in order to reason about and decide the current actions. Indeed, these approaches are mainly considered in the case of explicit human–robot interaction and collaboration, while they could also play a fundamental role in planning for non-interactive tasks.
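One small ingredient of perspective-taking, deciding what the human partner can or cannot perceive, can be sketched geometrically. The example below is an assumption-laden simplification rather than the approach of [84]: it works in 2D, ignores occlusion, and uses a hypothetical field of view of 120 degrees and a maximum range of 5 m. The function name `visible_to_human` and the object labels are illustrative.

```python
import math
from typing import Dict, Set, Tuple

Point = Tuple[float, float]


def visible_to_human(
    human_xy: Point,
    heading_rad: float,
    objects: Dict[str, Point],
    fov_rad: float = math.radians(120.0),  # assumed horizontal field of view
    max_range: float = 5.0,                # assumed perception range in metres
) -> Set[str]:
    """Names of objects inside the human's view cone (occlusion is ignored)."""
    visible = set()
    for name, (ox, oy) in objects.items():
        dx, dy = ox - human_xy[0], oy - human_xy[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > max_range:
            continue
        bearing = math.atan2(dy, dx)
        # Wrap the angular difference into [-pi, pi] before comparing.
        diff = math.atan2(
            math.sin(bearing - heading_rad), math.cos(bearing - heading_rad)
        )
        if abs(diff) <= fov_rad / 2.0:
            visible.add(name)
    return visible


# Person at the origin facing along +x; the robot is behind them.
scene = {"cup": (2.0, 0.0), "robot": (-2.0, 0.0), "lamp": (1.0, 10.0)}
seen = visible_to_human((0.0, 0.0), 0.0, scene)
```

The resulting set can serve as a crude estimate of the person's world model; in a non-interactive task it tells the planner, for instance, that the person has probably not noticed the robot.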

The relationship between task planners and motion planners is commonly defined only in terms of re-planning once the motion planner fails to achieve its goal. Some approaches explicitly combine task and motion planning, recognising that social awareness requires taking the task planner into account. In detail, it is important to consider how the planned actions affect the social context [87], for example, in the case of non-interactive or collaborative tasks, bearing in mind not to interrupt or distract the human. In this context, it is fundamental to avoid violating the social expectations of people [87], but also to consider the human’s inferences (e.g., about possible goals and actions) in the robot’s planning so as to provide legible and predictable motions [37].

In general, the robot’s plan should be created by considering the human’s activities and plans, so as to achieve a given goal without violating possible constraints generated by people’s activities [23]. Hence, the temporal properties of a sequence of actions have to be planned according to the human’s current or expected activities [17, 35]. This could be achieved by relying on the seamless integration of the robot’s planner with data from distributed sensors and services assessing the context. Social norms can also be defined explicitly and considered during the planning process, thus generating social plans integrating domain descriptions and norms [16]. The same also holds in the case of culture-based norms that can influence the generation of plans [15]. Gender and other individual differences are additional variables to consider in this process. Nevertheless, such norms are typically deployed in applications requiring direct human–robot interaction with little or no consideration of non-interactive situations.
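The idea of planning the timing of robot actions around people's current or expected activities can be illustrated with a minimal scheduling sketch. It assumes that the human's activities have already been recognised or predicted and are available as time intervals during which a given robot task (for example, noisy vacuum cleaning) would be disturbing. The function `earliest_quiet_slot` and the interval representation are hypothetical and do not come from the cited planners.

```python
from typing import List, Optional, Tuple

# Half-open interval [start, end) in minutes since midnight.
Interval = Tuple[int, int]


def earliest_quiet_slot(
    duration: int,
    busy: List[Interval],
    day_start: int = 0,
    day_end: int = 24 * 60,
) -> Optional[Interval]:
    """Earliest slot of `duration` minutes that overlaps no busy interval.

    `busy` lists the periods in which the task would disturb the person.
    Returns None when no such slot fits between day_start and day_end.
    """
    t = day_start
    for start, end in sorted(busy):
        # Does the task fit in the gap before the next busy period?
        if t + duration <= min(start, day_end):
            return (t, t + duration)
        # Otherwise skip past this busy period (handles overlaps too).
        t = max(t, end)
    if t + duration <= day_end:
        return (t, t + duration)
    return None


# Hypothetical day: a video call 08:00-09:00 and a nap 13:00-15:00.
busy = [(8 * 60, 9 * 60), (13 * 60, 15 * 60)]
slot = earliest_quiet_slot(45, busy, day_start=8 * 60, day_end=18 * 60)
```

Here the 45-minute task is placed right after the call, at 09:00. Treating human activities as hard constraints is only one option; explicit social or cultural norms could instead be encoded as soft costs and traded off against task urgency.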

3.3 Cognitive Architectures for Robot Control

Beyond planning capabilities, the general ideas of Cognitive Architectures include the possibility of deploying different capabilities for robot control. These are, for example, mechanisms for perception, attention, action selection, memory, learning, and meta-cognition.

To reason correctly on perceptual data, some authors argue that machine learning techniques alone may not be adequate to support personalisation and adaptation. Such characteristics are required to improve robot acceptance, especially in the case of assistive scenarios. An effective assistive system with a high degree of user acceptance should be based on the knowledge of the potential users, as well as on contextual and environmental information. This is essential to automatically generate assistive plans and actions tailored to the specific user to be executed by the robot [35]. Furthermore, purely learning-based approaches would entail a lack of control and explainability, especially when unexpected or unpredictable behaviours occur. In this direction, cognitive architectures could take advantage of reasoning on symbolic knowledge of the human users and the current situation as extracted from complex sensor data [35, 118]. According to [74], direct and explicit integration of cognition is a compulsory step to enable human–robot interaction in semantically rich human environments like houses. The necessity of having a semantic representation to reason about tasks and the environment was also highlighted in [118]. In particular, this is necessary for multi-modal semantic information exchange (see Sect. 4) involving verbal, body movement, and graphical communication forms [50]. However, the extensive formalisation of a semantic system to be used in a complex multi-modal setting remains an open challenge [24].

Finally, meta-cognition is necessary for social cognition [64]. As noted in Sect. 3.2, since robots need to recognise human behaviour and intentions, they should also be able to use such knowledge in the decision-making process. Very few architectures support this ability [64].

3.4 Open Challenges

Navigating in an environment populated by humans requires taking into account a complex system that goes beyond the modelling of human–robot proximity, and that combines obstacle avoidance and natural motion with the need to avoid causing discomfort. Works on human-aware path planning capabilities still mainly focus on safety and avoidance behaviours. While different path planning models are proposed in the literature for dealing with personal spaces, less attention has been given to scenarios involving multiple people [5, 116]. Typically, concepts of O-Space and F-formations are used to detect groups of people involved in a joint activity/conversation [26], but the dynamics of mutual relationships obtained through social signal processing may allow reasoning on more complex behavioural patterns. Moreover, the development of proper strategies for path planning has to take into account that the person might be already engaged in an activity, and needs to consider his/her personal spaces [61]. Rios-Martinez [93] highlighted that the concepts of activity and affordance spaces, dynamically defined by the activity the person is performing, have to be considered. Shaping a robot’s task in this manner requires complex situation-awareness capabilities to be effective when in the presence of a person.

Indeed, during a navigation task, a robot needs to consider human activities, plans and goals, preferences, groups, and their interaction with the objects in the environment. These considerations go beyond general path planning, since the planning of the robot’s actions also needs to take into account the user state, with ToM approaches, or his/her current goal and activities. This goes in the direction of reasoning on possible social norms and preferences to deploy socially-aware planners. Socially enhanced planning abilities will require more and more integration of planning with contextual information, but also the ability to integrate a human’s feedback into the planning loop. Moreover, the perception of the person’s nonverbal behaviour may provide useful feedback to evaluate the possible discomfort caused by the robot [96]. In this direction, online learning strategies exploiting implicit human feedback may provide a way to adapt the navigation to the social characteristics of the environment [4, 91]. However, such approaches do not support reasoning about specific social norms: they may fail to account for the person’s explicit preferences, and they lack a mechanism for easily explaining the robot’s behaviour to people (see Sect. 4).

Finally, studies that aim to evaluate the person’s comfort during human-aware path planning are either conducted in a Wizard of Oz (WoZ) mode [96], or require complete knowledge of the environment and the position of the persons [75]. Because the robot’s own perception is limited by its field of view, these methods are not always effective on their own: to be fully deployed, they need the support of additional sensors to provide such knowledge. To increase social autonomy in the robot’s behaviour, robots need an effective augmentation of their perceptual and reasoning abilities in a smart environment.

4 Information Exchange

Any verbal or non-verbal behaviour of the robot, even when not directly interacting with a person, can be considered a form of interaction [66]. Indeed, if the robot’s behaviour creates discomfort, distraction, or has any impact on the person’s behaviour and the acceptance of the robot, it can be considered as a form of interaction. People are able to communicate and interpret communication signals that go beyond natural language and may involve gesture, pose, and body language. In addition to those, they might engage other humans with a bidirectional and mutual understanding [108] that enables them to anticipate and read implicit intentions and behaviours. Moreover, such indirect information exchange can be influenced by human social conventions, i.e., simple habits of social interaction, expectations, and perception of robots.

Multi-modal cues are perceived by people as a better communication mode [54], and they improve the effectiveness of the information exchange [121]. This applies to robots both with and without facial features. In the first case, several models proposed the integration of head and eye-gaze movements based on prosodic features extracted from speech [71], aiming at more intuitive and natural communication. In the second, speech and co-speech gestures (e.g. hand gestures, pointing) have been used to direct people’s attention to specific objects in the surrounding environment [39]. The main limitations of the currently developed systems are related to the simulation of human naturalness in a robot’s movements, due to a lack of fine control over motors, and to the lack of deep knowledge of how humans produce and comprehend multi-modal language [50].

4.1 Modelling Social Behaviour

Kruse et al. [66] distinguished between explicit and implicit interaction. Explicit interaction is when the robot’s actions are directly planned to induce a reaction in the human. Implicit interaction is when such a reaction is not intended, but is a side-effect of the robot’s behaviour, which is nevertheless planned to take the person’s presence into account. Nevertheless, certain properties of robots drive humans to anthropomorphise their behaviours, motions, and tasks [41, 126]. For example, a robot might be perceived as confident and as having natural movements, whether or not it was planned to express personality traits and natural behaviours. Moreover, several studies have shown that modelling the timing and velocity of a robot’s motion might change people’s perception of the robot [125]. For example, changes in the velocity of the robot’s motion can convey emotions [103], intent [48], arousal, and dominance [103].

Different non-verbal behaviours of a robot that is not engaged in any interaction were studied to evaluate whether the person would notice changes in non-verbal cues and whether the robot’s behaviour would change the participant’s perception of the robot itself [73]. Here, a robot watched a human performing a task and synchronised its behaviour with the human’s behaviour in either positive or negative directions, by means of gaze behaviour. Although gaze patterns are often used to achieve a collaborative human–robot interaction [55], they also play a fundamental role during non-interactive tasks. Indeed, it has been observed that people respond to a situation where a companion robot is following the person’s actions using head gaze [73]. The robot’s behaviour, compared to a non-moving robot, created a positive disposition towards the robot.

4.2 Transparency, Legibility, and Predictability

Communication within social environments is usually bi-directional; therefore, it is important that both robots and people are able to understand and predict behaviours during their interactions. In particular, a robot’s behaviour is defined as legible if humans can understand the robot’s intentions and if it meets their expectations [18]. A robot’s behaviour is predictable when people are able to predict its actions, motion trajectories and goals [12, 44]. Since the legibility and predictability of the robot’s behaviours are both essential in building people’s trust in robots and, consequently, improving the quality of the interaction, they could be interpreted as necessary to improve transparency [2]. On the contrary, a robot that is not able to communicate its own state may create anxiety in humans [88] and may negatively affect human–robot co-existence [2].

Moreover, the perception of the usefulness of a robot’s capabilities affects people’s trust in robots [52]. Trust plays a key role in people’s acceptance of robots and their willingness to use them [80]. Several studies [3, 98] suggested that greater shared awareness of the agents involved, their activities, and the situations occurring between humans and robots enhances individuals’ trust in robots. Trust is also affected by other factors [22, 95], such as failure rates and transparency.

4.2.1 Transparency in Robot’s Motion

Transparency of the robot’s intention and behaviours facilitates communication, both direct and indirect, and provides humans with the opportunity to understand the potential implications, risks, and benefits of using service robots [8].

When a robot moves around in an environment, the ability to create natural motion has been shown to increase its acceptability and perceived safety [76]. A motion that is closer to human expectations is more legible and, thus, easier to interpret for the human being observing the robot’s behaviour [67, 117]. The naturalness of movements, modelled by varying a robot’s velocity and orientation, contributes to the perceived autonomy of the robot [89].

In a navigation task, motions of the robot may indicate a goal direction and the robot can visibly acknowledge the presence and status of obstacles and humans [67]. While autonomous cars and some proposed robot applications [7] use lights and LEDs to signal changes of direction and motion to people, more natural approaches include head movements and body orientation to express the intention of the action [83]. Such approaches enhance the perceived naturalness of the robot’s behaviour. Moreover, higher acceptance and sense of comfort of robots navigating in human-populated environments can be obtained not only by introducing social concepts, such as social distance or proxemics, but also by using other social conventions, such as directly addressing people to solve the encountered issue [97]. For example, in an environment cluttered by people, a robot that slows down and politely asks people to make room for it to pass might be perceived positively [97].
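As a purely illustrative sketch of the "slow down and ask politely" behaviour described above, the following Python fragment scales the robot’s speed with the distance to the nearest person and raises a flag when a verbal request for passage would be appropriate. The function names, the distance thresholds, and the speed values are assumptions made for this example; they are not taken from [97] or from any other cited study.

```python
import math

# Illustrative distance thresholds in metres, loosely following proxemic zones.
# These numbers are assumptions for the sketch, not values from the cited studies.
PERSONAL_ZONE = 1.2
SOCIAL_ZONE = 3.6


def nearest_distance(robot_xy, people_xy):
    """Distance from the robot to the closest person (infinity if nobody is around)."""
    if not people_xy:
        return math.inf
    return min(math.dist(robot_xy, p) for p in people_xy)


def social_speed(robot_xy, people_xy, max_speed=1.0, creep_speed=0.1):
    """Return (speed, ask_for_passage) for the current situation.

    Far from people the robot moves at max_speed; inside the social zone the
    speed decreases linearly; inside the personal zone the robot creeps and
    signals that it should politely ask people to make room.
    """
    d = nearest_distance(robot_xy, people_xy)
    if d >= SOCIAL_ZONE:
        return max_speed, False
    if d <= PERSONAL_ZONE:
        return creep_speed, True
    scale = (d - PERSONAL_ZONE) / (SOCIAL_ZONE - PERSONAL_ZONE)
    return creep_speed + (max_speed - creep_speed) * scale, False
```

A controller could call `social_speed` at every control step and trigger a spoken request only when the second return value is true, so that the verbal cue complements the legible change in velocity rather than replacing it.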

Recognising intentions from the robot’s motion is also crucial in applications other than navigation. For example, movements such as gaze and hand gestures could be fundamental in helping a human recognise the objects of interest for the robot and, consequently, its goal [119, 122]. Moreover, animal-inspired movements and cues can also be used to communicate the robot’s behaviours in the case of different embodiment features [70].

4.3 Open Challenges

People’s prior experiences and learned associations affect their perception of robots. However, the general population’s experiences with robots are often limited to fictional movies and stories. People are also beginning to get used to interacting with virtual conversational systems, such as the Amazon Alexa series (footnote 2). The familiarity with these technologies might affect their expectations while sharing environments with robots. Nevertheless, the current state of robotics research is still far away from robots portrayed in science fiction [65], and anthropomorphic robots do not yet fully meet people’s expectations.

People’s expectations may also vary according to the role of the robot (e.g. supervisor, subordinate, peer) and its relationship with people during interaction, which also shapes their expectations when they are not interacting with the robot. For example, people might be more willing to accept and trust a robot when it plays an authority figure, such as a food delivery robot, compared to an anonymous robot [11]. However, since the type and length of relationships between humans (e.g. long-term relationships) affect their trust and acceptance [106], we expect that the social dynamics between robots and humans might similarly be impacted by repeated interactions. Relationships can also change over time, for example, due to a breach of trust, so it is also important to consider correct and incorrect actions and behaviours of both the robot and the humans, and how these affect their relationships.

It is important to consider which robot communication modes and feedback are most useful, clear, and accepted by individuals, so that the robot’s behaviour is legible. Given that natural language and gestures are the most common means of communication used by humans, it is not clear how to deal with this issue during non-interactive tasks. This also needs to take into account that humans can identify different implicit communication signals, such as dog-inspired ones [60, 115]. Moreover, it is not yet clear how a robot should acknowledge that a person’s intention has been recognised and taken into account in its actions, and how to model uncertainty in the understanding of the human’s actions without direct communication. Similarly, the robot should be confident that the human understands and predicts its intentions correctly. The key to successful coexistence between people and robots is a robotic system that can perform legible motion and intent recognition autonomously and in real time during interaction with people.

Moreover, group dynamics can change humans’ perception and robot acceptance, in cases where multiple robots and/or multiple humans are present. Social interactions in the presence of groups are still not sufficiently explored in HRI.

Fig. 2

From service robots to social entities: main research objectives as identified in this paper for effectively deploying a service robot that can perform tasks that do not require a direct interaction with a person

5 Discussion and Conclusions

In the last decade, the service robotics community focused on developing skills and capabilities to make robots autonomously perform tasks on behalf of the user and to simulate human-like intelligence in machines [31, 86, 90, 112]. Nowadays, robots are able to plan, navigate, manipulate objects, and reason about the properties of the environment. However, all these functionalities are efficiently deployed in static and dynamic non-social environments. Little consideration has been given to the humans and workers that, even if not directly interacting with the robots, may be sharing the same space with them. In these environments, human beings are mainly considered only to guarantee their safety. Beyond that, their presence is taken into account only in the case of a direct, face-to-face interaction. HRI studies aim at making this interaction as natural as possible, also considering the social aspects of the communication. However, service robotics applications are expected to accomplish many different tasks that do not require interacting with, or receiving instructions from, the user. These are typically achieved without taking into account the possible reactions of human beings. In this context, even for a vacuum cleaner, we claim that the development of socially-aware behaviours and actions is important.

In this work, we identified some of the principal avenues that can contribute toward this research direction. These are summarised in Fig. 2, where we show how they constitute a path leading to improved user acceptance and, consequently, to a stronger market for social robots. First, a robot, to be able to autonomously run in real environments, has to be able to understand the person’s social needs. This can be done by taking into account the users’ profiles and by enhancing the robots’ perceptual capabilities, both through external devices that augment perception and through external services that provide faster and more reliable integrated computation. As the perceptual requirements need more of a software engineering approach to be fully realised, the possibility of obtaining a situational assessment from the indirect observation of the person’s behaviour is of fundamental importance. More models that consider both the presence and the absence of different multi-modal features are required to properly assess the social context. Non-verbal cues and social signals define the social situational context of the environment the robot is operating in. Compared with close interaction, the proper perception and recognition of such signals (or their absence) require the augmentation of the robot’s perceptual capabilities with ambient services providing such information. Moreover, to obtain the properties that describe such models, more observational studies are required.

In the future, we will see more and more robotic applications that rely on the concept of the Internet of Robotic Things. This will also allow the research community to deploy more “in the wild” experimentation. Robotic applications, in research, are usually developed and tested in laboratories or controlled environments (where such extended perception is already available). Technological progress does not yet allow the deployment of such systems for real-world applications and long-term use. Nevertheless, there are already examples of long-term fully autonomous co-existences of robots and humans that exceed a few weeks (of interaction tasks) [29, 56]. Long-term experiments lead to observations of human behaviours that go beyond the initial novelty effect and that have to be considered in the design process. We may see protection of the robot, mimicry, social comparison, and even jealousy [72], all of which will heavily depend on the robot’s social awareness and behaviours. For example, over many weeks, people may form a closer relationship, a sort of alliance [59]. In contrast, disenchantment, the end of the novelty, the robots’ restrictions and errors, and the lack of realistic expectations can induce people to stop using a robot and replace it with other devices instead [32].

In the scenario of long-term cooperation with no intervention from technical personnel, robots need to observe their users and other people sharing the same environment, but also learn from and adapt to them. Individuals’ differences are crucially important when designing ad-hoc long-lasting HRIs. In particular, a robot’s self-adaptation should satisfy people’s needs, in terms of cognition, personality and emotional characteristics, preferences, and habits. To ensure the social acceptance of a robot, several factors that affect humans’ perception of robots will require further consideration [30]. A decision-making process that combines reactive and predictive strategies, taking into account humans’ activities and intentions as well as their indirect feedback, plays a key role in supporting viable long-term human–robot interaction. In this direction, the human-aware path planning research area has already started to take into account that human acceptance of the robot’s behaviour (while moving around in the environment) is a fundamental step, even when it might come at the cost of performance reduction. Such considerations also have to be extended to task planning. Perception, reasoning, and planning also have to take into account social and cultural norms and the possible dynamics among multiple interacting people.
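The trade-off between performance and acceptance mentioned above can be made concrete with a small sketch. The Python fragment below scores candidate paths by their length (a proxy for performance) plus a weighted social discomfort term, and selects the cheapest one. The Gaussian discomfort model, the parameter `sigma`, and the weight `social_weight` are assumptions chosen for illustration; they do not reproduce any specific planner from the surveyed literature.

```python
import math


def path_length(path):
    """Total Euclidean length of a path given as a list of (x, y) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def discomfort(path, people, sigma=0.8):
    """Hypothetical discomfort score: a Gaussian penalty around each person,
    summed over all waypoints of the path."""
    return sum(
        math.exp(-math.dist(pt, person) ** 2 / (2 * sigma ** 2))
        for pt in path
        for person in people
    )


def select_path(candidates, people, social_weight=3.0):
    """Pick the candidate minimising length + social_weight * discomfort.

    social_weight = 0 recovers a purely performance-driven choice; larger
    values accept longer paths in exchange for keeping away from people.
    """
    return min(
        candidates,
        key=lambda path: path_length(path) + social_weight * discomfort(path, people),
    )


# Usage: one person stands on the straight line between start and goal.
people = [(1.0, 0.0)]
direct = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
detour = [(0.0, 0.0), (1.0, 1.5), (2.0, 0.0)]
chosen = select_path([direct, detour], people)
```

With the social term switched off the shorter direct path is selected, whereas the default weight favours the detour, which illustrates how acceptance can be gained at the expense of path efficiency.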

Adaptation to preferences can be achieved through cognitive architectures for robot control that include Theory of Mind (ToM) models and meta-cognition capabilities to reason about social norms and acceptable behaviours. When planning the course of action, a robot should consider the current state of the human beings populating the environment. This has an impact on the selection of the proper actions but also on the way such actions have to be executed. The combination of machine learning techniques, used to identify the relevant contextual characteristics and possible user reactions, with reasoning capabilities will also foster the possibility of achieving explainable behaviour.

In the same way, a robot’s actions provide indirect information to the surrounding people, expressed through its behaviour and social cues. The social attitude in robots’ behaviours and actions influences the person’s expectations and perception of the privacy, safety, and reliability of a robot. Although the legibility and predictability of a robot’s behaviours positively influence the interaction between humans and robots, developing mechanisms for legible communication of the robot’s intentions and for interpreting humans’ actions is not an easy task. In the end, since the legibility and predictability of the behaviour are both essential in building people’s trust in robots, these are fundamental issues to address in order to properly meet the market demand for social robots.

To evaluate the impact of all these features on the user’s acceptance, proper metrics have to be defined. In the literature, a set of task-specific metrics [114] (e.g., for navigation and manipulation), as well as metrics related to human–robot collaboration and interaction [123], or safety perception [45, 92] are considered. The same holds in the case of shared autonomy [49]. Moreover, when evaluating user perception of changes in robot behaviour, most studies exclusively rely on the use of questionnaires. Only a few studies attempted to reflect users’ emotional responses to the robot’s motions using some physiological measures [51]. Indeed, the same social cues required to achieve human and situational awareness could be used for defining indirect measurements of the service robot’s acceptance. However, there is no clear consensus on evaluation metrics to be used in the development of socially-aware service robotics, since they also require a trade-off between performance and acceptance.

In general terms, we conclude that socially-aware service robotic systems should become a focus for researchers in order to build a fruitful and successful co-existence between humans and robots. The concept of a possible trade-off between the robot’s performance in accomplishing its goals and the consideration of the social environment, in terms of humans’ safety, acceptability, comfort, and trust, will play a central role in the mature development of service and personal robots. Consequently, these robots will be deployed with greater success in various markets. To succeed in this intent, the robot’s capabilities have to be socially enhanced, even at the expense of its performance. “Social intelligence”, beyond direct human–robot interactions, is what is missing. Hence, the social robotics community needs to play a central role in leading the development of such service robot applications.