1 Overview

The way a person interacts with a social robot (or sociable robot) is quite different from interacting with the majority of autonomous mobile robots today. Modern autonomous robots are generally viewed as tools that human specialists use to perform hazardous tasks in remote environments (e.g., sweeping minefields, inspecting oil wells, mapping mines, etc.). In dramatic contrast, social (or sociable) robots are designed to engage people in an interpersonal manner, often as partners, in order to achieve positive outcomes in domains such as education, therapy, or health, or task-related goals in areas such as coordinated teamwork for manufacturing, search and rescue, domestic chores, and more.

The goal of developing socially intelligent and socially skillful robots drives research on autonomous or semiautonomous robots that are natural and intuitive for the general public to interact with, communicate with, work with as partners, and teach new capabilities. Dautenhahn’s work is among the earliest in thinking about robots with interpersonal social intelligence, where relationships between specific individuals are important [72.1, 72.2]. These early works pose the question:

What are the common social mechanisms of communication and understanding that can produce efficient, enjoyable, natural and meaningful interactions between humans and robots?

Promisingly, there have been initial and ongoing strides in all of these areas ([72.3, 72.4, 72.5, 72.6, 72.7, 72.8, 72.9, 72.10, 72.11], etc.). In addition, this domain motivates new questions for robotics researchers, such as how to design for a successful long-term relationship in which the robot remains appealing and provides consistent benefit to people over weeks, months, and even years. The benefit that social robots provide people extends far beyond strict task-performing utility to include educational (Chap. 79), health and therapeutic (Chap. 64), domestic (Chap. 65), and social and emotional goals (e.g., entertainment, companionship, communication, etc.), and more.

We begin this chapter with a brief overview of a wide assortment of embodiments of socially interactive robots that have been developed around the world (Sect. 72.2). We follow with selected topics that highlight some of the representative research themes: social-emotional intelligence and emotion-based interaction (Sect. 72.3), social-cognitive skills (Sect. 72.4), human social responses to robots (Sect. 72.5), verbal and nonverbal communication (Sect. 72.6), long-term interaction (Sect. 72.7), touch-based interaction (Sect. 72.8), and teamwork with robot partners (Sect. 72.9). We rely on examples from our own research programs to illustrate these trends, while making reference to other excellent work performed in other research labs.

2 Social Robot Embodiment

2.1 Anthropomorphic Design

Social robots are designed to interact with people in human-centric terms and to operate in human environments alongside people. Many social robots are humanoid or animal-like in form, although this does not have to be the case. A unifying characteristic is that social robots engage people in an interpersonal manner, communicating and coordinating their behavior with humans through verbal, nonverbal, or affective modalities. People have a strong tendency to anthropomorphize social robots [72.12] and to reason about their behavior in terms of mental states (e.g., thoughts, intents, beliefs, desires, etc.). Hence, anthropomorphic design principles, spanning the physical appearance of robots, how they move and behave, and how they interact with people, are often employed to facilitate interaction and acceptance.

2.2 Design Space

The design space of social robotics is quite large. It is important to note that a more human-like design does not necessarily correlate with a better design. One needs to balance the robot design with the task, user, and context [72.17]. As can be seen in the following examples, social robots exploit many different modalities to communicate and express social-emotional behavior. These include whole-body motion, proxemics (i.e., interpersonal distance), gestures, facial expressions, gaze behavior, head orientation, linguistic or emotive vocalization, touch-based communication, and an assortment of display technologies.

For social robots to close the communication loop and coordinate their behavior with people, they must also be able to perceive, interpret, and respond appropriately to verbal and nonverbal cues from humans. Dog-inspired robot behavior, for example, has been designed to facilitate intention reading by people. Given the richness of human behavior and the complexity of human environments, many social robots are among the most sophisticated, articulate, behaviorally rich, and intelligent robots today.

As shown in Fig. 72.1, a number of socially interactive humanoid robots have been developed (Chap. 65) that can participate in whole-body social interaction with people, such as dancing [72.22], walking hand-in-hand [72.23, 72.24], playing a musical duet [72.13], transferring skills to unskilled persons [72.25], or collaborating as a team with people in search-and-retrieve tasks [72.16]. Their arms and hands are designed to exhibit human-like gestures such as pointing, shrugging shoulders, shaking hands, or giving a hug [72.26, 72.27, 72.28]. Some of them are designed with mechanical faces to communicate with humans via facial expressions [72.18, 72.20, 72.29].

Fig. 72.1a–c

Examples of socially interactive humanoid robots: (a) Humanoid robots developed at Waseda University from left to right: a flutist robot WF-4RII (after [72.13]), WABIAN-2 (after [72.14]), and WE-4RII (after [72.15]); (b) Robovie developed at ATR (Advanced Telecommunications Research Institute, Kyoto) is able to gesture with its arms and give a hug; (c) Nexi and Maddox, developed at MIT, are mobile and dexterous social robots used to study collaborative human–robot teamwork (after [72.16])

Whereas many of these humanoids have a mechanical appearance, android robots are designed to have a very human-like appearance with skin, teeth, hair, and clothes (Fig. 72.2). A design challenge for android robots is to avoid the uncanny valley, where the appearance and movement of the robot resemble an animate corpse more than a living human. Designs that fall within the uncanny valley elicit a strong negative reaction from people [72.37]. In contrast to trying to look as human-like as possible, there are also more doll-like robots that are intentionally designed with simplified facial cues and predictable movements to be suitable for therapeutic contexts [72.21].

Fig. 72.2a–d

Some examples of androids: (a) One of the earliest face robots developed at the Science University of Tokyo (after [72.18]); (b) Geminoid developed at ATR (after [72.19]); (c) ROMAN developed at the University of Kaiserslautern (after [72.20]); (d) KASPAR developed at the University of Hertfordshire is a child like robot used during therapeutic interventions to help children with autism (after [72.21])

A number of more creature-like social robots take their aesthetic and behavioral inspiration from animals (Fig. 72.3). Given that people pet and stroke companion animals, touch-based communication has been explored in several of these animal-inspired robots. Sony’s entertainment robot dog, AIBO [72.30, 72.38], is a well-known commercial example. Other robots in this category have a more organic appearance, such as the therapeutic companion robot seal, Paro [72.31]. Some researchers have designed robots with a more fanciful appearance that melds anthropomorphic and animal-like qualities, such as Leonardo ([72.32, 72.33, 72.39], etc.).

Fig. 72.3a–d

Examples of social robots inspired by animals with anthropomorphic qualities: (a) AIBO, the robotic dog developed by Sony (after [72.30]), (b) Paro, the therapeutic seal robot developed at AIST (after [72.31]), (c) Mel, the conversational robotic penguin developed at MERL (after [72.32]), and (d) Leonardo developed at the MIT Media Lab (after [72.33])

Many social robots are not overtly humanoid or zoomorphic, but still capture key social attributes (Fig. 72.4). For instance, one of the best-known and pioneering social robots is Kismet [72.3], developed at the MIT Artificial Intelligence Lab. Kismet had a very expressive mechanical face with anthropomorphic features like large blue eyes. Another example is the dancing robot Keepon, developed at NICT (Japan). This small yellow robot has a simplistic face and uses a classic animation technique called squash and stretch for bodily expression [72.34].

Fig. 72.4a–d

Examples of social robots that are neither humanoid nor zoomorphic but capture key social attributes: (a) Kismet (after [72.3]); (b) Keepon (after [72.34]); (c) Pearl (after [72.35]); (d) Valerie (after [72.36])

Many mobile social robots have been fitted with faces to enhance social interaction (Fig. 72.4). Some examples are the eldercare robot, Pearl [72.35], and the robotic receptionist Valerie, with a graphical face on an LCD (liquid crystal display) screen [72.36], both developed at Carnegie Mellon University. Other examples are commercial robots like PaPeRo, developed by NEC [72.40]. Still other social robots have no overt social features like faces or eyes, but rely purely on language-based communication. Issues of proxemics for mobile social robots have also been explored, such as how a robot should approach a person [72.41], follow a person [72.42], or maintain appropriate interpersonal distance [72.43].

3 Social Robots and Social-Emotional Intelligence

Humans are fundamentally emotional beings. Consequently, human communication and social interaction often include affective or emotive factors. To support the emotional side of human behavior, researchers are exploring affective interaction and communication between people and robots. To participate in emotion-based interaction, robots must be able to recognize and interpret affective signals from humans, they must possess their own internal models of emotions (often inspired by psychological theories), and they must be able to communicate this affective state to others. In general, emotional displays can inform interpretations of an individual's internal states (agreement or disagreement about a belief, valuing a particular outcome, an action tendency to fight, etc.) and therefore help to predict future actions. Emotional displays can also evoke emotional responses in others (e.g., displays of distress can elicit feelings of empathy and motivate another to provide social support). Given that emotional displays can serve such a broad range of social functions, prominent scientists have argued that emotions evolved because they provide an adaptive advantage to social species in which individual relationships matter [72.44].

A growing number of socio-emotional robots have been designed to realize such functions to facilitate human–robot interaction. Some of these robots have been designed with emotional responses or emotion-inspired decision-making systems in order to entertain (e.g., AIBO [72.30], QRIO [72.45, 72.46]), converse with people (e.g., WE-II [72.47]), or bond with people [72.48, 72.49]. Some have investigated the social-communicative aspects of emotions in coordinating behavior and influencing others, e.g., FEELIX [72.50] and Kismet [72.3, 72.51]. Others have explored the functional role of emotion-inspired processing in order to make robots more intelligent, better able to learn, and better adapted to performing tasks in complex environments [72.52]. More recently, researchers have investigated the role of affect in the context of robots that work with people and perform tasks such as search and rescue/retrieve [72.16, 72.53, 72.54]. Finally, others model emotions to make robots better able to handle human emotional states, and to motivate people toward more effective interactions in a range of application domains such as education [72.55], coaching [72.56], or therapeutic systems [72.31].

3.1 Theories of Emotion

The robot’s computational model of emotion determines the robot’s affective responses. These can depend on a myriad of interrelated physical, cognitive, and affective factors that continuously modulate and bias one another. These factors arise from the robot’s interactions with the external environment as well as from its own internal state (e.g., the current emotional state, the cognitive state, goals, motives, physical states, etc.). The emotional model defines the relationship between these factors and the mechanisms that produce the observable behavior of the robot. Many of these computational models of emotion are inspired by theories of human emotions. These theoretical models offer insightful constraints that help researchers derive coherent computational models.

A number of theoretical perspectives have been particularly influential in the development of computational models of emotion. Appraisal theory emphasizes a causal connection between cognition and emotion [72.57]. In appraisal theory, emotion is evoked from patterns of judgments (called appraisal variables) that characterize the personal significance of events (e.g., how events relate to the individual's beliefs, desires, and intentions). An active area of development is understanding the relationship between appraisal variables and specific emotion labels, or specific behavioral (e.g., facial expressions) and cognitive responses (e.g., coping strategies) [72.58]. This kind of model lends itself to more symbolic artificial intelligence (AI) implementations with if-then rules [72.59, 72.60]. In contrast, dimensional theories posit that emotion and other affective phenomena are not discrete entities, but rather lie within a continuous dimensional space [72.61]. Smith and Scott [72.62] proposed a three-dimensional (3-D) PAD model, where P corresponds to pleasure (valence), A to arousal (intensity), and D to dominance (coping potential). Reference [72.63] mapped these appraisal dimensions to intensity-varying expressions computed as a weighted blend of basis postures corresponding to the main axes. The core affect (the emotional state of the individual at any given time) is represented as a point in the 3-D space that is pushed around by eliciting events. Dimensional models are often used for generating the behavior of animated characters, as the dimensional space lends itself nicely to animation blending.
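As a rough illustration of how a dimensional model can drive expressive behavior, the sketch below represents core affect as a point in a PAD cube, nudges it with eliciting events, lets it decay toward neutral, and blends basis postures along each axis. All names, constants, and the tiny three-degree-of-freedom face are illustrative assumptions, not the implementation of any cited system.

```python
import numpy as np

# Illustrative dimensional affect model: core affect is a point in a
# pleasure/arousal/dominance (PAD) cube; eliciting events nudge it, it
# decays toward neutral, and the facial pose is a weighted blend of basis
# postures anchored at the ends of each axis.

NEUTRAL = np.zeros(3)                      # [P, A, D], each in [-1, 1]

class CoreAffect:
    def __init__(self, decay=0.05):
        self.state = NEUTRAL.copy()
        self.decay = decay

    def apply_event(self, delta):
        """Push core affect by an appraisal vector, clipped to the cube."""
        self.state = np.clip(self.state + np.asarray(delta, float), -1.0, 1.0)

    def step(self):
        """Relax toward the neutral point between eliciting events."""
        self.state += self.decay * (NEUTRAL - self.state)

def blend_expression(state, basis):
    """Blend basis postures (joint-angle vectors) along each PAD axis.

    basis[i] holds (negative_pose, positive_pose) for axis i; the blend
    weight is the signed coordinate on that axis.
    """
    pose = np.zeros_like(basis[0][0])
    for coord, (neg_pose, pos_pose) in zip(state, basis):
        pose += max(0.0, coord) * pos_pose + max(0.0, -coord) * neg_pose
    return pose / len(basis)

# Example: three facial degrees of freedom (brows, lids, lip corners).
basis = [
    (np.array([-1.0, 0.0, -1.0]), np.array([1.0, 0.0, 1.0])),   # displeasure/pleasure
    (np.array([0.0, -1.0, 0.0]),  np.array([0.5, 1.0, 0.0])),   # low/high arousal
    (np.array([-0.5, 0.0, 0.0]),  np.array([0.5, 0.5, 0.0])),   # submissive/dominant
]

affect = CoreAffect()
affect.apply_event([0.6, 0.4, 0.1])        # e.g., a praising tone of voice
affect.step()
print(blend_expression(affect.state, basis))
```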

3.2 Example: A Synthesis of Emotion Theories in Action

Kismet is the first autonomous robot explicitly designed to explore socio-emotive face-to-face interactions with people [72.3, 72.51]. Research with Kismet focused on exploring the origins of social interaction and communication in people, namely that which occurs between caregiver and infant, through extensive computational modeling guided by insights from models of emotion [72.64]. Kismet was designed to be dependent on people to help it satisfy its goals and motivations.

Internally, Kismet’s models of emotion interacted intimately with its cognitive systems to make affective appraisals about its interactions with the surrounding environment and people. These appraisals characterized the robot’s interaction with its environment (e.g., Is an object too close to the robot so that it might do damage? Is that person speaking to me in a praising tone of voice?). These appraisals were tagged with somatic markers [72.65] that characterized how they mapped to the robot’s internal measures of arousal (A), valence (V), and stance (S, an action tendency to approach or avoid). The affective appraisals directly influenced the robot’s behavior selection and goal arbitration processes. The somatic markers influenced the robot’s core affective state, a continuously adjusted point within a three-dimensional [A, V, S] space. The core affect mapped to the robot’s emotive expressions conveyed through vocal quality, facial expressions, and body posture [72.66]. Rather than simply triggering a set of discrete emotive expressions, Kismet’s facial and postural expressions were continuously computed following a componential approach, as a weighted blend of basis postures of the face and body along the [A, V, S] axes. This subtle variability in expressive behavior was important for enabling the robot to mutually regulate affective states with a person [72.67, 72.68]. Finally, the core affect and affective appraisals contributed to eliciting adaptive behavioral responses (e.g., orient, search, avoid, etc.) as part of the robot’s emotive responses (loosely inspired by the idea of emotion circuits [72.69]). Through a process of behavioral homeostasis [72.70], these emotive responses served to influence how people interacted with the robot in order to restore the robot’s internal drives, goals, and affective states [72.67, 72.68]; in this way, Kismet could recognize, express, and interact with people using emotive cues. Ultimately, the purpose of this dance was to keep the robot within a zone of proximal development, conceptualized by Vygotsky as optimal for learning, in order to propel the robot down a developmental path [72.71].
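The following is a minimal, hypothetical sketch of the appraisal-tagging and behavioral-homeostasis ideas described above; the appraisal rules, [A, V, S] values, and response labels are invented for illustration and are not Kismet's actual mechanisms.

```python
# Illustrative sketch (not Kismet's code): perceptual conditions are
# appraised and tagged with arousal/valence/stance markers, the tags bias
# core affect, and an emotive response is selected whose social effect
# tends to restore internal balance through the interaction.

from dataclasses import dataclass

@dataclass
class Appraisal:
    label: str
    arousal: float   # A in [-1, 1]
    valence: float   # V in [-1, 1]
    stance: float    # S in [-1, 1]: approach (+) vs. avoid (-)

def appraise(percepts):
    """Map simple perceptual conditions to somatically tagged appraisals."""
    tags = []
    if percepts.get("object_distance", 1.0) < 0.2:      # threateningly close
        tags.append(Appraisal("too_close", 0.8, -0.6, -0.9))
    if percepts.get("praising_voice", False):
        tags.append(Appraisal("praise", 0.4, 0.8, 0.7))
    if percepts.get("stimulation", 0.5) < 0.1:           # under-stimulated drive
        tags.append(Appraisal("bored", -0.5, -0.3, 0.4))
    return tags

def emotive_response(core_affect):
    """Pick a response whose social effect tends to restore internal balance."""
    a, v, s = core_affect
    if v < -0.2 and s < 0:
        return "withdraw"            # signals the interaction partner to back off
    if v < -0.2 and s >= 0:
        return "call_for_attention"  # recruits stimulation from the partner
    if a > 0.5 and v > 0.3:
        return "engage"
    return "idle"

core = [0.0, 0.0, 0.0]
for tag in appraise({"object_distance": 0.1}):
    core = [0.5 * c + 0.5 * m
            for c, m in zip(core, (tag.arousal, tag.valence, tag.stance))]
print(emotive_response(core))        # -> "withdraw"
```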

3.3 Emotional Empathy

For humans, the dynamic coupling of like minds through the actions of similar bodies is critical for acquiring human-like intuitions about the internal states of others. Dautenhahn [72.2] was one of the earliest works to explore empathic mechanisms of understanding others in social robot–robot interaction.

Fig. 72.5a,b

Kismet and a young woman mirroring affect. Facial expression and affective tone of voice are tightly correlated: (a) mirroring interest/arousal; (b) mirroring negative affect

It is likely that emotional empathy in humans is learned, beginning in infancy. Various experiments with human adults have shown a dual affect–body connection whereby posing one’s face into a specific emotive facial expression actually elicits the feeling associated with that emotion [72.72, 72.73]. Hence, imitating the facial expressions of others could cause an infant to feel what the other is feeling, thereby allowing the infant to learn the association between the observed emotive expressions of others and its own internal affective states. Other time-locked multimodal cues may facilitate learning this mapping, such as the affective speech that accompanies emotive facial expressions during social encounters between caregivers and infants. Using a similar approach, Breazeal et al. [72.74] posit that a robot could learn the affective meaning of emotive expressions signaled through another person’s facial expressions, body language, and synchronized multimodal cues such as vocal prosody [72.68, 72.75] (Fig. 72.5a,b). These time-locked multimodal states occur because of the similarity in bodies and body–affect mappings, and they enable the robot to learn to associate its internal affective state with the corresponding observed expression. In later work, they implemented a model of social referencing in which the robot, Leonardo, combines models of empathic association with models of shared attention (Sect. 72.4.1).

4 Socio-Cognitive Skills

Socially intelligent robots must understand and interact with animate entities (i.e., people, animals, and other social robots) whose behavior is governed by having a mind and body. In other words, social robots need the ability to recognize, understand, and predict human behavior in terms of underlying mental states such as beliefs, intents, desires, feelings, etc. Psychology calls this ability Theory of Mind (also known as mindreading, mental perception, social commonsense, folk psychology, or social understanding, among others).

This section reviews research in implementing models of human socio-cognitive skills and abilities on robots. Social robots will need a diverse repertoire of such skills to realize their full potential in daily human life: to communicate, cooperate, and learn from people in a human-centric and human-compatible manner.

For instance, social robots will need to be aware of people’s goals and intentions so that they can appropriately adjust their behavior to help us as our goals and needs change. They will need to be able to flexibly direct their attention to what we currently find of interest so that their behavior and communication can be coordinated around the same thing. They need to realize that perceiving a given situation from different perspectives impacts what each party knows and believes to be true about it. This will enable them to bring important information to our attention that is not easily accessible to us when we need it. Social robots will need to be deeply aware of our emotions, feelings, and attitudes so that they can prioritize what is most important to do for us according to what pleases us or what we find most urgent, relevant, or significant.

Furthermore, the behavior of social robots will need to adhere to people’s expectations. Namely, people will apply their theory of mind to understand the robot in terms of these mental states as well.

4.1 Shared Attention

Scassellati’s work [72.76] was among the earliest to pose the question of how to endow robots with a theory of other minds. Inspired by theoretical viewpoints proposed from the study of autism (believed to involve a deficit of theory of mind), Scassellati implemented a hybrid of the models proposed by Leslie [72.77] and Baron-Cohen [72.78], in which shared attention is viewed as a critical (and, in autism, missing) precursor to theory-of-mind competence. This hybrid model was implemented on the humanoid robot Cog. The robot was able to exhibit an assortment of social-cognitive skills such as joint attention, distinguishing entities in the environment as either animate or inanimate, and imitating only entities deemed to be animate [72.79].

Several researchers have explored models of joint reference, guided by insights from developmental psychology and autism research [72.76, 72.79, 72.80, 72.81]. Human infants first demonstrate the ability to share attention with others at 9 to 12 months of age, such as following an adult’s gaze or pointing gestures to the object being referred to [72.78, 72.82]. In these works, joint attention is a learned process. For instance, the robot learns the visuomotor mapping from the human’s attentional cue (head pose is a popular indicator of what the human is currently looking at) to the motor commands necessary to have the robot look at the same thing. This process is often bootstrapped by having the human look to where the robot initiates its gaze. In Fasel et al. [72.80], the robot learns a model of joint attention by discovering that the human’s gaze is a reliable indicator of where there is something interesting to look at.

Thomaz et al. [72.83] explore the attention-monitoring behavior of a robot in a social referencing interaction. In the developmental psychology literature, the ability of babies to actively monitor that others are looking at the same thing is a strong indicator of shared attention [72.84]. Social referencing is considered an early demonstration of shared attention because the baby looks back and forth between a novel object and the adult’s emotive reaction toward that object to learn the association between the two. To implement shared attention, the robot’s attentional state is modeled with two related but distinct foci: the current attentional focus (what is being looked at right now) and the referential focus (the current topic of shared focus, i.e., what communication, activities, etc. are about). Furthermore, the robot maintains a model of its own attentional state and a model of the attentional state of the human. The robot uses the heuristic of looking time upon a shared object to infer the referent of the interaction. Once the referent has been identified, the robot monitors the attention of the human in order to associate the human’s emotional reaction with the intended target. In this way, Leonardo is able to interact with and learn from a person via social referencing.
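A rough sketch of the two-foci bookkeeping and looking-time heuristic described above might look like the following; the object names, threshold, and interfaces are assumptions for illustration, not the implementation from [72.83].

```python
# Illustrative shared-attention bookkeeping: track joint looking time per
# object, promote an object to the referential focus once joint gaze has
# dwelt on it long enough, then bind the human's emotive reaction to it.

from collections import defaultdict

class SharedAttention:
    def __init__(self, referent_threshold=1.5):
        self.looking_time = defaultdict(float)   # seconds of joint gaze per object
        self.referent = None                     # current topic of shared focus
        self.referent_threshold = referent_threshold
        self.object_affect = {}                  # learned object -> affect tag

    def update(self, robot_focus, human_focus, dt):
        """Accumulate looking time when both agents attend to the same object."""
        if robot_focus is not None and robot_focus == human_focus:
            self.looking_time[robot_focus] += dt
            if self.looking_time[robot_focus] > self.referent_threshold:
                self.referent = robot_focus

    def social_reference(self, human_affect):
        """Bind the human's emotive reaction to the current shared referent."""
        if self.referent is not None:
            self.object_affect[self.referent] = human_affect

# Usage: human and robot both look at a novel toy, then the human smiles.
sa = SharedAttention()
for _ in range(20):                    # 20 perception cycles of 0.1 s
    sa.update("novel_toy", "novel_toy", dt=0.1)
sa.social_reference("positive")
print(sa.referent, sa.object_affect)   # novel_toy {'novel_toy': 'positive'}
```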

4.2 Mental Perspective Taking

This section further explores this empathetic, self-as-simulator approach to address more general challenges in endowing robots with mental perspective-taking abilities. These approaches are inspired and informed by a set of theories championed in neuroscience and embodied cognition known as Simulation Theory.

4.2.1 Simulation Theory

Simulation theory holds that certain parts of the brain have dual use: they are used not only to generate our own behavior and mental states, but also to simulate the internal states of another person [72.85]. In other words, we engage in a process of perspective taking and mental simulation.

For instance, Gallese and Goldman [72.86] proposed that a class of neurons discovered in monkeys (called mirror neurons) is a possible neurological mechanism underlying both imitative abilities and simulation-style prediction of the behavior and mental states of others. Further, Meltzoff and Decety [72.87] posit that imitation is the critical link connecting the function of mirror neurons to the development of adult mind-reading skills. From the field of embodied cognition, Barsalou et al. [72.88] present additional evidence from various social embodiment phenomena that, when observing an action, people activate some part of their own representation of that action as well as other cognitive states that relate to it.

4.2.2 Mirror Systems for Recognizing Actions

Inspired by these theories and findings, Johnson and Demiris [72.89] employ a simulation of visual perception to recreate the egocentric visual sensory space and corresponding egocentric behavioral space of the observed agent to increase the accuracy of action recognition. This approach is based on their HAMMER (hierarchical attentive multiple models for execution and recognition) architecture, which takes a mirror-neuron-inspired approach to action recognition and imitation by directly involving the observer's motor system in the action recognition process. Specifically, during observation of another's actions, all of the observer's inverse models (akin to motor programs) are executed in parallel via simulation using forward models, and then compared to the observed action. The one that matches best is selected as the recognized action. Perceptual perspective taking is needed to provide meaningful data for comparison: the simulated actions used by the observer during recognition must be generated as though from the point of view of the other person. They demonstrate this approach in an experiment where a robot attributes perceptions to and recognizes the actions of a second robot [72.89].
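The recognition loop can be sketched roughly as follows, assuming toy constant-velocity inverse models and a Euclidean comparison metric; this illustrates the simulation-style idea rather than the HAMMER implementation itself.

```python
# Illustrative simulation-style recognition: every inverse model proposes a
# motor command for the observed agent's (perspective-taken) state, a paired
# forward model predicts the resulting state, and the model whose prediction
# best matches the next observation is taken as the recognized action.

import numpy as np

class InverseForwardPair:
    def __init__(self, name, command):
        self.name = name
        self.command = np.asarray(command, float)   # toy constant-velocity policy

    def predict(self, state, dt=0.1):
        """Forward model: simulate one step under this model's command."""
        return state + self.command * dt

def recognize(models, observed_state, next_observed_state):
    """Return the action whose simulated outcome is closest to the observation."""
    errors = {
        m.name: np.linalg.norm(m.predict(observed_state) - next_observed_state)
        for m in models
    }
    return min(errors, key=errors.get), errors

models = [
    InverseForwardPair("reach_left",  [-1.0, 0.0]),
    InverseForwardPair("reach_right", [ 1.0, 0.0]),
    InverseForwardPair("withdraw",    [ 0.0, -1.0]),
]

# Observed hand position of the other agent, expressed in that agent's frame.
action, errs = recognize(models, np.array([0.0, 0.0]), np.array([0.09, 0.01]))
print(action)   # -> "reach_right"
```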

4.2.3 Mental Perspective-Taking for Inferring Beliefs and Goals

Gray et al. [72.90] have implemented computational models of simulation-theoretic mechanisms throughout several systems within Leonardo’s cognitive architecture to enable the robot to infer beliefs and goal states of a human collaborator.

The robot reuses its belief-construction systems from the visual perspective of the human to predict the beliefs the human is likely to hold to be true given what he or she can visually observe. This enables the robot to recognize and reason about the beliefs held by a person, even when they diverge from the robot’s own beliefs of the same situation.

In psychology, the ability to appreciate the divergent beliefs of another is classically demonstrated by the famous false belief task. In this task, subjects are told a story with pictorial aides that typically proceeds as follows: two children, Sally and Anne, are playing together in a room. Sally places a toy in one of two containers. Sally then leaves the room, and while she is gone, tricky Anne moves the toy into the other container. Sally returns. At this point the human subject is asked Where will Sally look for the toy?

The robot, Leonardo, has demonstrated its ability to pass these sorts of false-belief tasks, in which it observes two humans playing the roles of Sally and Anne [72.91]. Within the robot’s goal-directed behavior system (where schemas relate preconditions and actions with desired outcomes), motor information is used along with perceptual and other contextual clues (such as hierarchically structured task knowledge) to infer the human’s goals and how he or she might be trying to achieve them (i.e., plan recognition).
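A minimal sketch of the perspective-based belief bookkeeping needed to pass the Sally–Anne task might look like this; the data structures and predicate names are assumptions for illustration.

```python
# Illustrative per-agent belief tracking: the robot keeps a separate belief
# store per agent and only updates an agent's beliefs with events that the
# agent was present to perceive, so divergent (false) beliefs emerge naturally.

class BeliefModel:
    def __init__(self, agents):
        self.beliefs = {agent: {} for agent in agents}   # agent -> {fact: value}
        self.beliefs["robot"] = {}

    def observe(self, fact, value, present_agents):
        """Record an event; only agents present (able to see it) update beliefs."""
        self.beliefs["robot"][fact] = value              # ground truth for the robot
        for agent in present_agents:
            self.beliefs[agent][fact] = value

    def where_will_look(self, agent, fact):
        """Answer from the agent's belief store, not from ground truth."""
        return self.beliefs[agent].get(fact)

world = BeliefModel(["sally", "anne"])
world.observe("toy_location", "left_box",  present_agents=["sally", "anne"])
world.observe("toy_location", "right_box", present_agents=["anne"])   # Sally is out

print(world.where_will_look("sally", "toy_location"))   # left_box  (false belief)
print(world.where_will_look("robot", "toy_location"))   # right_box (ground truth)
```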

4.3 Perspective Taking in Collaboration and Teamwork

By using a simulation-theoretic methodology, mental inferences made across different cognitive systems can interact in interesting and useful ways to support collaborative behavior where a robot offers its human teammate appropriate assistance.

4.3.1 Using Visual Perspective Taking to Resolve Ambiguous Referents

Trafton et al. [72.92] have developed and implemented visual and spatial perspective-taking abilities based on mental simulation to support human–robot interaction and collaboration. Their cognitive architecture, Polyscheme, is designed to model how humans integrate multiple representation, reasoning, and planning methods to keep track of the world, including rich facilities for representing counterfactual worlds. It thus supports simulations of other people’s visual perspective to reason about interactions and the world from this alternate point of view.

They have demonstrated these skills in a number of experiments, such as demonstrating the robot’s ability to learn how to play hide and seek with a person, where the robot learns what makes a good hiding place with respect to being completely occluded from the human seeker’s point of view [72.93]. They have also demonstrated the usefulness of this system for a robot that solves a series of perspective-taking problems using the same frames of reference and spatial reasoning abilities that astronauts do, to facilitate collaborative problem solving such as repairing a vehicle with another person who has a different vantage point [72.94]. For instance, the robot can handle egocentric requests (e.g., hand me the cone to my right), addressee-centric requests (e.g., hand me the cone to your right), or object-centered requests (e.g., hand me the cone in front of the box).

In Trafton et al. [72.92], a human interacts with the robot using a multimodal interface that supports speech and gesture. The robot’s perspective-taking skills are used to resolve ambiguous referents that can arise when a person asks a robot to perform an action in relation to an object (e.g., asking the robot to hand me the wrench when there are multiple wrenches to choose from). In particular, a visual occlusion in the workspace might hide another candidate wrench from the person’s viewpoint but not from the robot’s viewpoint (Fig. 72.6). The robot can infer which object is intended by taking the visual perspective of the human and applying principles of joint salience and least effort. If an ambiguity still remains, the robot can act to resolve it by asking which one?

Fig. 72.6

Robonaut using visual perspective taking to disambiguate the intended referent when asked to hand me the wrench. The human can only see one wrench, but the robot can see both. The robot correctly hands the wrench that both can see
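A simplified sketch of the disambiguation logic described above, under assumed object data, a visibility predicate, and an invented least-effort margin, might look like the following.

```python
# Illustrative referent resolution: candidates matching the spoken label are
# filtered by visibility from the human's viewpoint, joint salience and least
# effort pick among survivors, and a clarifying question is asked otherwise.

def resolve_referent(label, objects, visible_to_human, human_position):
    # 1. Candidates matching the spoken label.
    candidates = [o for o in objects if o["label"] == label]
    # 2. Perspective filter: the person can only intend what he or she can see.
    seen = [o for o in candidates if visible_to_human(o)]
    if not seen:
        return None, "none visible to the human"
    if len(seen) == 1:
        return seen[0], "jointly salient"
    # 3. Least effort: prefer the object easiest for the person to indicate.
    seen.sort(key=lambda o: distance(o["position"], human_position))
    best, runner_up = seen[0], seen[1]
    if distance(runner_up["position"], human_position) - \
       distance(best["position"], human_position) > 0.3:    # assumed margin (m)
        return best, "least effort"
    return None, "ask: which one?"

def distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

objects = [
    {"label": "wrench", "position": (0.4, 0.0)},   # visible to both
    {"label": "wrench", "position": (1.2, 0.6)},   # occluded from the human
]
human_pos = (0.0, 0.0)
print(resolve_referent("wrench", objects, lambda o: o["position"][0] < 1.0, human_pos))
```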

4.3.2 Providing Informational or Instrumental Support

Gray et al. have demonstrated the ability of the Leonardo robot to successfully infer its human partner’s beliefs, desires, and intentions from real-time behavior during collaborative tasks. The shared workspace can contain visual occlusions [72.90] or can change dynamically in ways that not all participants know about [72.91]. The robot can integrate these mental state inferences to decide how best to help the person, such as offering instrumental support (acting on the environment to help the human complete the goal) or informational support (giving relevant information the person needs to successfully achieve his or her goal).

Consider the following scenario: a helpful robot is introduced to two people, Sally and Anne. All three watch as Anne hides chips in a box to the left of the robot and cookies in a box to the right. Sally leaves the room, at which point Anne plays a trick on Sally by swapping the contents of the boxes and then locking both boxes with a combination lock. Anne leaves, and Sally soon returns craving the chips she saw placed in one of the boxes. Sally remembers seeing the chips placed in the left box and attempts to open it by working the combination lock. The robot has matching chips and cookies that it can give out. What should the robot do to assist Sally?

Mindreading skills play an important role in this plan recognition scenario where the robot must observe Sally in real time to infer Sally’s misconception of where the chips are (Anne switched the location when Sally was out of the room), to infer what her desire is based on her behavior (Sally never explicitly said she wants the chips), and to recognize that Sally’s plan for how to get the chips is actually invalid (she is trying to open the wrong box). The robot has true knowledge of the situation, and must then reason about how to best help Sally get the object of her desire.

Gray et al. [72.91] combine these three kinds of mental inferences to demonstrate intention recognition with divergent beliefs for collaborative robots. Specifically, for the case of informational support, Leonardo relates its own beliefs about the state of the shared workspace to those of the human based on the visual perspective of each. If a visual occlusion exists or an event occurs that prevents the human from knowing important information about the workspace, the robot knows to direct the human’s attention to bring that information into common ground. For instance, Leonardo points to the box that actually holds the chips. For the case of instrumental support, Leonardo helps the person by directly giving him or her a matching bag of chips. Gray and Breazeal [72.95] explore the robot reasoning about and taking explicit action to deceive a human competitor.
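The decision between informational and instrumental support could be sketched as follows, with hypothetical predicates standing in for the robot's belief and goal-inference systems.

```python
# Illustrative support selection: compare the human's inferred belief about
# the goal-relevant state with the robot's own; a divergence calls for
# informational support, otherwise the robot may act instrumentally.

def choose_support(robot_belief, human_belief, inferred_goal, robot_can_act):
    """Return a (support_type, action) pair for the inferred goal."""
    relevant_fact = inferred_goal["relevant_fact"]
    if human_belief.get(relevant_fact) != robot_belief.get(relevant_fact):
        # The person is acting on a false belief: bring the truth into common ground.
        return "informational", f"point out that {relevant_fact} = {robot_belief[relevant_fact]}"
    if robot_can_act(inferred_goal):
        return "instrumental", f"act to achieve {inferred_goal['desire']}"
    return "none", "wait"

robot_belief = {"chips_location": "right_box"}
sally_belief = {"chips_location": "left_box"}          # she missed the swap
goal = {"desire": "get chips", "relevant_fact": "chips_location"}

print(choose_support(robot_belief, sally_belief, goal, lambda g: True))
# -> ('informational', 'point out that chips_location = right_box')
```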

5 Human Social Responses to Social Robots

5.1 Social Judgments

Anthropomorphic design is important for social robots given that the appearance, interface, and function of a technology or product impact how people perceive it, interact with it, and engage with it over time [72.12]. Robots and other technologies with humanlike design cues elicit social responses from people [72.96, 72.97, 72.98, 72.99]. For instance, people have been found to respond more positively to artifacts that exhibit humanlike cues such as emotive expressions than to those with a purely functional design, although user preferences were task and context dependent [72.100, 72.101]. Adding humanlike cues to a technological artifact can foster people’s social connection to it, aid people in learning how to use it, and enhance liking, engagement, and the desire to collaborate [72.102, 72.103, 72.104]. Others have found that people tend to hold richer mental models of anthropomorphic robots than of mechanistic ones [72.103, 72.105]. Others have explored a number of anthropomorphic design features such as personality, backstory, use of humor, and even the notions of self (e.g., referring to itself as I), deception, politeness, and moral regard [72.106, 72.107, 72.108]. However, it is important that the appearance and interface of the robot’s design match its capabilities and the users’ expectations, or negative effects can result [72.109].

5.2 Physical Versus Virtual Embodiment

When considering the role and advantages of social embodiment, one might ask whether there is a difference between physical and virtual counterparts. Indeed, physical social embodiment offers a number of advantages over purely graphical representations. First, while many social interactions involve exchanging only visual and auditory cues, robots support communication and collaboration through physical contact as well. Robots support the joint manipulation of artifacts and the sharing of physical space with people. Both are important for all sorts of collaborative activities such as assembly and manufacturing [72.110, 72.111], search and rescue [72.16, 72.53], domestic assistance [72.103], and more. Further, touch is not only an interesting and important communication modality with noted health and therapeutic benefits, but it can also influence the social judgments people make of robots, such as how caring or persuasive the robot is perceived to be [72.103, 72.104]. The modality of touch is discussed further in the section on touch-based interaction (Sect. 72.8) [72.31, 72.39].

In addition, a growing number of studies that directly compare virtual to physical agents report that people show more trust, compliance, and enjoyment with physical robots [72.112, 72.113, 72.114, 72.115]. Beyond user preference of physical over virtual agents, a number of studies have also shown improved human performance and outcomes on a wide variety of tasks ranging from games [72.112] to educational contexts [72.115], assistive tasks [72.114], health-related activities [72.116], and Wizard-of-Oz user studies [72.113].

Finally, because virtual agents are a representational form, interpretation and mapping may prove too challenging for individuals with cognitive or social deficits. Hence, robots may prove advantageous as therapeutic interventions for children with autism, who can show little or no interest in forms on video monitors or televisions [72.117]. Social robots also tend to support face-to-face group dynamics [72.118], whereas screens tend to capture eyeballs at the risk of diminishing face-to-face interaction [72.119]. This has important implications for the design of learning technologies for young children in particular, where supporting the participation of the parent in a social way is of great benefit to how children learn.

5.3 Social Stimulus to Learn About People

The impact of physically embodied social robots on human social responses has opened new applications for robots as an interesting tool to help scientists learn about human social behavior and judgments. People make a variety of social judgments through the dynamic exchange of nonverbal cues such as postural shifts, subtle head gestures, arm gestures, facial cues, tone of voice, etc. Indeed, ample evidence indicates that humans regularly use specific cues, often without conscious awareness, to infer the motivations of others with some level of accuracy [72.120, 72.121]. If people employ such cues without conscious awareness, then it becomes difficult to use people as experimental confederates where such cues must be carefully controlled. Thus, in one paradigm, social robots can be used as a highly controllable social stimulus in the place of a confederate in human participant studies. This is particularly useful when trying to understand how human nonverbal cues influence people’s social judgments, such as how trustworthy another is perceived to be from a brief encounter [72.122].

5.4 Social Rapport

One important skill for social robots is the ability to build and maintain social rapport with their users. Social rapport exists in the interaction between individuals; it creates a powerful interpersonal influence based on mutual attentiveness, positivity, and responsiveness [72.123, 72.124, 72.125, 72.126]. In joint activities, the ability to establish good rapport often results in improved outcomes. For instance, students learn better when they have a good rapport with their teachers [72.127], patients are more adherent and have better health outcomes when they have a good rapport with their doctors [72.124], and teams make better decisions and work more effectively when they share a good rapport with each other [72.128]. The quality of rapport between people is influenced by the exchange of nonverbal behaviors between individuals [72.129, 72.130]. Although the specifics of how people build a strong rapport with one another are still a topic of scientific inquiry, the more in-sync and open the participants’ nonverbal cues are in relation to one another, the more positive the resulting rapport. For instance, appropriate mirroring or synchrony of body posture, head movements, facial expression, and vocal prosody can all contribute to positive rapport. Open cues signal a receptiveness to interact, e.g., making appropriate eye contact, leaning toward another, and arm gestures that tend not to occlude the body or face.

Hence, when considering how social robots can effectively collaborate with people, their ability to build and maintain good social rapport (at least perceived rapport) is important. It is not just a robot’s ability to perform such nonverbal cues that ultimately matters, but its ability to coordinate them with those of people in real time. For instance, researchers have explored the coordination of an agent’s nonverbal cues with people’s to improve rapport [72.56], engagement [72.32], trustworthiness [72.122], and deictic cues [72.131].

5.5 Social Support

Improved rapport also facilitates the ability of social robots to provide people with effective forms of social support. Social support is recognized to play an effective role in helping people attain personal goals and improved outcomes in broad domains such as education, mental health, physical health, aging, coping, and more. It is conceptualized as the perception and actuality that one is cared for, has assistance available from others, and is part of a supportive social network [72.123]. These supportive sources can be emotional, instrumental, informational, or companionship-based. Emotional support, for instance, conveys to individuals that they are valued; it includes offering empathy, concern, nurturance, encouragement, and acceptance, to name a few [72.124, 72.125]. Instrumental support concerns concrete, utilitarian ways of providing assistance, such as financial assistance, material goods, or services. Informational support includes offering useful advice, guidance, or information to help others problem-solve. Finally, companionship support gives someone a sense of social belonging and others with whom to participate in shared social activities. These forms of social support can come from people, professionals, pets, and even social robots.

The ability to provide a user with social support is one of the effective ways that social robots can help people through social means. Robots can provide this assistance through direct interaction with their users, or by helping to mediate the provision of social support from people (e.g., connecting people). A wide variety of social robots today are being developed to interact with people as tutors [72.132], learning companions [72.118, 72.133], coaches [72.116], domestic helpers for the elderly [72.134], therapeutic aids [72.21, 72.31], and more. Through dialog, nonverbal cues, expressive displays, and physical actions, these robots assist people by providing information, monitoring performance, offering feedback, incentivizing and sustaining motivation, giving encouragement, offering companionship, performing physical tasks, etc.

As such, social robots have broad applicability in many domains as a technology that can extend and augment the social support provided by people. This is particularly relevant for societal challenges, e.g., eldercare, health and chronic disease management, and education, where social support is recognized as being critical for positive outcomes, but where there is a recognized shortage of trained professionals to meet the demand. Further, whereas frequent meetings with human professionals are cost prohibitive, social robots have the potential to fill the gaps in a cost-effective way. Importantly, social robots are not being designed to replace or obviate human professionals, but rather to serve as an effective tool that supports human networks in a scalable and cost-effective way.

6 Social Robots and Communication Skills

Communication implies an exchange of information through natural language. Thus, one might consider that very good automatic speech recognition (ASR) is the primary function required to fulfill the communication skills of robots. However, social robots are expected to engage in casual communication with people in as natural a way as people communicate among themselves. In such casual communication, information is often exchanged nonverbally as well as verbally. Thus, robots need to be well equipped to recognize people's nonverbal cues and to express their own nonverbal cues through their behavior. The required computation must also take into account good perceptual and cognitive capability regarding the surrounding environment in addition to the targeted person. This section introduces the history of research on social robots and communication skills, and provides insights into future challenges.

6.1 Verbal/Nonverbal Communication

Historically, even first-generation humanoid robots developed in the 1970s had primitive capabilities for communicating using natural language. For instance, WABOT and WABOT-2 had conversational capabilities in natural language, based on simple mappings between speech inputs and outputs [72.135, 72.136].

Furthermore, early pioneers had noticed the importance of nonverbal information in human conversation. Nonverbal behavior was classified into three roles:

  1. Regulators: expressions such as gaze, poses, and vocalizations that are used to regulate and control conversational turn-taking.

  2. State displays: indications of internal state, including affective, cognitive, or conversational states, that improve interface transparency.

  3. Illustrators: gestures that supplement the information in an utterance, including deictic gestures (pointing) and iconic gestures.

In this scope, nonverbal information is considered supplemental: it is natural language that communicates the primary information in turn-based exchanges. We provide examples below.

6.1.1 Regulatory Cues

Even some of the earliest social robots displayed nonverbal information to regulate interactions with people. Hadaly 2 was the first robot to use mutual gaze as a nonverbal cue to regulate conversation [72.137, 72.138]. Mutual gaze was approximated using face recognition to determine when the human's face was oriented toward the robot; when mutual gaze occurred, Hadaly 2 expressed readiness to commence conversation by blinking its eyes. People's gaze toward a robot has also been used as a cue to infer whether they are engaged in the conversation with the robot [72.139].

Other examples are Kismet [72.3, 72.140, 72.141] and Leonardo [72.33, 72.142], which had the capability to produce nonverbal cues called envelope displays to regulate the exchange of speaking turns. Backchannelling cues were found to reduce stress and cognitive load during a complex human–robot teaming task (Fig. 72.7). The regulatory role of such cues has also been demonstrated in multiparty interaction. People tend to make eye contact and raise their eyebrows when they are ready to relinquish their speaking turn, and tend to break their gaze and blink when starting their speaking turn. Recognition of these cues was implemented for smoothing and synchronizing the exchange of speaking turns. Gaze is also known to convey the speaker's intention about who may take the next turn; Mutlu et al. have successfully replicated this effect in human–robot interaction [72.143, 72.144]. Kirchner and Alempijevic showed that gaze is also effective in communicating who should receive an item handed over by a robot [72.145]. Conversely, gaze has also been used as a cue to interpret whether a person wishes to take a turn [72.146].

Fig. 72.7

Backchannelling cues were found to reduce stress and cognitive load during a complex human–robot teaming task. Teams where the robots produced backchannel cues in response to human requests also tended to find more items in a search-and-retrieve task (after [72.16])

Paralinguistic information is also processed. People frequently provide short acknowledgment utterances (e.g., uh-huh, um-hmm, huh, etc.) as the robot explains something. These responses are either acknowledgments or ask-backs. It is very difficult to distinguish these two kinds of utterances from the linguistic information alone, as represented by the transcription of the utterance. Fujie et al. demonstrated a method to distinguish an utterance as either an acknowledgment or an ask-back from the prosody of the utterance [72.47].

6.1.2 State-Display Cues

Facial expression and gaze have been used to indicate the conversational or cognitive state of a robot. Such state-display cues make the robot's internal state more transparent to a user and thus enable him or her to better understand the robot. For instance, ROBITA used the tightness of its facial expression to indicate readiness to engage in conversation: a tight face expressed conversational readiness, while a loose face communicated a lack of readiness to engage [72.147]. Emotional state and attention target can also be displayed nonverbally (Sects. 72.3 and 72.4).

The state of listening and level of understanding are also displayed nonverbally. Human listeners use back-channel responses, such as head nods, to convey that they are successfully following the conversation. Imitating human behavior, robots indicate their state of listening using head nods [72.148], facial expressions [72.5], and bodily motion [72.149]. Such state-display cues are also used in human-like telepresence robots to indicate the operator's engagement in the conversation [72.150, 72.151].

Another back-channel signal is an expression of confusion by the listener (verbal or nonverbal). This flags the speaker to stop and try to repair the broken communication. Robots such as Leonardo and ROBITA use facial displays of confusion when speech recognition fails in order to intuitively communicate to the human that he or she should repeat their last utterance.

Robots have also been developed that process humans' back-channel feedback [72.32]. A sophisticated head-nod recognition system was developed whereby the robot, Mel, could successfully distinguish small feedback nods from other kinds of head nods, such as those that communicate agreement. Mel used this information to determine its own nodding behavior so as to respond appropriately to the human. In a series of human subject studies, Sidner et al. found that these nonverbal cues enhanced the robot's social engagement with people [72.32].

6.1.3 Illustrator Cues

Deictic gestures have often been implemented in robots for pointing to an object, using index-finger pointing [72.152, 72.153], gaze [72.143], or the combination of the two [72.74, 72.154, 72.155, 72.156, 72.157]. Other types of gestures, such as iconic gestures [72.156] and region pointing [72.158], have also been successfully used in robots. The effect of gestures has been demonstrated empirically. For instance, in a direction-giving scenario, even though turn-by-turn directions were given verbally and thus could be comprehended without gestures, supplemental pointing gestures improved listeners' comprehension of the given directions [72.159].

A number of robots are able to recognize the deictic gestures of a person, conveyed either through pointing gestures or head pose. For example, Leonardo is able to infer the object referent in an interaction by considering a number of factors including pointing gesture, head pose, and speech. Sugiyama et al. associated verbal spatial deixis with pointing gestures to better recognize the pointed-to target [72.160]. Brooks and Breazeal [72.153] developed a deictic recognition system that enabled a robot to infer the correct object referent from correlated speech and deictic gesture. Interestingly, it was found that the accuracy of the human's pointing gesture is surprisingly poor. As a result, the deictic recognition system relies on coordinated speech and gesture information, with spatial knowledge provided by a three-dimensional (3-D) spatial database constructed by the robot using real-time vision, and a deictic spatial reasoning system. This system was successfully demonstrated on the dexterous humanoid Robonaut, developed at the National Aeronautics and Space Administration (NASA) Johnson Space Center (JSC) (Fig. 72.8b), where the human points to and labels a set of four bolts on a wheel to be fastened in order by the robot.

Fig. 72.8a–c

Examples of conversational robots: (a) ROBITA performing group conversation; (b) Robonaut interpreting the pointing gestures of a human to determine which nut to fasten on the wheel; (c) Leonardo uses gaze and joint attention to ground the human’s pointing gesture for the desired referent
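A rough sketch of fusing an imprecise pointing ray with a spoken label over a 3-D spatial database, in the spirit of the systems described above, might look like the following; the scoring weights and database format are assumptions.

```python
# Illustrative deictic grounding: each object in a 3-D spatial database is
# scored by angular proximity to the pointing ray and by whether it matches
# the spoken label, so speech compensates for the imprecision of pointing.

import numpy as np

def angular_error(origin, direction, target):
    """Angle (rad) between the pointing ray and the ray toward the target."""
    to_target = np.asarray(target, float) - np.asarray(origin, float)
    cosang = np.dot(direction, to_target) / (
        np.linalg.norm(direction) * np.linalg.norm(to_target))
    return np.arccos(np.clip(cosang, -1.0, 1.0))

def ground_deictic(origin, direction, spoken_label, spatial_db,
                   angle_weight=1.0, label_bonus=0.5):
    scores = {}
    for name, (label, position) in spatial_db.items():
        score = -angle_weight * angular_error(origin, direction, position)
        if spoken_label and spoken_label == label:
            score += label_bonus           # speech narrows down sloppy pointing
        scores[name] = score
    return max(scores, key=scores.get)

spatial_db = {
    "bolt_1": ("bolt", (1.0, 0.2, 0.0)),
    "bolt_2": ("bolt", (1.0, -0.3, 0.0)),
    "wheel":  ("wheel", (1.2, 0.0, 0.0)),
}
# Hand at the origin, pointing roughly toward bolt_1 while saying "that bolt".
print(ground_deictic((0, 0, 0), np.array([1.0, 0.15, 0.0]), "bolt", spatial_db))
```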

6.2 Mechanisms for Human–Robot Communication

Mechanisms of turn-based communication are well studied. In linguistics, conversation is considered to be formed as a repetition of turn-taking [72.161]. There is one speaker who takes the floor and speaks while the listeners listen [72.162]. When the speaker finishes speaking, the floor is taken by one of the listeners. There can also be bystanders who listen but do not intend to take a turn. This turn-based model is the typical model used in dialog modeling [72.163]. That is, a dialog management system identifies who owns the floor, recognizes the words the speaker uttered, and generates an utterance when the system takes the floor, often implemented with a series of rules. This model has been successfully extended to human–robot communication. For instance, Nakano et al. developed a rule-based architecture in which the planner deals with the robot's task-based actions as well as dialog management [72.164]. Scheutz et al. developed a software architecture named DIARC, which uses a rule-based planner that receives inputs from all perception modules and addresses affect and goal-directed actions in addition to natural language dialog management [72.165]. In such an approach, the robot communicates through utterances accompanied by nonverbal cues [72.166]. Some systems deal with multiparty dialog, in which the robot's gaze cues (Fig. 72.8a) are used to indicate who the addressee is, i.e., the active listener who is expected to take the next turn [72.143, 72.167, 72.168, 72.169, 72.170, 72.171].
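A minimal sketch of such a turn-based, rule-driven dialog manager is given below; the rules, cue names, and floor-handling policy are illustrative assumptions rather than any cited architecture.

```python
# Illustrative turn-based, rule-driven dialog management: the manager tracks
# who holds the floor, matches the recognized utterance against rules when
# the user yields the floor, and pairs each reply with a nonverbal cue.

RULES = [
    # (keyword in recognized speech, reply, accompanying nonverbal cue)
    ("hello",     "Hello! How can I help you?",         "gaze_at_user"),
    ("direction", "The cafe is down the hall, on the left.", "point_left"),
    ("bye",       "Goodbye!",                           "nod"),
]

class TurnBasedDialogManager:
    def __init__(self):
        self.floor = "user"          # who currently holds the floor

    def on_user_utterance_end(self, recognized_text):
        """User yields the floor; pick a rule, speak, then return the floor."""
        self.floor = "robot"
        for keyword, reply, cue in RULES:
            if keyword in recognized_text.lower():
                self.speak(reply, cue)
                break
        else:
            # No rule matched: signal a breakdown so the user repeats.
            self.speak("Sorry, could you repeat that?", "confused_face")
        self.floor = "user"

    def speak(self, text, nonverbal_cue):
        print(f"[robot, cue={nonverbal_cue}] {text}")

dm = TurnBasedDialogManager()
dm.on_user_utterance_end("Hello robot")
dm.on_user_utterance_end("Can you give me directions?")
```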

Researchers have also been well aware of the importance of the time-sensitive nature of communication. For instance, interruption in the middle of a speaker's turn has recently been taken into account in dialog management systems [72.164, 72.172]. Chao and Thomaz proposed a more elaborate model in which the time synchrony of verbal and nonverbal cues, e.g., gaze and gesture, is addressed using a Petri-net-based representation [72.173]. Empirical studies have revealed what constitutes good synchrony and timing. For instance, Yamamoto and Watanabe studied the synchrony between a robot's utterance and motion and found that people prefer an utterance that starts slightly after the robot's motion [72.174]. This mirrors what humans do every day, as it is reported that humans' gestures are performed slightly ahead of their utterances [72.175]. Shiwa et al. found that people prefer a small delay in the robot's response, and that a conversational filler such as etto can be useful to buy time when the robot's response is delayed [72.176].

However, while the above studies assume that information is communicated through turn-based dialog, recent studies have revealed more dynamic cases of human–robot communication that fall outside the turn-based dialog paradigm. For instance, at the moment a robot and a person are about to initiate interaction, both communicate their intention to meet and talk. A couple of studies have investigated ways for a robot to express its willingness to initiate interaction [72.177]. When the target user is seated, Dautenhahn et al. found that people prefer the robot to approach from the side [72.178]. Figure 72.9 shows a scene in which a robot approaches pedestrians. Satake et al. found that such interactions fail if the robot does not succeed in communicating its intention to talk. Because the robot operated in a noisy shopping mall, it was simply ignored when it used only verbal utterances. Instead, they found that the robot needs to communicate its intention nonverbally: it needs to approach from a frontal direction and be responsive in adjusting its body orientation toward the targeted person. Such nonverbal behavior made the robot more successful in initiating conversations with pedestrians [72.179].

Fig. 72.9

A robot that approaches pedestrians

Pedestrians also communicate their intention nonverbally; for instance, when a pedestrian does not wish to talk to the robot, he/she avoids approaching it. Thus, a robot can use proximity information to estimate people's willingness to initiate interaction. Michalowski et al. classified spatial zones around a robot, and let the robot talk to a person if he/she approached the robot and entered social distance [72.177]. There are also empirical studies about proxemics that revealed that people often prefer to stay at social distance when they talk to a robot [72.180, 72.181, 72.182].
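The sketch below illustrates this kind of proxemics-based initiation logic. The zone boundaries are conventional approximations of Hall's proxemic distances, not the exact thresholds used in the cited studies, and the function names are hypothetical.

```python
# Approximate proxemic zones (after Hall); boundaries are illustrative.
def proxemic_zone(distance_m):
    if distance_m < 0.45:
        return "intimate"
    if distance_m < 1.2:
        return "personal"
    if distance_m < 3.6:
        return "social"
    return "public"

def should_initiate_conversation(distance_m, approaching):
    """Talk only to people who approach and enter social distance or closer."""
    return approaching and proxemic_zone(distance_m) in ("personal", "social")

print(should_initiate_conversation(2.0, approaching=True))   # True
print(should_initiate_conversation(5.0, approaching=True))   # False
print(should_initiate_conversation(2.0, approaching=False))  # False
```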

Joint attention (as introduced earlier in Sect. 72.4) can be silent. Figure 72.10 shows a scene where a person shares his attention target nonverbally with the robot. The person is curious about the computer and stands in front of it; the robot then moves to a location convenient for explaining the object. When the person moves to another exhibit, the robot follows him and explains the one in front of him. Here, it is their standing positions that communicate the target of attention [72.183, 72.184].

Fig. 72.10

A person and robot implicitly share the target of attention via spatial formation

These examples show cases where communication is not necessarily turn-based, because information can be exchanged nonverbally in such casual communication. Unlike the speech channel, which typically needs to be occupied by only one speaker at a time, nonverbal signals can be mutually exchanged simultaneously. For instance, when a pedestrian and a robot meet, their positions continuously change while they walk, which communicates whether they would like to initiate conversation. As social robots aim to operate in people's daily environments, such casual and nonverbal exchange of information is a nontrivial part of human–robot communication. Research toward fully unveiling the mechanisms required for such continuous exchange of social signals is still in its early stages.

On the other hand, some studies have started to highlight the need for a mechanism that connects communication and background knowledge. It has been shown that a model of common ground makes daily communication effective [72.185]. A direction-giving scenario is one case where environmental knowledge is useful. For instance, when a robot provides directions (Fig. 72.11), its directions can be more comprehensible if the robot is aware of a visible landmark, so that it can say please turn at the book store instead of please turn at the third corner [72.186]. Thus, a good model of the environment, e. g., [72.187], would greatly improve a robot's capability to communicate about a route. Moreover, if a robot has a model of the user's memory of locations, it can provide destination-based directions, such as the café is near the book store you just visited [72.188], which is much easier to comprehend than complex turn-by-turn directions like to go to the café, please turn right at the first corner, turn left at the third corner, and …
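A minimal sketch of this idea is shown below: the robot prefers a landmark-based phrase whenever a visible landmark is attached to a corner, and otherwise falls back to counting corners. The route representation and function names are hypothetical and are not the environment model of the cited work.

```python
# Hypothetical route representation: a list of corners, each optionally
# annotated with a salient, visible landmark.
route = [
    {"turn": "right", "landmark": None},
    {"turn": "left",  "landmark": None},
    {"turn": "right", "landmark": "the book store"},
]

ORDINALS = ["first", "second", "third", "fourth", "fifth"]

def describe_turn(index, corner):
    """Prefer a landmark-based phrase when a visible landmark is known."""
    if corner["landmark"]:
        return f"turn {corner['turn']} at {corner['landmark']}"
    return f"turn {corner['turn']} at the {ORDINALS[index]} corner"

def give_directions(route):
    steps = [describe_turn(i, c) for i, c in enumerate(route)]
    return "Please " + ", then ".join(steps) + "."

print(give_directions(route))
# Please turn right at the first corner, then turn left at the second
# corner, then turn right at the book store.
```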

Fig. 72.11

A robot that provides directions (after [72.189])

6.3 Challenges

To summarize, turn-based conversation has been studied to a certain degree. However, social robots often engage in casual communication, in which they would need to deal with social signals and background knowledge. Here, we note a couple of key challenges.

6.3.1 Revealing the Repertoire of Communication Skills

Previous studies have started to reveal different forms of communication requiring various perceptual and cognitive capabilities. Early studies revealed the importance of social signal processing, such as recognizing and expressing gaze and facial expressions. Studies about the initiation of conversation revealed the mutual real-time exchange of information via people's positions. Studies about giving directions demonstrated the further need to associate language with environmental knowledge, e. g., a cognitive map, how people perceive environments, and remembered landmarks.

To what extent do we need to cover this repertoire? We believe there are many more elements still to be found, many of them interrelated with perception and cognition. Uncovering them would improve our understanding of what communication skills truly are; this remains one of the important challenges.

6.3.2 Architecture for Communication Skills

Along with identifying the repertoire of communication skills, the other important challenge is their integration, i. e., an architecture for communication skills. Moving from a turn-based structure to a dynamic structure is the real challenge.

7 Long-Term Interaction with Robot Companions

A number of social robots have been introduced as companion robots. Companion robots may have the sole purpose of providing companionship, e. g., in studies using toy robots such as the Pleo [72.190], but often they combine two aspects: being useful, i. e., being able to carry out certain tasks for the user, and carrying those tasks out in a manner that is socially acceptable [72.191]. The latter notion has been used, e. g., in several European projects [72.192, 72.193, 72.194, 72.195]. The notion of a companion often entails repeated and long-term interactions. This poses particular challenges not only for the design of the robot, the interaction design, and the choice of tasks/settings/scenarios, but also for how to ensure satisfactory and successful interaction with the robot.

7.1 Robot Companions

Recently, more and more researchers are moving toward studying application areas that develop such companion robots, e. g., the use of robots in assistive and rehabilitation robotics. The use of robots to assist elderly users in their homes, with a view to extending the period during which they can live independently in their own homes, is currently being studied extensively by different research groups worldwide. Several companies are also marketing their robots as such assistive companions, e. g., the Wakamaru robot (Mitsubishi Heavy Industries Ltd) introduced in 2005, or the Human Support Robot (Toyota) revealed in 2012. In addition, numerous research prototypes exist, e. g., Cody (Georgia Tech), Herb (CMU), Care-O-bot 3 (Fraunhofer), and Hector (CompanionAble project).

A number of tele-presence robots have been developed, but this chapter will focus on autonomous or partially tele-operated companion robots. Note, Cody targets the domain of patient hygiene care, which is related to issues discussed in the section on tactile human–robot interaction. Regardless of whether human–robot interaction with companions includes tactile interaction or is a hands-off approach (Chap. 73), developing such systems involves a careful study of the roles and functions of such robots in the application domain, and of users' perception of and attitudes toward such a system.

Figure 72.12 shows the Care-O-bot 3 robot (Fraunhofer) used in the ACCOMPANY project on home assistance for elderly people. The robot is shown in the University of Hertfordshire's Robot House, a domestic setting for experiments into robot home assistance which participants regularly visit for human–robot interaction sessions. Such living lab settings are increasingly being used to design, experiment with, and evaluate innovative systems (e. g., the European Network of Living Labs, [72.196]). Providing such environments facilitates progress toward complex robotic systems that can be deployed in real-life environments.

Fig. 72.12

Care-O-bot 3 robot (courtesy of Fraunhofer)

7.2 Engagement and Long-Term Relationships

The concept of a robot companion entails repeated, long-term interactions, which may also afford the development of relationships between robots and people. Relationships with robots can take many shapes and forms, and the formation of relationships will be influenced by many factors. For example, the specific role of the robot (Butler? Friend? Tutor? Assistant? Tool? etc., [72.197]) will matter, as well as the robot's specific embodiment and its behavioral and expressive repertoire, which are often used to present robots as relational artifacts [72.198, 72.199] designed to build and maintain social-emotional relationships with users.

Many applications with social robots involve only short-term interactions, e. g., robots meant to function as museum guides [72.200], receptionists, etc. Here, the novelty effect can often be exploited, i. e., the general interest that many people have in new robots. However, after repeated interactions this effect can wear off and people can lose interest in a robot. Thus, a main research challenge is how to keep people engaged in the interaction and keep them motivated to interact with the robot. Establishing a useful and enjoyable relationship with a robot is not an easy task. And what exactly is long-term? Tanaka et al. [72.22] mention a 10-hour barrier (total interaction time), while Sung et al. [72.201] pose the goal of long-term interaction over more than 3 months as a main challenge. Hüttenrauch and Severinson-Eklundh [72.202] performed a user study with the service robot CERO meant to assist one user over three months in an office environment. Kanda et al. [72.189] investigated a partially tele-operated robot over a 25-day period. The nature of these interactions depends on whether studies take place in a school, a home, a public place, etc. It will also depend on the frequency and duration of the interaction, e. g., how many interaction episodes per time interval does an individual have with the robot?

There is also the issue of the quantity and quality of interaction, which may be realized very differently in different application areas of social robots. More and more studies are bringing robots into the wild [72.203]; for example, preschool-age children interacted with a storytelling learning companion robot during repeated encounters over a 2-month period, and personalization of the robot's stories to the children led to improved vocabulary learning. Field studies are often said to be preferable, and more ecologically valid, than data collected in the laboratory. However, practical, technical, and methodological constraints may limit the length and nature of field studies. Also, bringing a robot into the wild does not necessarily make the interactions with people more natural, and the messier conditions pose hard methodological problems and can often interfere with controlled data collection and statistical data analysis [72.204]. A field study by Heylen et al. [72.204] placed a simple dialogue Nabaztag (rabbit) robot in a few people's homes over 10 days, and it illuminates a number of those real-life problems with field studies, which impact the validity of the results that can be gained and the user experience. The latter will ultimately decide whether people consider a long-term companion robot amusing or a nuisance [72.204]. A taxonomy to characterize studies in the wild for child–robot interactions is provided in Salter et al. [72.205]. Systematic analyses of different experimental conditions, tasks, settings, etc., whether in the laboratory or in field studies, will help the planning, design, and comparison of different long-term studies.

Note, often the expectations of users are not met by the robot's design and abilities, nor does the robot fit into the daily activities of users [72.190]. The advancement of robot technology and knowledge of HRI will enable more and more field studies, in schools [72.206], nurseries [72.22], private homes [72.190], care homes [72.31], etc.

In many of the above long-term studies, commercially available robots have been used. The reasons for this are not only availability, but also robustness, reliability, and safety. The use of research prototypes in field studies should always entail securing Institutional Review Board (IRB) approval to meet ethical guidelines when human participants are involved. Even prototypes that are generally safe to use may still have cables sticking out, etc. Hence, it is often a necessity to have researchers present at all times during these studies. Research prototypes are also by their very nature more prone to break down, and again this requires the constant attention of researchers. The design of research prototypes is also often not ideally suited for a field study, e. g., wheels being used on cluttered surfaces, etc. Interesting insights on those practical but nevertheless very important points can be found, e. g., in Hüttenrauch et al. [72.207], where a modified Peoplebot (Mobile Robotics) robot was brought into eight different homes for approximately 1 hour each in order to study spatial management in HRI situations. Note, even when leaving a robot alone with a person or family in their own home, people may still be acutely aware of the experimental nature of the interaction [72.204].

Establishing and maintaining relationships between users and robots in long-term interactions needs careful consideration, and many insights can also be gained from related long-term studies with virtual agents, e. g., as exercise advisors [72.208, 72.209].

7.3 Robot Home Companions for Supporting Elderly Users

Several HRI studies in the home focus on the use of a companion robot for the general population. Koay et al. [72.210] report on a long-term, 5-week study in a domestic environment with a partially remote-controlled robot involving several different scenarios (e. g., a collaborative task, sharing physical space, recording and revealing personal information, interrupting a person to serve them, and seeking assistance from a person using combinations of physical and verbal cues). The results highlight concerns and expectations that people have in such scenarios that are relevant for the domestic use of robots. Note, the robot used in this study was partially remotely controlled, which allowed the investigation of complex scenarios.

Due to demographic changes, there is a lot of interest worldwide in developing robot technology for the care of elderly people that would allow them to remain in their own homes for longer, or to provide assistance in sheltered or otherwise specifically designed environments. A robot may assist an elderly person in the home in different ways, e. g., providing physical assistance (standing up, walking, fetching and carrying objects, etc.), social assistance (e. g., engaging the user in interactions with other people), and cognitive assistance (e. g., reminder functions).

In HRI, there is prior work investigating a robot's ability to remind users of future or prior activities. Autominder, using the Nursebot, represents an initial attempt to make reminders more intelligent and dynamic [72.211]. Intelligence and dynamics were also key aspects of the system proposed by the Robocare project [72.212]. More recently, the KSERA project [72.213] focused on the ability of the robot to draw on information not readily available to the user, as well as its ability to persuade in order to safeguard the health of the user. Other examples of assistance and reminders for elderly people with dementia through multimodal interactions and smart-house integration include the EU FP7 CompanionAble project and the MOBISERV project [72.214]. The Florence project [72.215] introduces a commercially available robot as an autonomous lifestyle device for ambient assisted living; it provided multiple services to users, including an agenda reminder application that allowed the elderly to share information with caregivers, etc. The European projects SRS [72.216] and ACCOMPANY [72.217] both use the Care-O-bot 3 (Fraunhofer) (Fig. 72.12) as their target robotic platform. While SRS involves the use of the robot in a remote-control scenario, the ACCOMPANY project develops fully autonomous behaviors for an empathic and assistive home companion.

Several projects combine a robotic companion with an intelligent smart/ambient environment as part of a variety of ambient assisted living (AAL) solutions to assist older adults, or people with disabilities, to live independently by providing support in activities of daily living (ADLs). Such topics are currently being studied worldwide; see, e. g., the Quality of Life Technology (QOLT) Center [72.218] at Carnegie Mellon University or the Centre for Affective Solutions for Ambient Living Awareness (CASALA) [72.219].

7.4 Example: Long-Term Interactions with a Robotic Weight Loss Coach

A number of recent studies investigate the use of robotic companions as health coaches or advisors. Kidd and Breazeal developed Autom [72.116, 72.220], a robot specifically designed for long-term interaction with people in the role of a robotic weight loss coach. Autom has a clear function and role, but also needs to interact with users in a socially acceptable and comfortable manner, so that users intuitively understand the robot, are willing to engage with it, and listen to it. This robot is also an example of how academic research in long-term HRI can lead to new innovations and commercialization. The main purpose of the robot is to help the user to lose weight by trying to persuade the user to change his or her behavior [72.221].

A key ingredient supporting this functionality is the creation of a relationship between the user and the robot. Autom is thus an example of a robot that presents itself as a relational artifact and encourages people to develop social-emotional relationships with it [72.198, 72.199]. Autom uses a psychologically inspired relationship model, with the robot in the role of a caregiver. It engages the user in a dialog modeled after patient–care professional dialog to provide social support (Sects. 72.5.4 and 72.5.5), in order to build a working alliance as well as to be helpful, persuasive, positive, and supportive. For a health care coaching robot, where behavior change is a central objective, such a relationship is very important. People who need to lose weight typically do not lack the intelligence or knowledge necessary to understand that weight loss would improve their health and general well-being. The motivational factor is often the crucial point, so a weight loss robot's role in providing social support is different from that of a kiosk robot that only needs to provide information.

A long-term study with 45 participants using an Autom prototype robot over 6 weeks was the first study designed to create behavior change in people with a weight management goal (Fig. 72.13). The results were very encouraging: participants developed a close relationship with the robot, and they tracked their calorie consumption and exercise much longer when using the robot compared to other methods (a computer running the same dialog or a paper log for manual entry). Since these factors are indicators of longer-term weight loss success, the study provided evidence for the effectiveness of sociable robots for long-term HRI [72.116].

Fig. 72.13a,b

Autom, a weight management and exercise coach. (a) The Autom prototype robot developed at MIT; (b) the commercial version of the Autom robot

7.5 Challenges

Many of the projects studying the use of robot companions (e. g., supporting elderly users, providing assistance in therapy, or helping in office environments) are still at an initial stage. Future results from extensive, long-term, multi-site and even cross-cultural evaluation studies are still needed in order to illuminate the usability, usefulness, and acceptability of such systems. It is particularly advantageous if the same research platform is used in different projects so that direct comparisons and the sharing of research developments are possible.

The challenges of supporting engagement in long-term studies are beginning to emerge. New sensors and interfaces, such as brain–computer interfaces, could provide richer information on people's emotional and attentional engagement states when interacting with a robot. Robots are providing richer social cues to explore the whole spectrum of human–human interaction modalities, e. g., using gaze, gesture, proxemics, dialogue, contingency, etc. [72.177, 72.222, 72.223, 72.224, 72.225, 72.226]. Measuring engagement in real time and allowing a robot to respond to it is a key challenge. In a user study with 37 participants and a robot that could perceive user engagement, Sidner et al. [72.32] found that a robot producing engagement gestures was perceived more positively than a robot without such gestures. Rich et al. [72.139] proposed an initial computational model of engagement for robots, based on the recognition of different events involving gesture and speech.

A clear framework of what engagement means in HRI in general, and for long-term HRI in particular, is needed and will need to be validated in extensive testing. Several other proof-of-concept studies of robots that adapt to user engagement and interaction show encouraging results. François et al. [72.227] evaluated a robot that adapts to user interaction in a therapeutic context based on tactile information (how children touched the robot), while Szafir and Mutlu [72.228] demonstrated how a robot can adapt to the engagement of the user, utilizing techniques from brain–computer interfaces in order to assess engagement. Future applications of this research into adaptive robots target educational settings, where a teacher robot can automatically adapt to the pupils' engagement, or therapeutic/rehabilitation applications, where the robot can automatically adapt to the patient's needs and preferences in light of therapeutic objectives.

Future research needs to illuminate how best to design the interaction and embodiment of social robots that are successful in long-term interactions with people. Since adding social interaction skills to a robot is costly, the identification of a set of robot characteristics (appearance, behavior, cognition) that are necessary and sufficient to create meaningful, acceptable, and efficient long-term interaction with a robot would be beneficial. Here, it is important to provide frameworks that connect to empirical methodologies, cf. the discussion of a conceptual and methodological framework for robot believability in Rose et al. [72.229]. Note, while predominantly humanoid or zoomorphic robots are currently being used in long-term HRI experiments, even mechanically looking robots such as the Roomba may invite their users to develop a social-emotional relationship with them [72.201, 72.230].

Using robots in long-term repeated interactions with people also involves a number of ethical issues, as has already been highlighted for embodied conversational agents and virtual characters [72.208]. While in some applications the robot may have a useful function if portrayed as a care-receiver [72.133], in most applications the companion robot should provide care to its users [72.191]. As pointed out by Turkle et al. [72.198, 72.199], robots that present themselves as relational artifacts may influence people's understanding and expectations of the nature of social relationships and friendship. Ethical issues are particularly important when users are vulnerable people such as adults with special needs, elderly people, or children. Many researchers have commented on these issues [72.231, 72.232].

Practical issues can also become crucial to the acceptance of a robot for long-term interaction, as has been highlighted in a long-term study with Pleo, a commercially available robot that requires maintenance which does not fit easily into people's routines [72.190]. Fernaeus et al. also bring up the important issue of interaction design that needs to support long-term interaction, an issue that also surfaced in user feedback from the participants in their long-term study. Novel approaches to the design of human–robot interaction may be required to facilitate long-term use of companion robots [72.190], and in real-world settings attention must be paid to how the robot technology is introduced to people and how people and robots must adapt to each other [72.230].

Recently, the blending of virtual and robotic characters has become a growing area of research, whereby seamless transitions between a virtual and a robotic character [72.233], or migration between different robot and other digital embodiments [72.234], pose interesting challenges for future long-term companions. Indeed, it may change the nature of what we usually perceive as a companion robot [72.148, 72.233, 72.235, 72.236, 72.237, 72.238, 72.239, 72.240, 72.241, 72.242].

8 Tactile Interaction with Social Robots

In robotics and artificial intelligence research, tactile interaction has long been exploited primarily as a necessity to enable, e. g., collision detection, grasping, and object manipulation, particularly in combination with vision and other sensor modalities, and this has led to the addition of touch sensors to a robot's gripper or hand [72.243].

8.1 Touch for Social Interaction

However, the importance of human–robot tactile interaction has been highlighted recently in a number of research projects, inspired by evidence from human–human interaction and child development. The recent interest in the sense of touch goes beyond the necessity of touch for interacting with the physical environment and focuses on the important role of tactile interaction with the social environment. For instance, Siegel et al. [72.244] found that social touch, such as shaking hands, can impact how persuasive a robot is perceived to be. In other work, the same kind of touch has been shown to have a different impact on people's response to the robot depending on the context [72.245].

Indeed, humans are born as tactile creatures. Physical touch is one of the most basic forms of human communication. In human development, touch plays a crucial role in developing cognitive, social, and emotional skills, as well as establishing and maintaining attachment and social relationships. Deprivation of touch in early child development can have devastating effects on a child’s development [72.246]. A comprehensive survey on communicative functions of touch in humans and other animals can be found in Hertenstein [72.247].

8.2 Touch Sensors and Mechanisms Used for HRI

Recently, more and more social robots are being equipped with tactile skin, thus allowing the robot to react according to the person touching it. Recent trends, e. g., in the European project Roboskin [72.248], tend toward covering the whole, or most, of the robot's body; see, e. g., Schmitz et al. [72.249] for an example using modular capacitive sensors to cover the humanoid robot iCub. Related work by Dahiya et al. [72.250] surveys a variety of different technological approaches toward tactile sensors and sensing mechanisms for robots. They show how one may take inspiration from biological tactile sensing in humans and derive design hints for robotic tactile sensing. Stiehl et al. [72.251] designed a tactile system inspired by the human skin and somatic processing; it was later developed to recognize the social and affective communicative intent of how a human touches the robot [72.252, 72.253]. The new compliant skin technology developed in the Roboskin project has two primary functions: (a) to allow a robot to operate safely and efficiently, and (b) to use tactile sensors for communication, interaction, and cooperation with people.

The field of tactile human–robot interaction is indeed a growing area of research, and a recent survey discusses tactile HRI from the perspectives of the types of interactions that may occur between a robot and a human, and the types of sensors that allow these interactions to be detected, in various robotic systems such as the Robovie series of robots, RI-MAN, the Huggable robot [72.253], or Paro [72.254]. Note, equipping a robot with tactile sensors may add to its functionality, but in some cases the sense of touch is crucial for the key functionality of the robot (e. g., in the case of Paro [72.251] with a therapeutic/care function, or in the case of the Huggable to improve social relationships [72.255]). Many toy robots built primarily for human–robot interaction have touch sensors for petting and stroking, e. g., the AIBO or the more recent Pleo robots. Indeed, the use of tactile HRI to support human–human communication over distance is a promising area of research [72.255, 72.256, 72.257, 72.258]. Tactile feedback can also improve teaching a robot by demonstration [72.259].

In order to realize robots’ capability of a sense of touch, there are difficult sensor processing and perception problems to be addressed. One problem is that we need to identify the geometrical relationship between a number of sensors embedded in a whole body of a robot and body parts being touched when sensors are activated. For better perception of various touching, one can embed sensors into or under a soft skin. For instance, Robovie II-F has 274 sensor elements (Piezofilms) embedded in soft silicone rubber (Fig. 72.14). Tactile action activates multiple sensors at the same time, thus nearby sensors are all activated when touch action occurs to one place of body. For this problem, Noda et al. [72.260] took a bottom-up approach using observed signal patterns. Their method works in a self-organizing manner to identify mapping between signal patterns and touched location.

Fig. 72.14

An example of the layout of piezofilm sensors embedded in a soft skin (left)

The second problem is the semantic relationship between signal patterns and people's communicative intention behind a touch action. For instance, Tajika et al. [72.261] applied a clustering method to signal patterns to retrieve a hierarchical structure of haptic actions. For instance, tickling-chest (Fig. 72.15) was found to be similar to stroking-chest, and the two were grouped together as a lightly-touching-chest action. Knight et al. [72.262] developed an algorithm based on the identification of typical touch types, such as tickle, pet, and poke. Yohanan and MacLean [72.263] categorized humans' intent to communicate affective state and mapped touch actions to each category; for example, comforting intent is mapped to actions like stroke and pat. Stiehl and Breazeal [72.264] trained a neural net to classify touch according to affect and communicative intent by recognizing types of socio-affective touch (e. g., pleasant or unpleasant ways of touching or teasing based on the way a person tickles, touches, pats, slaps, rubs, squeezes, etc.).
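The sketch below shows the flavor of such touch-type classification using a few hand-crafted rules over simple features (duration, peak pressure, contact area). The feature names, thresholds, and categories are assumptions chosen for illustration; they are not the clustering or learning methods of the cited studies.

```python
# Illustrative rule-based touch-type classifier over simple features
# extracted from one burst of skin-sensor activity.  All thresholds
# are invented for the example.
def classify_touch(duration_s, peak_pressure, contact_area_cm2):
    if duration_s < 0.3 and peak_pressure > 0.8:
        return "slap"
    if duration_s < 0.5 and contact_area_cm2 < 4:
        return "poke"
    if peak_pressure < 0.3 and duration_s > 1.0:
        return "stroke"
    if peak_pressure < 0.5 and contact_area_cm2 > 10:
        return "pat"
    return "unknown"

print(classify_touch(1.5, 0.2, 12.0))   # -> stroke
print(classify_touch(0.2, 0.9, 15.0))   # -> slap
```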

Fig. 72.15

A child tickling the chest of a robot

François et al. [72.227] describe an algorithm for pattern recognition in HRI, the cascaded information bottleneck method, and apply it to real-time autonomous recognition of human–robot tactile interaction styles. This method uses an information-theoretic approach and enables relevant information to be progressively extracted from time series. An evaluation with real interaction data obtained with a Sony AIBO robot shows that the algorithm is capable of classifying interaction styles (frequency and gentleness of the interaction) with good accuracy and a very acceptable delay. The cascaded information bottleneck method was later successfully applied to create a socially adaptive robot that can recognize and adapt to children's play styles in real time [72.265]. The robot rewards well-balanced interaction styles and encourages children to engage in the interaction. The potential impact of such an adaptive robot in robot-assisted play for children with autism was evaluated through a study conducted with seven children with autism in a school. A statistical analysis of the results shows the positive impact of such an adaptive robot on the children's play styles and on their engagement in the interaction with the robot.

8.3 Example: Teaching Children with Autism About Tactile Interaction

Using robots for the therapy and education of children with autism was first proposed by Kerstin Dautenhahn [72.266, 72.267], and it has recently attracted a lot of attention in the research community. A number of such approaches focus on robot-assisted play, since play has a crucial role in a child's development. During play, children can learn about themselves and their environments as well as develop cognitive, social, and perceptual skills. Autism, or more precisely autistic spectrum disorders (ASD), is a life-long developmental disorder with key impairments in communication, social interaction, and imagination (DSM IV, 1995) [72.268, 72.269]. Robots allow for a simplified, predictable, and reliable environment, where the complexity of interaction can be controlled and gradually increased [72.270]. Children can play dyadic games with robots, or triadic games involving other children or adults. In the latter case, scenarios emphasize the role of the robot as a social mediator [72.271].

Robot-mediated playing and learning activities, if successful, have the potential to enable children with cognitive disabilities to learn and acquire basic social skills, thus supporting them in developing/enhancing their individual potential, especially in the areas of communication and interaction [72.270, 72.272]. A number of different social robots have been studied in this domain, including zoomorphic robots, humanoid robots, and mechanically looking robots, e. g., Labo-1 [72.273], NAO [72.274], Probo [72.275], Robota [72.276, 72.277], Keepon [72.278], Aibo [72.279], Tito [72.280], KASPAR [72.21], and others. While we find encouraging results from case-study evaluations, further long-term and clinical studies are required [72.281].

Diehl et al. [72.281], in their review on the clinical use of robots for individuals with autism, suggested that the use of interactive robots is a promising development in light of research showing that individuals with autism exhibit strengths in understanding the physical world and relative weaknesses in understanding the social world. Children with autism often find it very difficult to appropriately interact with their social environment. In interactions between children and their peers, teachers, and family, touch usually has an important communicative and emotional role.

However, children with ASD may be hyposensitive or hypersensitive to touch, so that they may crave or avoid touch. As part of the above-mentioned European Roboskin project, a number of case study evaluations have been performed investigating how a humanoid robot equipped with touch sensors can teach children with autism about appropriate tactile interaction and the associated emotional responses. The studies used the child-sized, minimally expressive robot KASPAR [72.21, 72.282], a low-cost robot specifically designed for interaction; many of its features lend themselves to interactions with children with autism, e. g., its human-like but minimally expressive shape and form. The robot has 8 degrees of freedom (DOF) in the head and face, 6 DOF in each arm, and one DOF in the torso. A comprehensive set of play scenarios for robot-assisted play for children with various special needs, and tactile play scenarios for children with autism, have been developed [72.283, 72.284, 72.285].

KASPAR can play a variety of games with children, either operating completely autonomously [72.285, 72.286] or being partially remotely controlled by the children or an adult as part of the play scenario. Case studies on tactile child–robot interaction have shown encouraging results (Robins et al. [72.283, 72.287]). In the research on tactile interactions, KASPAR was equipped with patches of tactile sensors, and it produced some responses autonomously, e. g., a tickling of the chest resulted in laughing, and hitting the face resulted in the robot turning away from the child, covering its face with its hands and saying ouch, that hurts. If the child touches areas that are not covered by skin patches, or if the recognition of different types of touch is not reliable enough [72.288], then the experimenter can trigger the robot's reaction.

Figure 72.16 shows a child with autism first hitting the robot's leg and then exploring the robot's reaction. Since the legs cannot detect forceful touch, the experimenter triggers the robot's reactions. Note, this hybrid approach of autonomous behaviors combined with remote control allows the robot to appear perceptually more advanced than current state-of-the-art sensing technology would allow. Robots used with children with autism need to be predictable, so a mistake in the robot's sensing abilities can be compensated for by the adult present (experimenter, teacher, or caregiver). A person who knows the child very well will also be able to trigger certain useful robot behaviors at certain moments in time which are not observable directly from the child or the context and can only be inferred from detailed knowledge about the child, his/her needs and preferences, and the therapeutic goals and objectives for this particular child.
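The hybrid control pattern described above can be summarized by the small sketch below: the robot reacts autonomously when its touch classification is confident, and otherwise waits for the experimenter to trigger a reaction. Behavior names and the confidence threshold are illustrative assumptions, not KASPAR's actual control software.

```python
# Hybrid autonomy sketch: autonomous reactions for confident touch
# classifications, operator-triggered reactions otherwise.
REACTIONS = {
    "tickle_chest": "laugh",
    "hit_face": "turn_away_and_say_ouch",
}

def select_reaction(touch_label, confidence,
                    operator_command=None, threshold=0.7):
    if operator_command is not None:        # remote trigger always wins
        return operator_command
    if touch_label in REACTIONS and confidence >= threshold:
        return REACTIONS[touch_label]        # autonomous response
    return None                              # wait for the operator

print(select_reaction("tickle_chest", 0.9))   # -> laugh
print(select_reaction("hit_leg", 0.4))        # -> None (operator decides)
print(select_reaction("hit_leg", 0.4, operator_command="turn_away_and_say_ouch"))
```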

Fig. 72.16

(a) Child hits the robot’s leg and then explores the robot’s reaction. (b) The child sees the robot looking sad, so he tickles its tummy (left) to make it happy (right). (c) The minimally expressive humanoid robot KASPAR

Note, for a robot to detect and respond to the tactile interactions of children with autism is not only beneficial for teaching about appropriate tactile interaction, but can also provide a basis for adapting to a child's individual interaction style [72.265].

Tactile interaction is often not the sole focus of interaction; it can be embedded in multimodal play scenarios with the therapeutic objective of increasing children's social skills through play. François et al. [72.279] present a long-term study where six children diagnosed with autism interacted with an autonomous zoomorphic robot (Sony Aibo) over 10 sessions. The study is inspired by nondirective play therapy to encourage children's proactivity and initiative-taking. The behavior of each child is analyzed in detail according to three dimensions (play, reasoning, affect). Unique trajectories for the children's progression along these three dimensions were observed, resulting in unique profiles. The work highlights methodological issues in the domain of robot-assisted therapy, and also points out and formalizes different potential roles of the experimenter in the sessions, who may be a passive or an active participant [72.289]. A regulation process is introduced whereby the experimenter can regulate the interaction under specific conditions in order to:

1. Prevent or discourage repetitive behaviors
2. Help the child engage in play
3. Give a better pace to the game if it has already been experienced by the child
4. Bootstrap a higher level of play, and
5. Ask questions related to reasoning or affect.

8.4 Challenges and Opportunities

While many studies assume that tactile HRI will result in a more enjoyable, meaningful, and efficient interaction with a robot, many issues are still unclear and need to be investigated further. For example, a video-based study, where participants watched videos of interactions rather than interacting with the robot themselves, showed the impact of a robot's level of autonomy and suggested that touch behaviors are considered more appropriate for proactive than for reactive robots [72.290]. Much further research is needed to find out when and how tactile interaction can enhance a person's experience of interaction and benefit the overall performance of human–robot triads. One can also expect that individual differences will play a role in which types of tactile interaction with a robot are more appropriate, depending also on the tasks involved and the overall context and setting. People's perception of the robot's roles and its relationship with its users is also likely to impact these issues.

In recent years a number of projects worldwide have tried to use robots for the therapy of children with autism. Different modes of robot control and autonomy need to be investigated, from fully autonomous systems [72.285], to Wizard-of-Oz controlled robots [72.291], to hybrid approaches where remote control is an integral part of the interaction and triggers autonomous behaviors [72.287, 72.289]. Realistically, for therapeutic tools to be used widely outside the laboratory and the experimental setting, the technology needs to be highly robust, reliable, easy for nonresearchers to operate, and cost-effective.

A number of technological, methodological, and design challenges still need to be tackled, but tactile interactions with social robots open up a number of new research avenues as well as exciting applications.

9 Social Robots and Teamwork

Verbal and nonverbal communication play a very important role in coordinating joint action during collaborative tasks. Sharing information through communication acts is critical given that each teammate often has only partial knowledge relevant to solving the problem, different capabilities, and possibly diverging beliefs about the state of the task. For instance, all teammates need to establish and maintain a set of mutual beliefs regarding the current state of the task, the respective roles and capabilities of each member, and the responsibilities of each teammate [72.153, 72.33]. This is called common ground [72.162].

9.1 Human–Robot Teamwork and Collaboration

Dialog certainly plays an important role in establishing common ground. Each conversant is committed to the shared goal of establishing and maintaining a state of mutual belief with the other. To succeed, the speaker composes a description that is adequate for the purpose of being understood by the listener, and the listener shares the goal of understanding the speaker. This communication act serves to achieve robust team behavior despite adverse conditions, including breaks in communication and other difficulties in achieving the team goals.

Humans also use nonverbal skills such as visual perspective taking and shared attention to establish common ground with others. They orient their own gaze and direct the gaze of their teammate through deictic cues such as pointing gestures in order to establish common ground. Given that visual perspective taking, shared attention, and the use of deictic cues to direct attention are core psychological processes that people use to coordinate joint action about objects and events in the world, robot teammates must be able to display and interpret these behaviors and cues when working with humans in a manner that adheres to human expectations.

Breazeal et al. [72.142] investigated the impact of grounding using nonverbal social cues and behavior on the task performance of a human–robot team. In a human subject experiment, participants guided Leonardo to perform a physical task using speech and gesture. The robot communicated either implicitly through behavior (such as gaze and facial expressions) or explicitly through nonverbal social cues (i. e., explicit pointing gestures). The robot's explicit grounding acts include visually attending to the human's actions to acknowledge their contributions, issuing a short nod to acknowledge the success and completion of a task or subtask, visually attending to the person's attention-directing cues such as where the human looks or points, looking back to the human after the robot operates on an artifact to make sure its contribution is acknowledged, and pointing to artifacts in the workspace to direct the human's attention toward them. Both self-reporting via questionnaire and behavioral analysis of video support the hypothesis that implicit nonverbal communication positively impacts human–robot task performance with respect to the understandability of the robot, the efficiency of task performance, and robustness to errors that arise from miscommunication [72.142].

Common ground grows as partners work together over time. As a simple example, if two people repeatedly work through a sequence of collaborative manufacturing tasks, each will be able to easily predict what the other will do next and thus proactively help the other. For instance, if one always needs a spanner to be passed at a certain moment in the task, the other person will probably take the anticipatory action of passing the spanner before being asked. Hoffman and Breazeal developed an adaptive system that learns such task structure, enabling a robot to take anticipatory actions [72.292]. They further revealed the importance of perceptual simulation [72.293].
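As a toy stand-in for such learned task structure, the sketch below simply counts which tool the partner requested after each task step and hands it over proactively once the count is large enough. This is far simpler than the model in [72.292]; the class, step names, and support threshold are hypothetical.

```python
from collections import Counter, defaultdict

class AnticipationModel:
    """Count tool requests per task step and predict the next one."""
    def __init__(self):
        self.counts = defaultdict(Counter)

    def observe(self, step, requested_tool):
        self.counts[step][requested_tool] += 1

    def anticipate(self, step, min_support=3):
        """Return the most frequent tool for this step, if seen often enough."""
        if not self.counts[step]:
            return None
        tool, n = self.counts[step].most_common(1)[0]
        return tool if n >= min_support else None

model = AnticipationModel()
for _ in range(5):
    model.observe("attach_bracket", "spanner")
print(model.anticipate("attach_bracket"))  # -> spanner (pass it proactively)
```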

The importance of such common ground and cognitive similarity has also been demonstrated in partnerships in casual social interaction. For instance, Morales et al. revealed that a robot efficiently performed side-by-side walking with a human partner when it anticipated where its partner would walk [72.186]. Such anticipatory computing was enabled by the capability to compute a preferred walking course in a similar way to humans. There are also robots that explicitly learn common knowledge, such as one that learns the names of places [72.294].

9.2 Robots as Social Mediators

Researchers have started to explore the use of robots as social mediators. One approach is to use the human-like presence of robots as a social stimulus. It is known that the presence of people facilitates the performance of others; for instance, people perform simple math calculations faster when being watched by someone else. This is known as the social facilitation effect [72.295]. Riether et al. [72.296] reported that people's performance on easy math calculations improves in the presence of robots. Takano et al. [72.297] placed an android robot as a bystander where patients meet a medical doctor in a hospital, and found that it moderated the patients' anxiety and led them to believe that the doctor paid more attention to them.

Another approach is to use a robot as an active coordinator in humans' social settings. When there are many people, a coordinator can make their activity more efficient. Such a role can also be successfully replicated by a robot. Consider the situation where an interactive robot is placed in the middle of a group of children, and they start to push each other away because they each want to play with the robot in a different way. Shiomi et al. [72.298] developed a technique to identify when the crowd of people around a robot is disordered, and let the robot perform attention-controlling behavior so that children play together with the robot in a coordinated way [72.299] (Fig. 72.17). In other work, robots have been used to facilitate elderly people's group conversation. Matsuyama et al. developed a robot that participates in a quiz game, conducted as a recreational activity in an elderly-care facility, and provides inspiring answers to encourage the other elderly participants to continue the game [72.300]. In these works, robots actively model the social situation and intervene in people's activities based on their understanding of the situation.

Fig. 72.17

A robot interacting with a crowd of people while coordinating their attention

9.3 Research Direction

Human activities are often social, involving multiple people who often have different skills, desires, and goals. Although the research is only at an early stage, researchers have started to model such social situations while revealing potential roles for robots in these settings. Along with further advances in relevant technologies, e. g., manipulation, navigation, and language capabilities, many potential uses remain to be unveiled. However, the underlying theoretical work most likely still requires much more effort as well.

10 Conclusion

In this chapter, we have presented some of the principal research trends in social robotics and human–robot interaction. We have relied heavily on examples from our own research to illustrate these trends, and have used excellent examples drawn from other research groups around the world.

From this overview, we have shown that one of the most important goals of social robotics as applied HRI is the creation of robots that are human-compatible and human-centered in their design. Their differences from human abilities should complement and enhance our strengths and support how people help one another. Their similarities to human abilities, such as computationally implementing human cognitive, affective, or multimodal communication models, make them more intuitive for people to understand and interact with. Further, such robots are also being used as a scientific tool to help us understand ourselves better. With this broadening understanding, social robots are being designed to offer increasingly sophisticated levels of social, affective, cognitive, and task-based support for people, opening new applications for robots in education, health, therapy, communication, domestic tasks, physical tasks requiring coordination and teamwork, and more. As the field advances, social robots are being applied to increasingly sophisticated tasks, in increasingly complex human environments, for longer deployment periods. We expect that in the coming decades, many other researchers, especially young researchers, will actively contribute to the transition from today's robots into the capable robot partners of tomorrow.

11 Further Reading

For further reading, we recommend the following conference proceedings, journals, books, and articles:

  • Annual conference proceedings:

    • Proceedings of the ACM/IEEE International Conference on Human–Robot Interaction (HRI)

    • Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (ROMAN)

    • AAAI Symposium Series

    • AISB Symposium Series

  • Journals:

    • Journal of Human-Robot Interaction (http://humanrobotinteraction.org/journal/)

    • Interaction Studies-Social Behaviour and Communication in Biological and Artificial Systems published by John Benjamins Publishing Company

    • International Journal of Social Robotics by Springer

    • IEEE Transactions on Autonomous Mental Development (TAMD)

    • IEEE Transactions on Human–Machine Systems

    • IEEE Transactions on Affective Computing

    • Paladyn, Journal of Behavioral Robotics, de Gruyter

    • PLoS ONE

Reviews and overviews can be found in several books and articles:

  • Books:

    • C. Breazeal: Designing Sociable Robots (MIT Press, Cambridge 2002)

    • R. W. Picard: Affective Computing (MIT Press, Cambridge 1997)

    • J.-M. Fellous, M. Arbib (Eds.): Who Needs Emotions: The Brain Meets the Robot (Oxford, Oxford Univ. Press 2005)

    • K. Dautenhahn, J. Saunders (Eds.): New Frontiers in Human–Robot Interaction, Advances in Interaction Studies, (John Benjamins Publishing, Amsterdam 2011)

    • T. Kanda, H. Ishiguro (Eds.): Human-Robot Interaction in Social Robotics (CRC Press, Boca Raton 2012)

  • Review Articles:

    • T. Fong, I. Nourbakhsh, K. Dautenhahn: A survey of socially interactive robots, Robotics and Autonomous Systems 42, 143–166 (2003)

    • M.A. Goodrich, A.C. Schultz: Human–robot interaction: A survey, Foundations and Trends in Human-Computer Interaction 1(3), 203–275 (2007)