1 Introduction

Interactive robots developed for social human–robot interaction (HRI) scenarios need to be socially intelligent in order to engage in natural bi-directional communication with humans. Namely, social intelligence allows a robot to share information with, relate to, understand and interact with people in human-centered environments. Robot social intelligence can result in more effective and engaging interactions and, hence, better acceptance of a robot by its intended users [13]. The challenge lies in developing interactive robots with the capabilities to perceive and identify complex human social behaviors and, in turn, display their own behaviors using a combination of natural communication modes such as speech, facial expressions, paralanguage and body language.

Our research focuses on affective communication as displayed through body language during social HRI. Our previous work in this area has resulted in the development of automated human affect recognition systems for social robots in order to determine a person’s accessibility and openness towards a robot via static body language during one-on-one HRI scenarios [49]. In this paper, by contrast, we focus on a robot’s ability to display emotional body language. In particular, we explore the design of emotional body language for our human-like social robot, Brian 2.0 (Fig. 1). For Brian 2.0 to effectively display emotional body language that can be easily recognized by different human users, we utilize human emotion research to determine how humans display and recognize emotions through body postures and movements, and apply a similar approach to the generation of Brian 2.0’s emotional body language.

Fig. 1 One-on-One HRI with the Social Robot Brian 2.0

In general, it has been identified that non-verbal communication, which includes body language, facial expressions and vocal intonation, conveys a human’s intent better than verbal expressions, especially in representing changes in affect [10]. To date, a significant amount of research has focused on the recognition of human emotions through facial expressions [11] and vocal intonation [12], or a combination of both [13], with comparatively little attention placed directly on emotion recognition from body language. Yet body language plays an important role in communicating human emotions during interpersonal social interactions.

Although initial work by Graham et al. [14] suggested that human bodily cues and hand gestures do not function as an additional source of information in the communication of emotion with respect to facial expressions, more recent research has shown that human body language plays an important role in effectively communicating certain emotions either in combination with facial expressions [15] or on its own [16, 17]. In [15], a study using the display of both congruent and incongruent facial expressions and body language confirmed that both face and body information influence emotion perception. The authors noted that increased attention to bodies and compound displays could provide a better understanding of what is communicated in nonverbal emotion displays. They also mentioned the potential importance of dynamic stimuli. In [16], the influence of the body, face and touch on emotion communication was investigated. With respect to the body, it was determined that body language was the dominant non-verbal communication channel for social-status emotions which include embarrassment, guilt, pride and shame. In [17], a study investigating the recognition of the basic emotions of anger, fear, happiness and sadness, conveyed only through body language, found high recognition rates (greater than 85 %) for all the emotions. Work by Ekman [18] has identified that people are more likely to consciously control or tune their facial expressions than their body language. This is because we generally pay close attention to each other’s facial expressions and, hence, can actively adapt our expressions to others in different scenarios. However, since feedback on body language from others is rare, we do not censor natural body movements. Hence, body language is considered an important channel for communicating a person’s emotions.

With respect to virtual agents, much of the research has focused on investigating the display of emotions through facial expressions or a combination of both facial expressions and tone of voice as discussed in [19]. However, fewer works have emphasized the display of emotions through body movements, e.g. [20], or the combination of facial expressions and body movements, e.g. [19]. Similar developments in non-verbal emotion communication for humans and virtual agents also exist for robotic applications. In particular, with respect to the robotic display of emotions, the majority of the existing research has been on identifying facial nodes and actuation techniques in order for robots to be able to display believable facial expressions, e.g. [21, 22], or on the recognition of the facial display of basic emotions by a robot, e.g. [23]. To date, only a handful of researchers have focused on the use of robotic body language to display emotions, with the primary emphasis being on the display of emotions through dance, e.g. [24–30].

In this work, we aim to identify the appropriate emotional body language for the human-like robot Brian 2.0 to display during natural one-on-one interpersonal social interactions with a person, where in such interactions emotional dance may not be appropriate. Our contributions in this paper are as follows: (1) to uniquely investigate if life-sized human-like social robots can effectively communicate emotion by utilizing a combination of human body language features defined by researchers in psychology and social behavioral science, and (2) to conduct a novel comparison study to investigate the effectiveness of these human body language features in communicating emotion when displayed by such a robot with respect to a human actor, where the robot has fewer degrees of freedom.

Our goal is to demonstrate that body movements and postures for human-like robots can represent certain emotions and, hence, should be considered an important part of interaction on the robot’s side. The comparison study is performed with Brian 2.0 and a human actor both performing body movements and postures based on the same body language descriptors in order to investigate whether non-experts can recognize the emotional body language displayed by the human-like robot, which has fewer degrees of freedom, at recognition rates similar to those for a human. The study will allow us to determine which body movements and postures can be generalized for the robot to display a desired emotion as well as explore whether human body language can be directly mapped onto an embodied life-sized human-like robot. Feasibility, in our case, is assessed via human recognition rates of Brian 2.0’s emotional body language. Distinct from other robot body language studies in the literature, we focus on the use of social emotions that can arise from interpersonal factors during social HRI and on whether these emotions are perceived differently by individuals when displayed by a human-like robot or a human actor. In our work, we consider the implementation of body movements and postures defined by Wallbott [31] and de Meijer [32] for a variety of different emotions.

The rest of this paper is organized as follows. Section 2 provides a discussion on the current research on emotional body language for both humans and robots. Section 3 describes our social robot Brian 2.0 and Sect. 4 defines the emotional body language features utilized for the robot. Sections 5 and 6 present and discuss experiments conducted to evaluate the feasibility of the robot’s emotional body language as well as a comparison study to investigate the perception of the same emotional body language movements and postures when displayed by a human actor versus the robot. Lastly, concluding remarks are presented in Sect. 7.

2 Emotional Body Language

2.1 Human Display of Emotional Body Language

Early research on body language in [33] presented the importance of leaning, head pose and the overall openness of the body in identifying human affect. Participants were shown images of a mannequin in various body postures and asked to identify the emotion and attitude of the posture. The results indicated that posture does effectively communicate attitude and emotion, and that head and trunk poses form the basis of postural expression, with arms, hands and weight distribution being used to generate a more specific expression. More recent research presented in [34] has shown that emotions displayed through static body poses are recognized at the same frequency as emotions displayed with facial expressions. Participants viewed images of a woman displaying different poses for the emotions of happiness, fear, sadness, anger, surprise and disgust, and were asked to identify the corresponding emotions. The results showed that the body poses with the highest recognition rates were judged as accurately as facial expressions. In [35], a study performed with 60 college students utilizing stick figures showed that emotion was strongly related to varying head and spinal positions. For the study, the students were asked to choose, from a list, the emotions of 21 stick figures with three different head positions and seven different spinal positions. The emotions on the list included anger, happiness, caring, insecurity, fear, depression and self-esteem. It was found that upright postures were identified more often as positive emotions while forward leaning postures were identified more often as negative emotions. A comparison of the results with the emotional states of the participants found that the participants’ own emotional states did not influence their emotional ratings of the figures. In [36], Coulson investigated the relationship between viewing angle, body posture and emotional state. Images from three different viewing angles of an animated mannequin in numerous static body poses (derived from descriptions of human postural expressions) were shown to 61 participants who identified the emotions they felt best described each image. The findings indicated that the emotions of anger, sadness and happiness were identified correctly more often than disgust, fear and surprise, and that a frontal viewing angle was the most important viewing angle for identifying emotions. It was also found that surprise and happiness were the only two emotions from the aforementioned emotions that were confused with each other. A similar study to [36] was presented in [37], where instead of an animated wooden mannequin more human-like characters were presented in images to 36 subjects in order for them to distinguish between postures for different expressive emotions. The subjects were asked to group the posture images into the emotions of happiness, sadness, anger, surprise, fear and disgust, and then rate the intensity of emotion expression in each image on a five-point Likert scale. The results identified that happiness had the highest recognition rate, while disgust had the lowest. Furthermore, a different intensity level was assigned to each posture in the same emotion group.

In [17], a database of full body expressions of forty-six non-professional actors, with their faces blurred out, was presented to 19 participants. The participants were asked to categorize the emotion displayed by the expressions based on a four alternative (anger, fear, happiness, sadness) forced-choice task. The results showed that sadness had the highest recognition rate at 97.8 % and happiness had the lowest rate at 85.4 %. In [38], a study was conducted to illustrate that facial expressions are strongly influenced by emotional body language. In the study, twelve participants were presented with images of people displaying fearful and angry facial expressions and body language that were either congruent or incongruent. The participants viewed the images and were asked to explicitly judge the emotion of the facial expression while viewing the full face–body combination. The results showed that recognition rates were lower and reaction times were slower for incongruent displays of emotion. Furthermore, it was found that when the face and body displayed conflicting emotional information, a person’s judgment of facial expressions was biased towards the emotion expressed by the body. Comparison studies presented in [39] also investigated the influence of body expressions on the recognition of facial expressions as well as emotional tone of voice. The results reemphasized the importance of emotional body language in communication, whether displayed on its own or in combination with facial expressions and emotional voices.

Although the aforementioned studies have been successful in validating emotion recognition from human bodies, they all focus only on static poses and do not take into account the dynamics of body language that are also present during social interactions. In [36], even though body movements were not considered, Coulson discusses their potential importance, in addition to static postures, for emotion display.

Recognition and interpretation of a person’s emotions is very important in social interaction settings. Ekman and Friesen [40] were the first to indicate the importance of body language in conveying information concerning affective states between two individuals in communicative situations. Furthermore, a detailed review of the literature by Mehrabian [41] showed a link between the body posture of one person and his/her attitude towards another person during a conversation. In particular, body orientation, arm positions and trunk relaxation have been found to be consistent indicators of a person’s attitude towards the other person. During social interactions, static body poses may not provide enough information to define a person’s emotions as the body can move a great deal while interacting, and such movement can provide information regarding the intensity and specificity of the emotion [42]. Hence, there exists a consensus that both body movements and postures are important cues for recognizing the emotional states of people when facial and vocal cues are not available [42]. In [42], point-light and full-light videos and still images of actors using body motions to portray five emotions (anger, disgust, fear, happiness and sadness) at three levels of intensity (typical, exaggerated and very exaggerated) were presented to 36 student participants for a forced-choice emotion classification study. For the point-light videos, strips of reflective tape were placed on the actors to only highlight the motion of the main body parts including the ankles, knees, elbows and hands, while a full-light video illuminated a person’s whole body. The still images were frames extracted from the point-light and full-light videos which depicted the peak of each emotional expression. The results of the study showed that exaggeration of body movements improved recognition rates as well as produced higher emotional intensity ratings. The emotions were also identified more readily from body movements even with the point-light videos which minimized static form information.

In [43], the characteristics of a person’s gait were examined to see if emotional state could be identified from walking styles. Observers examined four different people walking in an L-shaped path while displaying four emotions and then identified which emotion each walking style represented. The results showed that the emotions of sadness, anger, happiness and pride could be identified at higher than chance levels based on the amount of arm swing, stride length, heavy footedness and walking speed. In [44], the point-light technique was used to present two dances performed by four dancers (two male and two female) to 64 participants. The dances had the same number of kicks, turns and leaps, however, had different rhythms and timing. It was found that the participants identified that certain movements corresponded to the emotions of happy and sad. Namely, the happy dance was more energetic and consisted of free and open movements, while the sad dance consisted of slow, low energy and sweeping movements. In [45], videos of actors performing emotional situations utilizing body gestures with their faces blurred and no audio were presented to groups of young and elderly adults. One group of 41 participants (21 young adults and 20 elderly adults) were asked to label each of the videos as one of the following emotions: happy, sad, angry and neutral. A second group of 41 participants (20 young adults and 21 elderly adults) were asked to rate the following movement characteristics of the body gestures on seven-point Likert scales: (1) smoothness/jerkiness, (2) stiffness/looseness, (3) hard/soft, (4) fast/slow, (5) expanded/contracted, and (6) almost no action/a lot of action. The results with the first group showed that both the young and elderly adults were able to perform accurate emotion identification, however, the elderly adults had more overall error especially with respect to the negative emotions. With respect to movement characteristics, it was found that the angry body language was identified to have the jerkiest movements, followed by happy, while sad and neutral had the smoothest movements. In addition, angry was rated to have the stiffest movements followed by sad. Happy and neutral had the least stiff movements. Lastly, the body movements for happy and angry were found to be faster and have more action than those for sad and neutral. In [46], arm movements performing knocking and drinking actions which portray the ten affective states of afraid, angry, excited, happy, neutral, relaxed, strong, tired, sad and weak were presented as point-light animations to participants. Fourteen participants were asked to categorize each point-light animation as one of the aforementioned ten affective states. It was found that the level of activation of an affective state was more accurately recognized for the arm movements than pleasantness using a two-dimensional scale similar to the circumplex model [47].

In [31], Wallbott investigated the relationship between body movements and postures, and fundamental and social emotions. The movements and postures included collapsed/erected body postures, lifting of the shoulders, and head and arm/hand movements. Six female and six male professional actors performed 14 different emotions. Twelve drama students acted as expert coders to identify a sample of videos which had the most natural and recognizable emotions of the actors. Then these videos were coded by two trained observers. The 14 emotions considered were elated joy, happiness, sadness, despair, fear, terror, cold anger, hot anger, disgust, contempt, shame, guilt, pride and boredom. Inter-observer agreements of 75–99 % were found for the body movement categories representing the upper body, shoulders, head, arms and hands. Wallbott found that statistically significant relationships exist between specific movements and postures of the body, head and arms, and each of the 14 different emotions. For example, boredom can be characterized by a collapsed upper body, an upward tilted head, inexpansive movements, low movement activity and low movement dynamics. A discriminant analysis yielded 54 % correct classification across all the emotions, with shame having the highest correct classification at 81 %, followed by elated joy at 69 % and hot anger at 67 %, and with despair, terror and pride having the lowest classification percentages at 38 %.

In [32], de Meijer investigated the relationship between gross body movements and distinct emotions. The body movements studied included trunk and arm movements, movement force, velocity, directness, and overall sagittal and vertical movements. Eighty-five adult subjects were shown 96 videos of three actors performing these various body movements and asked to rate the compatibility of the body movements, on a four-point Likert scale, with respect to 12 emotions: interest, joy, sympathy, admiration, surprise, fear, grief, shame, anger, antipathy, contempt and disgust. The results showed that the participants rated the majority of the body movements as expressing at least one emotion. Furthermore, it was determined that a unique combination of body movements was utilized to predict each distinct emotion. For example, a stretching trunk movement while opening and raising the arms would lead to the subjects selecting the emotion joy.

The aforementioned literature review has shown the importance of emotional body language in recognizing the emotions displayed by people, especially in social settings. In particular, it has been determined that specific body poses, postures and movements can communicate distinct emotions. Therefore, in order to achieve effective social HRI, it is important for a socially interactive robot to be able to use body language to display its own emotions, which can then be appropriately interpreted by a person engaged in the interaction at hand.

2.2 Robot Display of Emotional Body Language

A number of robots have been designed to display specific emotions through dance, i.e., [24–30]. In particular, some researchers have utilized Laban body movement features from dance to generate robot emotions, i.e. [24–28]. Laban movement analysis investigates the correlation between a person’s body movements and his/her psychological condition [48]. For example, a movement that is strong, flexible and has a long duration gives a psychosomatic feeling of relaxation. The four major Laban movement features are defined as space, time, weight and flow [48]. Space relates to whole-body movements; it measures how direct, open and flexible the body movements are. The time feature determines the speed at which body movements travel spatially, i.e., if a body movement is sudden or sustained. Weight determines the energy associated with movements, i.e., if they are firm or gentle. The flow feature is concerned with the degree of liberation of movements, identifying if movements are free or bound. In [24], Laban features were utilized to create dancing motions for a mobile robot with 1 rotational degree of freedom (DOF) for each arm (two arms in total) and 1 DOF for head nodding. The robot performed six different dances, each displaying one of the following emotions: joy, surprised, sad, angry or no emotion. In [25] and [26], Laban dance features were used to define the motions of the small 17 DOFs KHR-2HV human-like robot for the emotions of pleasure, anger, sadness and relaxation. In particular, in [25], each of these emotions was attributed to only three distinct body movements which consisted of raising and lowering the arms. In [27], Laban dance theory was utilized to describe the body movements of a teddy bear robot. Arm and head motions of the robot were attributed with the emotions of joy, sadness, surprise, anger, fear and disgust. In [28], the 17 DOFs small humanoid Robovie-X robot generated dance movements to express the emotions of anger, sadness and pleasure based on Laban movement analysis and modern dance using its upper body, head, arms, hands, legs and feet.

Other robots have also been designed to mimic human emotional dance without utilizing Laban movement features, e.g. [29, 30]. For example, in [29], the Sony QRIO robot was used to imitate the dance motions of a person in real-time using moving region information, with the goal to create sympathy between a person and the robot. In [30], the Expressive Mobile Robot generated emotionally expressive body movements based on classical ballet using 7 DOFs in its arms, head and wheels. Experiments were conducted to see which body movements people found natural as well as which body movements depicted a feeling of interest by the robot.

A relatively small number of robots have also been developed to display emotions using body movements without incorporating emotional dance. For example, Keepon, a tele-operated chick-like robot, utilizes body movements consisting of bobbing, shaking and swaying to convey the emotions of excitement, fear and pleasure, respectively [49]. The robot has been designed for interactions with children diagnosed with autistic spectrum disorders. In [50], the design of an insect-like robotic head with two arm-like antennas was presented to express different emotions using exaggerated expressions of animated characters. Namely, the change in color of the eyes and antennas, the motion of the antennas and the eye emoticons can be used to display such emotions as anger, fear, surprise, sadness, joy and disgust. Examples of expressive antenna motions include the ends of the antennas being brought in front of the eyes for fear and swept backwards for surprise. In [51], the small humanoid robot Nao was utilized to express the emotions of anger, sadness, fear, pride, happiness and excitement through head movements in a range of different robot poses. The poses of the robot were designed based on motion capture information of a professional actor guided by a director. In [52], the human-like WE-4RII robot was used to display emotions using facial expressions and upper body movements (especially hand movements). The facial and body patterns to display for the emotions were based on recognition rates from a pre-experiment where several simulated patterns were presented to subjects. Both the posture and velocity of the body were used to display the emotions of neutral, disgust, fear, sadness, happiness, surprise and anger. In [53], the Nao robot was also utilized to generate the emotions of anger, fear, sadness and joy with body movements, sounds (i.e., crying, growling, banging), and eye colors (i.e., red for angry, dark green for fear, violet for sad, yellow for joy) in order to map these emotions onto the Pleasure–Arousal–Dominance (PAD) model. The authors stated that they used psychological research inspired by the work of Coulson [36] and de Meijer [32], TV shows and movies to link emotions to body movement, sound and color. The expressions also included dancing for the emotion of joy and saying “Jippie Yay” while the robot’s eyes turned yellow. The robot’s emotion expressions were first evaluated in a pre-test, and then each expressional cue was individually investigated in the experiments in order to determine the expressivity of each individual cue for each emotion. However, for these expressions, the authors did not specify which descriptors from Coulson and de Meijer they considered and for which emotions. Hence, it is not clear how the poses/movements of the small robot are directly linked to existing human psychological studies.

In general, the emotions of robots designed for HRI have mainly been derived from body movements from dance or robot-specific characteristics. For the latter group, robot-specific movements have usually been generated that cannot easily be generalized to other robots. With respect to emotional dance, the corresponding body movements are more appropriate for small robots that can have a larger workspace (i.e., table tops) during HRI, and cannot be effectively used for larger robots engaging in natural one-on-one social interactions, such as our robot Brian 2.0. To date, research into the use of emotions based on human body movements and postures for social interactions is non-existent for robotic applications with the exception of [53]. However, in [53], emotional dance is still incorporated into some of the small sized Nao’s emotional expressions and the link between the robot’s body language and human body language is not directly clear. Hence, our research explores the challenge of using natural human body movements and postures to represent social emotional behaviors for life-sized human-like robots in order for the robots to effectively communicate while building interpersonal relationships during one-on-one social interactions.

2.3 Human Perception of Robotic Body Language

A handful of researchers have primarily investigated human perception of robot body language in representing specific emotions. In [51], the head positions of Nao were utilized to investigate the creation of an affect space for body language. Twenty-six participants were asked to identify the emotions displayed by the robot, based on different head movements, as anger, sadness, fear, pride, happiness or excitement. Participants were also asked to rate the level of valence and arousal of each emotion utilizing a ten-point Likert scale. The results showed that a head-up position increased the recognition rates of the emotions of pride, happiness and excitement, and a head-down position increased the recognition rates of the emotions of anger and sadness. The position of the head was also found to be related to the perceived valence of the robot’s emotion but not to arousal. In [52], the human-like robot WE-4RII was utilized to determine how well participants could recognize the emotions of the robot utilizing facial expressions, body and hand movements. It was found that the participants recognized emotions more often when emotional hand movements were included with facial expressions and body movements. In [53], 67 participants were asked to identify which combination of body movements, sounds, and eye colors that the Nao robot displayed were most appropriate for the emotions of anger, fear, sadness and joy. Then another study was conducted with 42 participants, where the robot separately displayed body movements, sounds and eye colors for the same emotions. In this latter study, the participants were asked to assign a specific value within the PAD model for each of the individual expressions. It was found that body movements achieved the best results. In [54], one set of participants (which included amateurs and expert puppeteers) was asked to create simple non-articulated arm and head movements of a teddy bear robot for different scenarios. Another set of participants was asked to watch animations or videos of these robotic gestures and to judge the emotions that were displayed based on the simple movements created. The emotions that were available to the second set of participants to choose from included happy, interest, love, confused, embarrassed, sad, awkward, angry, surprised and neutral. The participants also rated the lifelikeness of the gestures and how much they liked the gestures. The results showed that emotions can be conveyed through simple head and arm movements for the teddy bear robot and that recognition rates increased when the participants were given the situational context for the gestures. The gestures for fear and disgust were found to be better understood when created by expert puppeteers rather than amateurs, however, this was not true for the other emotional movements. It was also found that positive emotions and more complex arm movements were rated as more lifelike.

Studies determining recognition rates of emotions based on the use of Laban body movements have also been conducted. For example, in [28], emotional dance for the three basic emotions of anger, sadness and pleasure was displayed by the small humanoid Robovie-X robot to two different groups of Japanese participants. In particular, a group of elderly individuals and a group of young individuals were asked to watch and identify each emotion displayed by the robot’s body movements. The results showed differences in the perception of emotion from robot body language between the two groups. The authors suggested that these differences are due to variations in the focus and cognition of the two groups when identifying the emotions such as their attention to different body parts and their perception of the magnitude and speed of the robot’s motions. Hence, body language of the robot should be designed with the consumer in mind. In [25], 33 subjects watched the KHR-2HV human-like robot’s movements and categorized these movements as being a weak or strong display of pleasure, anger, sadness or relaxation. The subjects first watched the robot display basic movements and then eight processed whole-body movements which represented the target emotions. The results showed that the subjects could identify the emotions of sadness, pleasure and anger for the movements but not relaxation, and that some emotions could easily be confused with each other such as pleasure with anger, and sadness with relaxation. In [27], 88 Japanese subjects were asked to identify the emotions related to the Laban body movements displayed by a teddy bear robot with 6 DOFs in the head and shoulders. The emotions were chosen from a list which included joy, anger, surprise, fear, disgust and sadness. They were also asked to rate on a four-point Likert scale how clearly the emotions were displayed. The results found that with simple arm and head movements, the emotions of joy, sadness, surprise and fear could be recognized. However, anger and disgust were not easily recognized by the subjects. In [24], 21 student participants were asked to judge the intensity and type of emotions (joy, surprised, sad, angry and no emotion) displayed by a mobile robot to determine correlations between these emotions and the robot’s effort and shape movement characteristics that are based on Laban movement features. Effort represents dynamic features of movement or quality of movement, whereas shape represents geometrical features of the overall body. The results showed that strong body movements were correlated with joy, and they were also correlated along with ascending and enclosing shape features to surprise. Weak body movements were correlated with sadness and an advancing body movement was correlated with angry.

Contrary to the robotic studies presented above, in this paper, we present a unique comparison study of the recognition rates of the emotional body language of our human-like social robot Brian 2.0 with the recognition rates of the same emotional body language displayed by a human actor in order to investigate the quality of the body language displayed by the robot. This will allow us to determine which body movements and postures can be generalized for the robot to display for a desired emotion, in addition to exploring whether human body language can be directly mapped onto an embodied life-sized human-like robot. We will use non-expert participants in our study, as it is intended that Brian 2.0 will be interacting with the general population. The body language features used in our work will be derived from the emotional body movements and postures defined by Wallbott [31] and de Meijer [32]. We will consider the emotions and corresponding body language that are applicable for social HRI scenarios. The emotions that will be investigated, herein, are happiness, sadness, boredom, interest, elated joy, surprise, fear and anger.

3 The Social Robot Brian 2.0

The human-like robot Brian 2.0 has similar functionalities to a human from the waist up (Fig. 2a). The dimensions of the upper body of the robot have been modeled after a male volunteer. The robot is able to display non-verbal body language via: (a) a 3 DOFs neck capable of expressing realistic head motions such as nodding up and down, shaking from side to side and cocking from shoulder to shoulder, (b) an upper torso consisting of a 2 DOFs waist allowing it to lean forward and backwards as well as turn side to side, and (c) two arms with 4 DOFs each: 2 DOFs at the shoulder, 1 DOF at the elbow and 1 DOF at the wrist. Utilizing these body parts, the robot is capable of displaying various human-like body movements and postures.

Fig. 2 13 DOFs Human-like Social Robot Brian 2.0

4 Emotional Body Language Features

As previously mentioned, since both body movements and postures are important cues for recognizing emotional states displayed by an individual, we focus on defining emotional body language for our robot Brian 2.0 that encompasses both these characteristics. This body language should be consistent with emotions that a robot would display during social HRI scenarios. In this work, the body language classifications of Wallbott [31] and de Meijer [32] are utilized to generate body language corresponding to the emotions of sadness, elated joy, anger, interest, fear, surprise, boredom and happiness. We have chosen to use this set of eight emotions as they provide a large variation across both the valence (positive and negative feelings) and arousal dimensions of affect. For example, sadness represents negative valence whereas elated joy represents positive valence; and boredom represents low arousal whereas surprise represents high arousal. Furthermore, these emotions are included within a group of emotions that psychologists define as social emotions [55–57]. Namely, social emotions, which can also include the basic emotions of happiness, sadness, fear and anger, serve a social and interpersonal function, where an individual’s relationship to another individual can be the central concern for these emotions [58–60]. Hence, these emotions involve the presence of a (real or virtual) social object which may include another person or a socially constructed self [61]. The set of eight emotions that we have chosen herein can be used by the robot to engage in social communication with a person in order to accomplish different interaction goals such as, for example, obtaining compliance or gathering information.

The body language descriptors used for the different emotions are presented in Table 1. The emotions of sadness, elated joy, boredom and happiness are derived from body movements defined by Wallbott [31] and the other four emotions of anger, interest, fear and surprise are derived from de Meijer’s work [32]. Body language classification was taken from both these works in order to allow us to accommodate the range of proposed emotions for our robot. The emotional body movements and postures chosen can be achieved based on the robot’s mobility specifications and include upper trunk, head and arm movements as well as the overall movement quality. Trunk movement is classified as either the stretching or bowing of the upper trunk, which the robot emulates by leaning forwards or backwards at the waist. The movement of the head consists of facing forwards, tilting backwards or facing downwards and is achieved via the robot’s 3 DOFs neck. The arm motions are defined as: (1) hanging—when resting at the sides of the robot, and (2) opening/closing—for opening, the arms start near the center of the robot and move outwards away from the body, while closing consists of the opposite motion. The overall direction of movement is also described as forwards, backwards, upwards and downwards based on the motion of the trunk, arms and head of the robot in these directions. The movement quality represents the overall speed, size and force of movements and is divided into three main categories [31]: (1) movement dynamics which refers to the energy, force or power in a movement; (2) movement activity which refers to the amount of movements, and (3) expansive or inexpansive movements which refer to the large or small spatial extension of the robot’s body.

Table 1 Body language descriptors for different emotions

In order to implement the emotional body language descriptors in Table 1, the kinematic model for Brian 2.0 (shown in Fig. 2b) is utilized. For example, to implement the descriptors for elated joy, the following joints are used. The revolute joint 1 rotates the robot’s trunk to an upright position, where the trunk is perpendicular to the ground to represent a stretched trunk posture. The two shoulder joints (joints 3 and 4, and 7 and 8) and elbow joints (joints 5 and 9) of each arm are used to move the arms of the robot in an upwards and outwards direction to mimic opening of the arms. Joint 12 is used to tilt the head back. The combination of the trunk, arms and head motion represents the overall upward motion of the robot. High movement activity is achieved by repeating the upwards motions several times. High movement dynamics are achieved by high joint velocities. Expansive movements increase the spatial workspace of the robot during the display of body language and are implemented through the motion of opening the arms as well as the rotating of both the trunk and head from left to right using joints 2 and 11.
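To make this mapping concrete, the following Python sketch translates the Table 1 descriptors for elated joy into joint-space keyframes using the joint numbering of Fig. 2b described above. It is an illustrative sketch only: the joint angle values, the neutral pose and the send_trajectory() call are assumptions, not Brian 2.0’s actual control software.

ELATED_JOY_DESCRIPTORS = {
    "trunk": "stretched",               # joint 1: trunk upright, perpendicular to the ground
    "arms": "opening, moving upwards",  # shoulder joints 3, 4 and 7, 8; elbow joints 5 and 9
    "head": "tilted back",              # joint 12
    "overall_direction": "upwards",
    "movement_activity": "high",        # repeat the upwards motion several times
    "movement_dynamics": "high",        # use high joint velocities
    "expansiveness": "expansive",       # rotate trunk and head side to side via joints 2 and 11
}

def elated_joy_keyframes(repetitions=3):
    """Return a list of joint-angle keyframes (hypothetical values, in radians)."""
    neutral = {joint: 0.0 for joint in range(1, 14)}  # 13 DOFs, all at an assumed neutral pose
    peak = dict(neutral)
    peak.update({
        1: 0.0,            # trunk upright (stretched)
        3: 1.2, 4: 0.8,    # one shoulder: raise and open the arm
        7: 1.2, 8: 0.8,    # other shoulder: raise and open the arm
        5: 0.3, 9: 0.3,    # elbows slightly bent
        12: -0.4,          # head tilted back
    })
    frames = []
    for _ in range(repetitions):  # high movement activity: repeat the upwards motion
        frames.extend([peak, neutral])
    return frames

# A controller call such as send_trajectory(elated_joy_keyframes(), speed="fast")
# (a hypothetical API) would then realize the high movement dynamics.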

5 Experiments

The first objective of the experiments was to determine if non-expert individuals would be able to identify emotions from the body language displayed by the human-like social robot Brian 2.0. The second objective of the experiments was to compare how individuals interpret the same emotional body language displayed by the robot and a human actor. Participants were asked to watch videos of both Brian 2.0 and an actor displaying the same emotional body language and then identify the corresponding emotion being displayed in each of the videos. The results were then analyzed to determine which emotions were recognized in both cases, and how the recognition results compared.

For the videos, the actor was instructed to perform the body language descriptors in Table 1 while keeping a neutral facial expression. He rehearsed the body movements and postures under the guidance of the authors prior to their videotaping. With respect to the robot, the neutral pose of the robot’s face was displayed throughout the videos by not actively controlling the robot’s facial actuators during the display of the body language.

5.1 Participants

A total of 50 (30 female and 20 male) participants took part in the overall study after accounting for dropouts. The participants ranged in age from 17 to 63 years with a mean age of 27.78 (SD = 9.13). The participants were all from North America, as was the human actor. None of the participants were familiar with social robots.

5.2 Procedure

Each participant logged on to a secure website that was developed by the researchers. On the website, the participants were able to watch separate videos of first the robot and then the human actor displaying the emotional body language defined in Table 1 in a random order. An initial pilot study with two groups of ten participants was performed prior to the experiment to determine whether the order of presentation of the robot and actor videos would influence recognition of the emotional body language displays. The results of a two-tailed Mann–Whitney U test performed on the recognition rates of the two groups indicated no significant order effects, U = 42, \(p = 0.579\). Based on this finding, we showed the videos of the robot first so that we could also initially focus on obtaining the results needed to address the first objective of the experiment, i.e. to determine if non-expert individuals would be able to identify emotions from the body language displayed by the robot. Emotional body movements and postures were displayed in the videos without any facial expressions for both the robot and actor. This procedure follows a similar approach used in other robot emotional body language studies, e.g. [27, 28, 51, 54]. We decided not to cover the robot’s/actor’s face when presenting the videos to the participants in order to be able to clearly show head movements and the different angles of the head that are significant descriptors for the emotions, as well as any interactions between the other body parts and the head. The participants were informed that the faces in the videos would be in a neutral emotional state. A forced-choice approach was utilized, where after the participants watched each video, they were asked to select the emotion they thought best described the video from the following list of eight possible emotions: sadness, fear, elated joy, surprise, anger, boredom, interest and happiness. The use of this type of forced-choice approach is very popular in studies on emotion recognition, e.g. [17, 42, 45, 46, 51]. Additionally, the forced-choice approach used herein has many advantages, including: (1) it allows for simple interpretation, i.e. it does not require the expert coding of open-ended questions [62], (2) it fits the categorical nature of emotions [62], and (3) by not including a “none of the above” option, it controls for participant bias, ensuring that data are collected from every participant [63, 64]. An emotion needed to be selected by the participant for each video in order for the next video to be displayed to him/her. Eight videos each were shown for the robot and for the actor.
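As an aside on the order-effect check described above, the following is a minimal Python sketch, using SciPy, of how such a two-tailed Mann–Whitney U test could be run on per-participant recognition scores; the two score lists below are hypothetical placeholders, not the pilot study’s data.

from scipy.stats import mannwhitneyu

# Number of correctly identified displays (out of 8) per participant in each pilot group
robot_first_scores = [6, 5, 7, 4, 6, 5, 6, 7, 5, 6]   # illustrative values only
actor_first_scores = [5, 6, 6, 5, 7, 4, 6, 6, 5, 7]

u_stat, p_value = mannwhitneyu(robot_first_scores, actor_first_scores, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")   # no significant order effect if p > 0.05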

The average length of the videos was approximately 10 s, during which the appropriate body movements and postures were repeated three times. Example frames from each of the videos are shown in Figs. 3 and 4. The videos were recorded with a Nikon D7000 camera at 30 frames per second and a resolution of 1,280 by 720 pixels. The layout of the website was such that after each video was played, the list of possible emotions was presented to the participants directly to the left of the video, as shown in Fig. 5.

Fig. 3 Example frames of emotional body language displayed by Brian 2.0 for the eight emotions

Fig. 4 Example frames of emotional body language displayed by the human actor for the eight emotions showing similar movement profiles as the robot

Fig. 5 Example of the website layout for the emotional body language study

5.3 Data Analysis

A within-subjects experimental design was implemented. Confusion matrices were utilized to represent the recognition rates for the emotions for both the robot and human actor. A \(\chi ^{2}\) goodness of fit test was used to determine whether the emotions selected for the corresponding body language occurred at rates above random chance. A binomial test was utilized to determine if the desired emotion could be recognized more often than all other emotions for the respective body language.
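As an illustration of these two tests, the short Python sketch below applies SciPy’s chi-square goodness-of-fit and binomial tests to a hypothetical response distribution for a single body language display; the counts are placeholders rather than the study’s data, and binomtest requires SciPy 1.7 or later.

import numpy as np
from scipy.stats import chisquare, binomtest

N = 50                                            # responses per body language display
observed = np.array([42, 3, 2, 1, 1, 1, 0, 0])    # hypothetical counts over the 8 emotion labels

# Chi-square goodness of fit against a uniform (random-chance) distribution (df = 7)
chi2, p_chi2 = chisquare(observed)                # expected frequencies default to N/8 per label
print(f"chi2(df = 7, N = {N}) = {chi2:.2f}, p = {p_chi2:.3g}")

# Binomial test: is the desired emotion chosen more often than all other emotions combined?
result = binomtest(k=int(observed[0]), n=N, p=0.5, alternative="greater")
print(f"binomial test p = {result.pvalue:.3g}")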

A direct comparison study with respect to the recognition rates for the robot and human actor was conducted to determine the feasibility of using the chosen body language for the human-like social robot. A McNemar test was implemented to test whether there was a significant difference between the recognition rates for the robot and the human actor. The null hypothesis used for the McNemar test was defined as: the emotion recognition rates for both the robot and human actor are the same.
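For completeness, the following sketch shows how such a McNemar test could be computed with the statsmodels library from a 2 × 2 table of paired per-participant outcomes; the counts are hypothetical placeholders, not values from the study.

from statsmodels.stats.contingency_tables import mcnemar

# Paired outcomes for one emotion: rows = actor correct/incorrect, columns = robot correct/incorrect
table = [[20, 15],
         [5, 10]]                        # illustrative counts only

result = mcnemar(table, exact=True)      # exact binomial form, suited to small discordant counts
print(f"statistic = {result.statistic}, p-value = {result.pvalue:.3f}")
# The null hypothesis of equal recognition rates is rejected if the p-value is below 0.05.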

5.4 Experimental Results

5.4.1 Identifying the Emotional Body Language Displayed by the Human-like Robot Brian 2.0

The recognition rates for the emotions displayed by the robot are presented in the confusion matrix in Table 2. Rows in Table 2 represent the emotions chosen by the participants and columns represent the true labeled emotions. Sadness had the highest recognition rate at 84 %, followed by surprise with a recognition rate of 82 %. Anger and elated joy had recognition rates of 76 and 72 %, while boredom and interest had rates of 56 and 38 %, respectively. The emotions with the lowest recognition rates were fear with a rate of 26 % and happiness with a rate of 20 %. It is interesting to note that the body language for happiness was most often recognized as interest, and the body language displayed for fear was recognized equally often as fear and as boredom. Interest had the highest frequency of incorrect recognitions across all the true labeled emotions, at 11 %.

Table 2 Confusion matrix for the emotions of the robot
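The following short sketch, using placeholder counts rather than Table 2’s actual values, illustrates how per-emotion recognition rates are read off such a confusion matrix: the diagonal entry divided by the column total for each true labeled emotion.

import numpy as np

emotions = ["sadness", "elated joy", "anger", "interest",
            "fear", "surprise", "boredom", "happiness"]
rng = np.random.default_rng(0)
confusion = rng.multinomial(50, [1 / 8] * 8, size=8).T   # placeholder counts; each column sums to 50

recognition_rates = np.diag(confusion) / confusion.sum(axis=0)
for emotion, rate in zip(emotions, recognition_rates):
    print(f"{emotion}: {rate:.0%}")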

A \(\chi ^{2}\) goodness of fit test with \(\alpha =0.05\) was utilized to determine if the emotions recognized from the observed body language were due to random chance. The results of the \(\chi ^{2}\) test are as follows for each of the emotions:

  • sadness: \(\chi ^{2}\) (df = 7, N = 50) = 237.04, \(p <0.001\);

  • elated joy: \(\chi ^{2}\) (df = 7, N = 50) = 171.44, \(p <0.001\);

  • anger: \(\chi ^{2}\) (df = 7, N = 50) = 186.48, \(p <0.001\);

  • interest: \(\chi ^{2}\) (df = 7, N = 50) = 57.20, \(p <0.001\);

  • fear: \(\chi ^{2}\) (df = 7, N = 50) = 32.24, \(p<0.001\);

  • surprise: \(\chi ^{2}\) (df = 7, N = 50) = 227.44, \(p<0.001\);

  • boredom: \(\chi ^{2}\) (df = 7, N = 50) = 133.68, \(p <0.001\); and

  • happiness: \(\chi ^{2}\) (df = 7, N = 50) = 37.68, \(p<0.001\).

Hence, the emotions for each of the eight displays of body language were chosen significantly above random chance.

It was hypothesized that the emotional body language movements and postures displayed by the robot would be recognized as their corresponding desired emotion more often than the other seven emotions. We utilized a binomial test to examine this hypothesis. Namely, the null hypothesis is that the desired emotion will be recognized at the same or a lower frequency than the other emotions, i.e., \(p_{1} \le 0.5\). The results of the binomial test are presented in Table 3. It can be concluded that with 95 % confidence the desired emotions of sadness, elated joy, anger and surprise are recognized significantly more often than any of the other emotions. A 75 % confidence level was found for the emotion of boredom being recognized significantly more often than the other emotions. However, the emotions of interest, fear and happiness were not recognized significantly more often than the other emotions. Interest was the emotion most often chosen by the participants for the body language corresponding to the desired emotion of happiness.

Table 3 Results of binomial test for the recognized emotions of the robot

5.4.2 Identifying the Emotional Body Language Displayed by the Human Actor

The recognition results for the emotions displayed by the human actor are presented in the confusion matrix in Table 4. As can be seen by the results, anger had the highest recognition rate of 100 % followed by boredom which had a recognition rate of 86 %. Emotions such as fear, surprise, elated joy and interest had recognition rates of 70, 66, 60, and 56 %, respectively. The emotions with the lowest recognition rates were sadness with a rate of 34 % and happiness with a rate of only 2 %. Similar to the recognition rates with respect to the robot, happiness was again considered to be the least recognized emotion from the corresponding body language. Only one participant chose happiness based on its described body language. From the results, it can be seen that the body language for sadness and happiness were more often recognized as boredom. Hence, boredom had the highest frequency of incorrect recognitions across all the emotions at 18.5 %.

Table 4 Confusion matrix for the emotions of the human actor

A \(\chi ^{2}\) goodness of fit test with \(\alpha =0.05\) was implemented to determine if the observed emotions were chosen at a rate higher than random chance for the human actor. The test was applied to all the emotions except anger, as anger had a 100 % recognition rate for the human actor. The results of the \(\chi ^{2}\) test are as follows:

  • sadness: \(\chi ^{2}\) (df = 7, N = 50) = 150.32, \(p< 0.001\);

  • elated joy: \(\chi ^{2}\) (df = 7, N = 50) = 117.68, \(p< 0.001\);

  • interest: \(\chi ^{2}\) (df = 7, N = 50) = 102.96, \(p< 0.001\);

  • fear: \(\chi ^{2}\) (df = 7, N = 50) = 169.84, \(p < 0.001\);

  • surprise: \(\chi ^{2}\) (df = 7, N = 50) = 160.88, \(p< 0.001\);

  • boredom: \(\chi ^{2}\) (df = 7, N = 50) = 251.76, \(p< 0.001\); and

  • happiness: \(\chi ^{2}\) (df = 7, N = 50) = 201.52, \(p < 0.001\).

Hence, the emotions for these seven displays of body language were chosen significantly above random chance.

Similar to the robot emotions, it was hypothesized that the emotional body language features displayed by the actor would be recognized as their corresponding desired emotion more often than the other seven emotions. The results of the binomial test are presented in Table 5. With 95 % confidence the desired emotions of anger, fear, surprise and boredom can be recognized significantly more often than any of the other emotions. Confidence levels of 89 and 75 % were found for the desired emotions of elated joy and interest, respectively. On the other hand, the emotions of sadness and happiness were not recognized significantly more often than the other emotions. In particular, for both the desired emotions of happiness and sadness, the most recognized emotion by the participants, based on the corresponding body language, was boredom.

Table 5 Results of binomial test for the recognized emotions of the human actor

5.4.3 Comparison

Figure 6 presents a direct comparison for the emotion recognition rates for the robot and human actor. From the figure, it can be seen that the recognition rates were higher for the human actor for the emotions of anger, interest, fear and boredom, while the robot had higher recognition rates for the emotions of sadness, elated joy, surprise and happiness.

Fig. 6 Comparison of recognition rates for robot and human actor

McNemar’s two-tailed test for paired proportions was used to statistically compare the recognition results from the robot and human actor. The null hypothesis was defined as the difference between the recognition rates, \(p_{1}\) for the human actor and \(p_{2}\) for the robot, being zero. The first alternative hypothesis was that the emotion recognition rates of the body language for the human actor are higher than those for the robot, and the second alternative hypothesis was that the recognition rates for the robot are higher than those for the human actor. The \(2 \times 2\) contingency tables comparing the recognition results of the desired emotions of the robot and actor with respect to the other emotions are presented in Table 6 with the McNemar test results presented in Table 7. Significance testing was conducted using \(\alpha =0.05\). The emotions for which the null hypothesis could not be rejected were elated joy, interest and surprise. Hence, there was no statistical difference between the recognition rates for the robot and human actor for these emotions. For all other emotions, the null hypothesis was rejected. In particular, there is a statistically significant difference in the recognition results for the robot and human actor for the five remaining emotions. Namely, the robot has higher recognition rates for sadness and happiness, while the human actor has higher recognition rates for the emotions of anger, fear and boredom.

Table 6 Contingency tables for the recognition results of both the robot and human actor
Table 7 McNemar significance results for the robot and human actor recognition rates

6 Discussions

The recognition results for the human-like social robot showed that participants were able to recognize the emotional body language for sadness, elated joy, anger, surprise and boredom, as defined by Wallbott [31] and de Meijer [32], with rates over 55 %. All these emotions had recognition rates significantly above random chance with respect to all other emotions for the same body language. The body language for the emotion of fear was recognized by the participants as fear and as boredom with the exact same frequency. This can be a difficult emotion for the robot to express based on the defined body movements and postures due to the rigidity of the robot’s body. For example, the rigid body of the robot does not allow it to easily curl in the shoulders and bend the back the way a human would for this particular emotion (see Fig. 4). Furthermore, it is difficult for the robot to mimic the tensing of the muscles in the body to represent the force and energy of the highly dynamic movements for this particular emotion. This made the recognition of this emotion more challenging for the participants. Furthermore, as the emotional body language for fear required the robot to turn its head away and bow its trunk, some participants interpreted this as the robot displaying boredom.

For the robot, the body language for the desired emotion of happiness was recognized more often as interest and boredom. For the actor, the body language for the desired emotion of happiness was recognized most often as boredom. These other emotions share similar descriptors with happiness, such as stretching the trunk and low movement dynamics, which could contribute to the confusion. In general, the body language for happiness had low recognition rates for both the robot and human actor, although the recognition rates were significantly higher for the robot than the actor. Unlike the robot, during this body language display, the actor also had his hands in his pockets for approximately half of the duration of the video. Hands in the pockets have been found to be perceived as a number of different affective states including calm and easygoing [65], casual attitude [66], relief [67], and sad [67]. Hence, this particular gesture may have also resulted in the majority of the participants recognizing this body language display as boredom for the actor. The similarity in descriptors can also be the reason why the robot’s emotional body language for interest was recognized as happiness by 32 % of the participants. Hence, alternative body language descriptors may need to be considered and tested for the emotion of happiness. The challenge will be to identify potential descriptors for happiness for the robot that are also distinct from those used for elated joy, where both emotions have positive valence, but the latter has higher arousal. Wallbott [31] is, to the authors’ knowledge, the only researcher who provides specific human body language descriptors for the emotions of happiness, elated joy, boredom and interest. In our study, we used Wallbott’s descriptors for the first three emotions and descriptors from de Meijer for the emotion of interest. For interest, Wallbott’s body language descriptors are similar to those defined by de Meijer, with the exception that de Meijer also included descriptors that describe the direction and dynamics of body movements for this emotion. The inclusion of other modes such as facial expressions may also need to be considered for happiness. For example, it has been shown in several studies that a universal human facial expression for happiness includes such descriptors as raising the cheeks and moving the corners of the mouth upwards [68]; hence, adding such descriptors to the body language for happiness might be necessary in order to increase recognition rates for this particular emotion for the robot.

The recognition results for the actor showed that the participants most often associated the body language for sadness with boredom; however, this was not the case for the robot. For the robot, the body language for the desired emotion of sadness was recognized significantly more often as sadness than as any of the other emotions. From the comparison study, it was determined that the desired emotion of sadness was recognized at significantly higher rates for the robot than for the actor. This may be a result of the difference in the head positions of the robot and the actor during the videos. On average, the robot’s head was facing more downwards than the actor’s head while displaying the body movements for sadness, as the robot was not able to slouch its shoulders. Studies by both Darwin [69] and Bull [70] have found that dropping/hanging the head is related to the emotion of sadness. The emotions of interest and surprise had statistically similar recognition results for the robot and the actor; this was because the robot was able to easily replicate the body movements for these emotions and did so in a manner similar to the actor. For the emotion of elated joy, because each of the robot’s shoulders has one fewer rotational degree of freedom than a human shoulder, the robot generated the opening and upwards arm movements by also moving its upper arms outwards, whereas the actor lifted his upper arms directly forwards. Despite this difference, the recognition results were statistically similar to those of the human actor. The emotional body language for anger, boredom and fear was recognized at statistically higher rates for the actor; this may be a result of the robot not being able to directly mimic the tensing of the muscles (for anger and fear) or the curling in of the shoulders and bending of the back (for boredom and fear), as previously mentioned.
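One simple way to compare the robot’s and the actor’s recognition rates for a given emotion is a test on a 2×2 table of correct versus incorrect responses, for example Fisher’s exact test. The counts below are hypothetical, and the statistical procedure actually used for the comparison study may differ.

```python
# Sketch of a robot-vs-actor comparison for one emotion (hypothetical counts).
from scipy.stats import fisher_exact

# Rows: robot, actor; columns: recognized correctly, not recognized.
table = [[22, 8],   # robot: 22 of 30 participants chose the intended label
         [14, 16]]  # actor: 14 of 30 participants chose the intended label

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")
```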

As both the robot’s and the actor’s faces were visible in the videos, the lack of facial expressions could have influenced the recognition rates for the emotions, even though the participants were informed that only emotional body language, without any facial expressions, was displayed in both sets of videos. Namely, this might have been one reason why happiness had low recognition rates for both the robot and the actor. It could also have caused the confusion between fear and boredom when the corresponding body language was displayed by the robot. Since the robot’s eyes did not move independently of its head, the robot did not maintain eye contact with the camera to the same extent as the actor did for the emotional body language displays of sadness and surprise. For the display of sadness, due to its more downwards head pose, the robot averted its gaze from the camera for 89 % of the video, while the actor averted his gaze for 55 % of the video. As previously mentioned, this more downwards head pose of the robot, and therefore its averted gaze, may be a reason why its display of sadness had a higher recognition rate. For the display of surprise, due to the range of motion of the robot’s body, the robot averted its gaze for 95 % of the video, while the actor did not avert his eyes. Despite this difference in eye gaze, the recognition rate for surprise for the robot was statistically similar to that for the actor. Although, when comparing Figs. 3 and 4, the robot’s body language for fear and happiness appears to have slightly more instances of averted gaze than the human’s, the overall amounts of time that Brian 2.0 and the actor had averted gazes during their respective videos for these emotions were within 10 % of each other. Previous studies have shown that eye gaze direction does not directly influence the recognition of emotions displayed by facial expressions [71] and that the two are processed independently [72]; however, to the authors’ knowledge, no studies have investigated the direct influence of eye gaze on the recognition of emotional body language. Therefore, this relationship should be further explored in future work.
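The gaze-aversion percentages reported above can be computed by annotating each video frame as averted or not and taking the averted share of the total duration. A minimal sketch, with hypothetical per-frame labels, is given below.

```python
# Sketch of computing the percentage of a video in which gaze is averted,
# given hypothetical per-frame annotations (True = gaze averted from camera).
def averted_gaze_percentage(frame_labels):
    """Return the share of frames (as a percentage) labelled as averted."""
    if not frame_labels:
        return 0.0
    return 100.0 * sum(frame_labels) / len(frame_labels)

# Hypothetical annotation of a short clip sampled at a fixed frame rate.
labels = [True] * 89 + [False] * 11
print(f"averted gaze: {averted_gaze_percentage(labels):.0f}% of the video")
```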

The recognition rates for the robot were also compared to the recognition rates that Wallbott obtained in [31] for the same body language descriptors used for happiness, sadness, boredom and elated joy to provide further insight. Unfortunately, a similar comparison could not be conducted for the emotions obtained from de Meijer’s descriptors, as recognition rates were not provided in [32]. Brian 2.0’s recognition rates for elated joy and boredom were within 10 % of the recognition results that Wallbott observed for these emotions in [31]. Sadness also had high recognition rates in both our robot study and Wallbott’s study. In [31], happiness had a good recognition rate, being distinguishable from all the other emotions except contempt, an emotion we did not consider in our robot study.

Overall, the experimental results showed that the body language descriptors were effective in displaying the emotions of sadness, elated joy, anger, surprise and boredom for our social human-like robot Brian 2.0, warranting the potential use of these social emotions and the corresponding body language for the robot in natural and social HRI settings. On the other hand, the body language for the emotions of happiness, fear and interest was not well recognized for the robot.

While previous studies have compared human and artificial displays of emotional facial expressions and have shown that the latter can also be recognized effectively (though with lower recognition rates than the human displays) [23, 73, 74], our comparison study is novel in that it focuses on a robot’s display of emotional body language. In general, the work presented in this paper can be used as a reference when determining the emotional body language of other life-sized human-like robots or androids. With respect to android body language, it has been stated that there has been little active research in this area [75].

7 Conclusions

Our research focuses on robotic affective communication as displayed through body language during social HRI scenarios. Namely, in this paper, we investigate the use of emotional body language for our human-like social robot Brian 2.0, utilizing body movement and posture descriptors identified in human emotion research. The body language descriptors we explore for the robot are based on trunk, head and arm movements as well as overall movement quality. Experiments were conducted to: (1) determine whether non-expert individuals would be able to identify the eight social emotions of sadness, fear, elated joy, surprise, anger, boredom, interest and happiness from the display of Brian 2.0’s body language, which was derived from a combination of human body language descriptors; and (2) compare how individuals interpret the same emotional body language descriptors displayed by the social robot, which has fewer degrees of freedom, and by a human actor, in order to determine whether the desired emotions can be communicated by the robot as effectively as by a human. Experimental results showed that participants were able to recognize the robot’s emotional body language for sadness, elated joy, anger, surprise and boredom with high recognition rates. Even though the robot was not able to implement some body movement features due to its rigid body, the participants were still able to recognize the majority of the emotions. When comparing the recognition rates, it was determined that the emotion of sadness was recognized at significantly higher rates for the robot than for the human actor, while the robot and actor had similar recognition rates for elated joy, surprise and interest. Both the robot and the actor had the lowest recognition rates for the emotion of happiness, due to its similarity in body movement features to other emotions. Only the emotions of anger, fear and boredom were recognized at significantly higher rates for the human actor. Overall, these experimental findings demonstrate that certain human-based body movements and postures that represent social emotions can be effectively displayed by a life-sized human-like robot. Our future work will consist of integrating the robot’s emotional body language with other natural communication modes we have been working on, such as facial expressions and vocal intonation, in order to develop and test a multi-modal emotional communication system for the social robot.