1 Introduction

A challenge for social human–robot interaction is to enable robots to convey emotions, as human beings do, in order to facilitate interaction with humans. Human beings convey emotions not only with speech but also with nonverbal behavior, including facial expressions, hand and arm gestures, postural expressions, and various movements of body parts such as the legs and feet [1]. Among these, postural expressions are notable because they are stationary, usually accompany other forms of expression, and involve the body as a whole. This study aims to demonstrate that emotional postures observed in humans can be transferred to a humanoid robot to convey the same kinds of emotions. The implementation platform is the Nao robot (Fig. 1), a 58 cm tall robot with 25 degrees of freedom. The preliminary results of this paper were presented in a workshop [2].

Fig. 1 The Nao robot

There have been studies implementing emotional body postures with various humanoid and other forms of robots. In most of these studies the researchers generate emotional postures based on their own subjective impressions [3, 4], or on those of naïve performers and puppeteer artists [5], or of cartoonists and photographers [6], and then focus on evaluating the recognition rates of those intuitively prepared postures. In contrast, in Coulson's study [7] quantitative descriptions of postures are generated from qualitative descriptions in the behavioral science literature, such as the references [8–10]. In the present work I adapt Coulson's quantitative descriptions for use with the Nao robot. Therefore, Coulson's work and this paper together present a process that starts from qualitative descriptions in the behavioral sciences and ends with an implementation on a humanoid robot.

The process that this study is a part of, for the generation of robotic emotional postures, can be described in six steps: (1) observation of actual human behavior; (2) description of the human behavior in qualitative terms; (3) encoding of the qualitative descriptions in quantitative terms; (4) testing of the quantitative terms using an artificial human model; (5) adaptation of the quantitative terms for the robot; (6) testing of the adaptation with the robot. The first two steps were performed in the behavioral science literature, and the third and fourth steps in Coulson's study [7]. In this paper I perform the fifth and sixth steps for the Nao robot. Such a process is valuable in three respects. First of all, the human is naturally the starting point of the knowledge of how emotional postures (should) look, and humans provide a rich and readily available source for that knowledge; it can therefore be more effective to directly observe human behavior than to rely on impressions and imagination of how postures look. Secondly, the process provides a generic repertoire of postures quantitatively described with simplified human body models; this is what Coulson's study achieves. Lastly, once such a repertoire exists, it can be adapted to various forms of robots by applying proper transformations. The present study demonstrates this third aspect using the Nao robot.

Apart from demonstrating the last two steps of the process described above, this work is motivated by the development of emotional postures for the Nao robot to be used in interactive games [11, 12]. In such games, the emotional postures can serve to convey the internal state of the robot to the human partner. For example, the robot might present happy, sad, and angry postures when the situation is favorable, unfavorable, and recurringly unfavorable, respectively, from the robot's perspective. The game scenarios can be designed so that such postures function as a means of communication and ease the understanding of the status of each party in the game. It is expected that such construction of games encourages social interaction with the robot and makes it easier to use the robot for purposes ranging from rehabilitation [13–17] to training [18] and entertainment [19].

In this study, I implement emotional body postures for anger, happiness, and sadness with the Nao robot, without any facial expression and without any sound. The postures are developed by adaptation from Coulson's [7] work with a human body model. The best five postures for each of the three emotions are selected by the votes of anonymous participants, and the selected postures are then evaluated by another group of anonymous participants. The evaluations show that the selected postures are strongly associated with the intended emotions.

2 Related Work

2.1 Approaches of Social Human–Robot Interaction

Monitoring and facilitating social interactions are two grand challenges of social robotics. As an example of monitoring, Kanda et al. [20] develop and implement a friendship estimation model to infer friendship relations between children from non-verbal interactions. Breazeal et al. [21] develop an integrated socio-cognitive architecture that provides the robot with the capability of perspective taking in order to correct the false beliefs of the human in collaborative tasks; the robot infers the beliefs and intentions of the human by monitoring the human's actions and decides on assistive actions by comparing them with its own belief system. Liu et al. [22] develop real-time affect detection from physiological signals in order to adapt the difficulty level of a basketball game to the preferences of children with autism. As a good example of facilitating social interaction, Kozima et al. [19, 23] develop the Keepon robot with emotional expressiveness for therapy and entertainment purposes. The authors state that an appropriately designed robot can facilitate not only dyadic interaction between the child and the robot, but also triadic interaction between the child, the caregiver, and the robot, where the robot functions as an interpersonal pivot. The present study aims at facilitating social human–robot interaction with emotional postures. These postures are intended to be part of the overall interaction by means of interactive games.

2.2 Artificial Emotions

Fong et al. [24] state that artificial emotions are used in social robots because they facilitate believable human–robot interaction, provide feedback to the user about the robot's internal state, goals, and intentions, and can act as a control mechanism for driving the behavior of the robot under different environmental conditions. As an example of the latter, Arkin et al. [25] develop an emotionally grounded architecture in which the external stimuli and internal drives of the robot interact to choose the actions under given conditions. Many other studies aim at facilitating believable human–robot interaction and providing feedback about the robot's internal state by generating emotional facial expressions. The emotions anger, disgust, fear, happiness, sadness, and surprise have come to be known as the six basic emotions after the work by Ekman and Friesen [26]. Several researchers study the expression of these basic emotions and others with a real robot. The Kismet robot is the outcome of one of the remarkable studies on facial expression of emotions [27]; the robot can facially express various emotions by moving its eyes, eyebrows, lips, and other parts of its facial construct. In another remarkable work on facial expressions, Endo et al. [28] develop the robot KOBIAN and implement and evaluate 16 facial expressions for emotional content.

Compared to facial expressions, body postures are less studied in humanoid robot applications. The WE-4RII robot presented in [29] is capable of facial and bodily expression of emotions. Zecca et al. [4, 6] implement emotional body postures to accompany the facial expressions of the robot KOBIAN. Nomura and Nakao [3] implement emotional body movements for anger, sadness, and pleasure on the small-sized humanoid robot Robovie-X. In the present study the aim is to implement 32 body postures on the Nao robot for each of the three emotions anger, happiness, and sadness, and to select the best five for each emotion.

2.3 Postures with Human Body Models

There have been a variety of studies on the emotional content of postures with human body models. In most of these studies the authors construct human postures in a virtual environment using simplified models of the human body [7, 30–34]. Various postures are simulated and shown to observers on a computer screen, and the feedback of the observers is used to verify the emotional content of the body postures. Clavel et al. [33] perform a comparative and relational study of facial and postural expressions of five emotions (anger, joy, sadness, surprise, fear) using a virtual character on a computer screen; in their study the three emotions anger, sadness, and joy (happiness) are the most successfully recognized. Among the studies making use of virtual environments, the work of Coulson [7] is remarkable because it provides not only the visual content but also the anatomical features of the postures, in the form of quantitative joint angle values of the human body model. In the present study Coulson's work is used as the basis for generating emotional postures for a humanoid robot. Coulson's data are also used in [34] to replicate his experiments on a different virtual human body model.

2.4 Recognition of Emotions

In social interactions emotions are conveyed through facial, vocal, and postural expressions. Crane and Gross [35] perform walking experiments and examine the gait parameters of walks performed in different emotional moods. They show that emotions can be recognized in body movements and that body movements are affected by the emotions of the person. The comparative study of facial and postural expressions of emotions by Clavel et al. [33] demonstrates that emotions are better recognized when conveyed by facial expressions than by postural expressions. Zecca et al. [6] implement facial and bodily postures with a humanoid robot. The authors state that body postures alone are most of the time not sufficient for correct recognition; however, they show that when body postures are combined with facial expressions, they improve the recognition rate by 25.4 % [6] and 33.5 % [4] in comparison to facial expressions alone.

Li and Chignell [5] examine the generation and perception of emotional gestures with a teddy bear robot capable of only simple head and arm movements. They let participants create postures with the teddy bear for given emotions and ask other participants to judge the created postures. In the first part of their study, the rate of correct recognition of the ten emotions was, at 22 %, only somewhat better than chance. The second part of their study shows strong correlations between anger and disgust, fear and disgust, fear and sadness, and happiness and surprise; the high correlations indicate that the participants confused these emotions with each other. The authors note that the meaning of a gesture depends on its social and environmental context and that recognition of emotion improves when situational context is provided: knowledge of context improves the successful recognition of the emotional postures of the teddy bear robot from an average of 15.2 % to an average of 26.7 %. It is also pointed out that emotional expression, whether facial, vocal, or postural, depends on the culture [5].

2.5 Coulson’s Study [7]

In Coulson's paper the six emotions of Ekman and Friesen [26] are studied. He generates human body postures for these six basic emotions: 32 postures for each of anger, disgust, happiness, and sadness, and 24 postures for each of fear and surprise. Coulson transforms qualitative descriptions into quantitative joint angle data for a mannequin model of the human body (Fig. 2). This model consists of thirteen segments: the upper body consists of a head/neck, chest, abdomen, two shoulders, and two forearms; the lower body consists of two thighs, two shins, and two feet. The study assumes symmetry of the arms in all postures and relates the lower body joints (thighs, shins, and feet) to a single variable, the movement of the mass center, taking one of the values forwards, backwards, or neutral. In this way each posture is characterized by seven parameters: one specifying the movement of the mass center and six specifying the joint angles for head bend, chest bend, abdomen twist, shoulder adduct, shoulder swing, and elbow bend. (A compact encoding of this parameterization is sketched below.)
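
For concreteness, this seven-parameter description can be captured in a small data structure. The following Python sketch is illustrative only: the field names are mine, not Coulson's, and the example values are arbitrary.

from dataclasses import dataclass

# Illustrative encoding of Coulson's seven posture parameters.
# Field names are hypothetical; all angles are in degrees.
@dataclass
class CoulsonPosture:
    weight_transfer: str    # "forwards", "backwards", or "neutral"
    head_bend: float        # positive = bend forwards
    chest_bend: float
    abdomen_twist: float
    shoulder_adduct: float
    shoulder_swing: float
    elbow_bend: float       # both arms, by the symmetry assumption

# An arbitrary example instance (not a posture from Coulson's tables):
posture = CoulsonPosture("neutral", 25.0, 20.0, 0.0, -60.0, 45.0, 50.0)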

Fig. 2 Most successful postures for anger, happiness, and sadness by Coulson, viewed from three different sides on the mannequin figure. (Adapted from [7])

In Coulson's study the emotional content of the postures is evaluated by external observers. Among the six basic emotions, sadness, happiness, and anger are recognized with the highest consensus levels. The most successful postures for these three emotions are given in Fig. 2; all three are correctly recognized as the intended emotions with a consensus level of more than 90 %. The emotion disgust is not attributed to any posture with more than a 50 % consensus level. Fear and surprise are somewhere in the middle: two postures reach a maximum consensus level of 60 % for fear, and one posture reaches a maximum consensus level of 70 % for surprise. In short, the postures for sadness, happiness, and anger are the most successful in conveying the intended emotion. The better recognition of these three emotions in Coulson's work is in agreement with the results of other research [33]. Therefore, in this study, as in [3], I use the postures for these three emotions and test whether they convey the same emotional content when adapted to the Nao robot.

3 Adaptation of Emotional Postures from Human Model to the Nao Robot

The research question addressed in this paper is whether the emotional postures associated with the human body convey the same kind of emotions when applied to the small humanoid Nao robot. To this end the emotional human postures developed and tested by Coulson [7] in a computer environment are adapted to the Nao. The need for adaptation results from the differences between the robot and the human body.

In the following, the joints of Coulson's model are related to the joints of the Nao. Figure 3(a) shows the joints of the Nao robot, Figure 3(b) gives the range of rotation of each joint, and Figure 4 shows the reference frames attached to the links. Coulson's data are transformed to fit the joint configuration of the Nao by the following five operations (a code sketch follows the list):

(1) The chest bend values of 20 and 40 degrees in Coulson's model are reduced to 10 and 20 degrees, respectively. This is because the robot could not stand stably in many postures when the chest bend was more than 20 degrees; the head was so heavy that the robot fell on its face.

(2) The abdomen twist in Coulson's model is implemented as head yaw, to the right and left, because there is no abdomen joint in the Nao robot. The effect of twisting to the right or left is generated by turning the head.

(3) The maximum positive value of head bend (to the front) in Coulson's model is reduced from 50 to 30 degrees, because 30 degrees is the limit of the Nao's range.

(4) The maximum positive value of elbow bend in Coulson's model is reduced from 110 to 90 degrees, because 90 degrees is the limit of the Nao's range. In some cases it is further reduced to 85 degrees, because otherwise the hand touches the head.

(5) The forward and backward leaning of the body is generated by visual inspection and by adopting proper values for the leg joints, because there is no data for the leg joints in Coulson's work.
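
To make the five operations concrete, the sketch below applies them to one posture in the representation introduced in Sect. 2.5. It is a minimal illustration rather than the exact transformation of Table 2: the clamping limits follow the list above, while the signs and the 90-degree shoulder offset are placeholder assumptions standing in for the frame-dependent equations discussed under "Reference Framing" below.

# Minimal sketch of the five adaptation operations. Input angles follow
# Coulson's convention (degrees); output keys are Nao joint names.
# Signs and offsets marked ASSUMED stand in for the exact frame
# conversions of Table 2.

def adapt_to_nao(p):
    nao = {}

    # (1) Reduce chest bend: 20 -> 10 and 40 -> 20 degrees, so that the
    # heavy head does not tip the robot forward.
    chest = p["chest_bend"] / 2.0 if p["chest_bend"] in (20, 40) else p["chest_bend"]
    nao["LHipPitch"] = nao["RHipPitch"] = -chest   # realized via the hips, ASSUMED

    # (2) No abdomen joint on Nao: render the abdomen twist as head yaw.
    nao["HeadYaw"] = p["abdomen_twist"]            # sign ASSUMED

    # (3) Clamp forward head bend to Nao's 30-degree limit.
    nao["HeadPitch"] = min(p["head_bend"], 30.0)   # sign ASSUMED

    # (4) Clamp elbow bend to 90 degrees (85 where the hand would hit the head).
    elbow = min(p["elbow_bend"], 90.0)
    nao["LElbowRoll"], nao["RElbowRoll"] = -elbow, elbow  # signs ASSUMED

    # Arms are symmetric in Coulson's model; the 90-degree offset is a
    # placeholder for the frame change between the two conventions.
    nao["LShoulderPitch"] = nao["RShoulderPitch"] = 90.0 - p["shoulder_swing"]    # ASSUMED
    nao["LShoulderRoll"], nao["RShoulderRoll"] = p["shoulder_adduct"], -p["shoulder_adduct"]  # ASSUMED

    # (5) No leg data in Coulson's model: forward/backward leaning is
    # produced by the increments of Table 3, chosen by visual inspection.
    nao["lean"] = p["weight_transfer"]             # resolved via Table 3

    return nao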

Fig. 3 (a) The Nao robot and the joint angles that constitute its 25 degrees of freedom (RHipYawPitch and LHipYawPitch are coupled). (b) The kinematic ranges of the joint angles. (Adapted from the documentation of the Nao provided by the manufacturing company)

Fig. 4 The reference frames attached to the links of the Nao robot. The main reference is the frame attached to the waist (torso). The frames designated by {0}, {1}, {2}, {3}, and {4} are parallel to this main reference and attached to the torso at the junctions between the torso and the extensions (head, left arm, left leg, right arm, and right leg, respectively). The frame designated by {i}{j} corresponds to the jth junction of the ith extension. (Adapted from the documentation of the Nao provided by the manufacturing company)

In total, 32 postures are adapted to the Nao robot for each of the three emotions anger, happiness, and sadness. To provide an example of the transformations, I give Coulson's data and the generated data for the Nao robot for the postures of happiness. The data for the 32 postures of happiness by Coulson are given in Table 1. The equations used for the transformation to the Nao robot and the resulting joint angle values are given in Table 2. The modifications of the ChestBend and HeadBend prior to the application of the transformation equations are shown in the upper part of Table 1; among these, only the modification of the ChestBend is active for the postures of happiness. The forward and backward leaning postures are implemented by adding incremental angles to the lower body joints of the Nao. These incremental values are given in Table 3.

Table 1 Data for the postures of happiness by Coulson [7]. All the values are in degrees
Table 2 Nao joint angle values for postures of happiness after transformation. All values are in degrees
Table 3 Incremental values for the lower body joints of Nao for forward and backward leaning postures. All values are in degrees

The transformation equations and the resulting values in Table 2 are specific to the kinematic configuration of the Nao robot. The Nao is currently widely used in social robotics; therefore, I believe that this transformation and the resulting values are of interest to many researchers. In addition, the transformation process from a model of the human body to an actual robotic system provides many clues for other robot platforms. In the following I give my reflections on such transformations.

(1) Kinematic Constraints

A humanoid robot usually has fewer degrees of freedom and smaller joint angle ranges than a human body. Therefore, the description of an emotional posture constructed for the human body should be based on the main joints that give the basic shape to the posture. As a good practice, Coulson's work simplifies the human body model and reduces the number of parameters. My implementation necessitated further simplification, such as using the HeadYaw joint of the Nao for the abdomen twist.

(2) Inertial Constraints, Mass Distribution

The mass distribution of the Nao robot is not identical to that of a human body; in particular, the head of the Nao is quite heavy. The ratio of the head mass of an average male human (81.5 kg) to his body mass is around 0.065 [36]; the same ratio for the Nao robot is 0.111. Because of its relatively heavy head, the Nao cannot stand stably when its body leans far forward or backward. In this study the leaning of the body is reduced by limiting the chest bend, as observed in Table 1. There are also inertial effects when the robot moves from one posture to another. For example, when the robot passes from a forward-leaning posture to a backward-leaning posture, the large mass of the head creates a large inertial force. This force sometimes causes the robot to fall, even though both of the discrete postures are stable, because the static stability margin of the robot is much smaller than that of a human, and even a slight inertial effect can disturb stable standing, however slow the movement. In brief, the mass distribution of the robot might constrain the transformation of postures from the human body to a humanoid robot.
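
A quick check with the numbers above quantifies the problem:

\[
m_{\text{head}}^{\text{human}} \approx 0.065 \times 81.5~\text{kg} \approx 5.3~\text{kg},
\qquad
\frac{(m_{\text{head}}/m_{\text{body}})_{\text{Nao}}}{(m_{\text{head}}/m_{\text{body}})_{\text{human}}}
= \frac{0.111}{0.065} \approx 1.7 .
\]

That is, the Nao carries roughly 1.7 times the human head-to-body mass proportion, which is why chest bend values that are unproblematic for the mannequin model can topple the robot.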

(3) Reference Framing

The reference framing used by Coulson is different from that of the Choregraphe programming environment customized for the Nao robot. Therefore the joint angles given by Coulson are adapted by taking their negative values or by adding or subtracting 90 degrees, where applicable. These adaptations for the happiness postures are visible in the transformation equations in Table 2. For such transformations the qualitative descriptions, such as "shoulder swing" and "abdomen twist", are helpful in order to understand the sense of the quantitative values.

(4) Size and Position of the Robot

The size and placement of the robot with respect to the observer are important for the emotional postures. An average human adult is about 1.6–1.8 m tall; the Nao robot is only 58 cm. The emotional postures by Coulson are generated for the size of an average adult human. In some of the postures of anger, the body leans forward, the head is bent down, and the arms are lifted: the pose of an angry human looking ahead. However, when the same configuration is implemented on the Nao robot standing on the ground, the robot does not seem to be looking ahead but rather looking down, as if searching for something on the ground. This is because the face of the robot is well below the level of the observer's eyes; with such a posture the face of the robot is not as visible to the observer as a human face would be, and as a result the intended emotion is not conveyed. In this study, it is therefore assumed that the head of the robot is at the same level as the observer: the participants perform their evaluations by watching videos or looking at pictures in which the robot directly faces them.

4 Selection and Evaluation of the Emotional Postures

This section presents the selection and evaluation procedure for the implemented emotional postures. It should be expected that only some of the implemented postures succeed in conveying the intended emotions with a significant recognition rate. The aim of this section is first to determine the five most successful postures for each of the three emotions and then to evaluate how successfully the selected postures convey the intended emotions. The overall process follows three steps: the first two concern a preliminary evaluation of the overall postures and the selection of the best five for each emotion, and the third concerns the evaluation of the selected best five postures for each of the three emotions.

(Step 1) Preliminary Evaluation

The aim of the first step is to obtain an initial evaluation of the overall postures for each emotion. This step also serves to familiarize the participants with the videos used in the selection procedure of Step 2. One video is prepared for each of the three emotions anger, happiness, and sadness. In each video Nao passes through the 32 postures, and each video lasts approximately 2 minutes and 22 seconds. The three videos are available on request from the author.

It should be noted that the videos used in Step 1 and Step 2 do not show any action in any context; they show just sequential postures of the Nao robot. Each posture is shown for 3 seconds and the transition from one posture to the next lasts 1 second. The robot moves only during these transition phases; otherwise it is stationary for the 3 seconds of each posture. The videos can therefore be regarded as sequences of pictures of the robot shown on a computer monitor. (A sketch of how such a sequence can be driven on the robot is given below.)
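
As an illustration, the following minimal sketch plays a list of postures with the timing described above (1 s transition, 3 s hold), using the ALMotion proxy of the NAOqi Python SDK. The robot address and the posture dictionaries are placeholders; this is not the exact code used to produce the videos.

import math
import time
from naoqi import ALProxy  # NAOqi Python SDK

ROBOT_IP, ROBOT_PORT = "192.168.1.10", 9559   # placeholder address

def play_sequence(postures, transition=1.0, hold=3.0):
    # `postures` is a list of {joint_name: angle_in_degrees} dictionaries.
    motion = ALProxy("ALMotion", ROBOT_IP, ROBOT_PORT)
    motion.setStiffnesses("Body", 1.0)
    for posture in postures:
        names = list(posture.keys())
        angles = [math.radians(a) for a in posture.values()]  # NAOqi expects radians
        times = [transition] * len(names)   # reach the target in 1 s
        motion.angleInterpolation(names, angles, times, True)  # blocking move
        time.sleep(hold)                    # stand still for 3 s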

In the first step of the experiments there were 25 participants (5 women, 20 men). They were recruited among the PhD and Master students at the Institut des Systèmes Intelligents et de Robotique of Université Pierre et Marie Curie (ISIR-UPMC) and at École Nationale Supérieure de Techniques Avancées (ENSTA). Most of them studied mechanical or computer science engineering and had some familiarity with robotic systems, ranging from medical robotics to mobile robots and robot vision. The participants watched the three videos showing the 32 postures implemented for each emotion, either on their own computer stations or on my desktop in the laboratory. They were left free to stop or go back and forth in the videos, making sure that they saw all the postures. After watching each video they chose one of the six basic emotions as the most representative of that video. The overall process lasted around 10 minutes per participant.

The results of the preliminary evaluation are given in Table 4. The postures of anger were associated with sadness in 36 % of the votes, with anger in 32 %, with surprise in 16 %, with fear in 8 %, with disgust in 4 %, and with happiness in 4 %. The postures of happiness were associated with happiness in 76 %, with surprise in 16 %, with anger in 4 %, and with disgust in 4 %. The postures of sadness were associated with fear in 36 %, with sadness in 20 %, with surprise in 20 %, with anger in 16 %, and with disgust in 8 %. The maximum rates are in bold in Table 4. The average of the correct recognition rates for the three sets of postures is 42.7 %.

Table 4 Results of the Preliminary Evaluation (Step 1). The numbers indicate the percentages of votes of the 25 participants for each of the three videos of implemented emotions. The maximum votes are in bold. The significance parameters Chi-square and p are shown for the overall vote distribution for each video. The critical value of chi-square for five degrees of freedom df=5 with the significance level p=0.01 is 15.086

On the right-hand side of the table the results of the significance analysis with the chi-square goodness-of-fit test are given. In this analysis the distribution of the votes for each set of emotional postures is compared with the case of an equal distribution among the emotions. In Table 5, I give the observed number of votes, the hypothetical case of an equal distribution of votes, and the residuals (the difference between the two for each category) for the case of anger; similar tables are used for the other emotions. In Table 5 the rows for "observed votes" and "equal distribution" constitute the cross table for the chi-square test. The critical value of chi-square for five degrees of freedom (df=5) at the significance level p=0.01 is 15.086. When the chi-square value is less than this cutoff, the probability of observing the given votes under an equal distribution is larger than the significance level of 1 %; hence the distribution is not significant at the 0.01 level. Otherwise that probability is less than 1 % and the distribution is significant at the 0.01 level; in other words, at least one of the emotions is significantly attributed to the video. The most significantly attributed emotion can be found by looking at the residuals, plotted in Fig. 5 for the three videos. Among the three videos only the one with the happiness postures has a significant distribution of votes (p<0.0001), and happiness is the emotion most attributed to that video. The videos for anger (p=0.01) and sadness (p=0.0468) fail to pass the indicated significance level; moreover, these videos are mostly attributed to the wrong emotions.
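
The test is straightforward to reproduce. The sketch below runs it for the anger video, using the vote counts implied by Table 4 (36 %, 32 %, 16 %, 8 %, 4 %, and 4 % of the 25 participants) against a uniform distribution over the six emotions.

from scipy.stats import chisquare

# Votes of the 25 participants for the anger video (Table 4):
# sadness 9, anger 8, surprise 4, fear 2, disgust 1, happiness 1.
observed = [9, 8, 4, 2, 1, 1]
expected = [25 / 6.0] * 6        # hypothetical equal distribution

chi2, p = chisquare(observed, expected)
residuals = [o - e for o, e in zip(observed, expected)]  # as in Table 5 and Fig. 5

print(chi2, p)  # chi2 is about 15.08, just below the df=5, p=0.01 cutoff of 15.086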

Fig. 5 Residuals of the distribution of the votes of the 25 participants with respect to the hypothetical case of a homogeneous distribution across all emotions

Table 5 Sample table (for anger videos) used for the Chi-square test and residual analysis

The results in Table 4 indicate that when the postures for each emotion are considered together as a group, they are not successful in conveying the intended emotion. Only for the set of happiness postures is the intended emotion the one most often perceived by the participants.

It should be noted that not all of the implemented postures are expected to convey the intended emotion. Coulson [7] states that although all the joint rotations describing the postures are anatomically realistic, some postures look rather unusual. We can expect the success rate of the postures to decrease further with the transformation to the Nao robot. There is therefore an obvious need to select the best postures from the overall set.

(Step 2) Selection of Postures in Video

In the second step of the experiments the aim is to choose the best five postures among the 32 for each of the three emotions. The same participants as in the first step voted for the postures they saw in the videos. The participants were informed about the intended emotion of each of the three videos before they started the second step. They were asked to stop the video when they felt that the posture shown at that moment was "strongly conveying" the intended emotion. The postures at which the participants stopped the video were identified by the video time indicated by the video player; the participants were asked to record this time on paper for each posture at which they stopped. They were left free to go back and forth in the video, ensuring that they considered all the postures for voting. The overall experiment of this step with the three videos lasted approximately 20–30 minutes per participant. The 25 participants cast in total 135 votes for the postures of anger, 178 for happiness, and 100 for sadness. This means that on average a participant voted for 5.5 of the 32 postures for each emotion. In Fig. 6, I give the distribution of the votes of the participants among the 32 postures of each emotion.

Fig. 6 The distribution of the votes of the 25 participants for the best postures expressing the intended emotion, among the 32 implemented postures for each of anger, happiness, and sadness

The distribution of the votes for each emotion in Fig. 6 reveals large differences in the recognition rates of postures within the same set: while some postures were highly voted, others did not receive even a single vote. The distributions show that the selection procedure was effective in distinguishing between the successful and unsuccessful postures. In the following I detail this observation by indicating (1) the number of postures that received at least one vote; (2) the number of votes for the most voted posture; (3) the number of postures that received more than half of the votes of the most voted; and (4) the postures that were selected as the best five.

For the postures of anger, all postures except two received votes from the participants. The maximum number of votes was eight, and 17 postures received more than four votes (half of the maximum). Three postures (anger_04, anger_12, and anger_32) received the maximum of eight votes each; these are the first three selected postures. After these, three postures (anger_24, anger_28, and anger_30) received seven votes each. Among these three, the posture anger_24 was eliminated because it is very close to anger_32: the only difference is that the head is bent down in the latter. In the end, the postures anger_04, anger_12, anger_32, anger_28, and anger_30 were selected as the best five representing the emotion anger. They are shown in Fig. 7. The indexing of the postures follows the sequence shown in the videos.

Fig. 7 Five selected postures for anger

Among the postures of happiness, all except two received votes. The maximum number of votes was sixteen, and 11 postures received more than eight votes. The most voted five postures were happiness_13, happiness_1, happiness_5, happiness_29, and happiness_21 with 16, 15, 13, 13, and 12 votes, respectively. These were selected as the best five postures representing happiness. They are shown in Fig. 8.

Fig. 8 Five selected postures for happiness

For the postures of sadness, only 19 of the 32 postures received votes. The postures in the second half of the video, in which the robot leaned backwards (weight transfer: backwards), received especially few votes; the participants stated that these postures looked more like disgust than sadness. The maximum number of votes was 14, and only six postures received more than seven votes. The most voted five were sadness_04, sadness_16, sadness_03, sadness_07, and sadness_08 with 14, 11, 9, 9, and 9 votes, respectively. These were selected as the best postures representing sadness. They are shown in Fig. 9.

Fig. 9 Five selected postures for sadness

The most successful posture of anger by Coulson in Fig. 2 corresponds to the posture anger_18. This posture received six votes, just one fewer than the least voted posture among the best five. One can observe that Coulson's best posture of anger is very close to the posture anger_30 in Fig. 7. The most successful posture for happiness by Coulson corresponds to the posture happiness_15. This posture received only five votes from the participants; however, it is very close to the most voted posture, happiness_13, in Fig. 8. The only difference is that the shoulder swing angle in the latter is −90 degrees instead of −45 (the arms are raised higher). In the case of sadness, the most successful posture by Coulson corresponds to the posture sadness_26. This posture did not receive any votes from the participants, because the backwards leaning of the body in this posture was mostly perceived as an indication of disgust. In brief, none of Coulson's most successful postures is among the best five postures implemented on the Nao; his most successful postures for anger and happiness are very close to some of the postures among the best five; and his most successful posture of sadness is unsuccessful in conveying sadness when implemented on the Nao.

Coulson performed a statistical analysis of how the anatomical variables determined the attribution of emotion. Based on this analysis he provides descriptions of successful postures for the emotions. Anger: arms raised forwards and upwards; happiness: head bent backwards, arms raised above shoulder level and straight at the elbow; sadness: head bent forwards, chest bent forwards, and arms at the side of the trunk. These characteristics are observed also in the best five postures for the three emotions presented in Figs. 7–9.

Using descriptions similar to Coulson's, we can derive some clues for implementing the selected emotional postures on other humanoid robots. These descriptions are based on visual inspection of the sets of postures in Figs. 7–9. The postures in Fig. 7 show that the emotion anger is conveyed mostly by raising the arms in front of the body, with the elbows sharply bent and the body slightly leaning forwards; the head may be looking either straight at the observer or at the ground. The happiness postures in Fig. 8 are characterized by the arms raised above the shoulders with straight elbows; the head looks either straight or upwards, and the body is either neutral or leaning forwards. The postures of sadness in Fig. 9 are characterized by the face looking down at the ground and the arms hanging at the two sides; the body is in a neutral position without any forward leaning. (These rules are summarized in code form below.)
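
For reuse on other platforms, these qualitative rules can be recorded compactly, for instance as in the sketch below. The structure and the indicative angle ranges are my own illustrative choices, not measured data from this study, and would need tuning for each robot.

# Qualitative posture rules distilled from Figs. 7-9. All values are
# indicative placeholders, not data from the experiments.
POSTURE_RULES = {
    "anger": {
        "arms": "raised in front of the body",
        "elbow_bend_deg": (80, 90),          # sharply bent (illustrative)
        "lean": ("slightly forwards",),
        "head": ("straight", "down"),
    },
    "happiness": {
        "arms": "raised above shoulder level",
        "elbow_bend_deg": (0, 10),           # nearly straight
        "lean": ("neutral", "forwards"),
        "head": ("straight", "up"),
    },
    "sadness": {
        "arms": "hanging at the sides",
        "elbow_bend_deg": (0, 10),
        "lean": ("neutral",),
        "head": ("down",),
    },
}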

In the following, the selected postures are evaluated for whether they successfully convey the intended emotion.

(Step 3) Evaluation of the Selected Postures

The evaluation of the selected postures is based on a brief questionnaire in which 40 participants (14 women, 26 men) took part. The participants were recruited among the Bachelor and Master students taking a French course at UPMC and among the Master students performing their studies at UPMC-ISIR. The majority of these participants did not have experience with robotics; they were students in different fields (law, psychology, medicine, mathematics, engineering, etc.). The participants of Steps 1 and 2 did not take part in Step 3.

With the selection of the previous step, the number of postures to be used in the evaluation was reduced from 96 to 15. It was possible to print these 15 postures on a single sheet of paper; therefore, in Step 3, printed sheets are used instead of video sequences. In this way the participants could view all five postures related to a single emotion side by side and think about a single emotion that describes all five. Moreover, using a single-sheet questionnaire rather than a video sequence made it easier to reach a larger number of participants. It should be noted that the pictures shown in Step 3 were nothing other than printed versions of the postures exactly as they appeared in the videos; they were generated by copying frames from the videos. Therefore, the difference in the mode of stimuli amounts only to seeing the same pictures on a monitor or on paper, which is assumed not to be significant for the selection and evaluation purposes of this study. Using videos for Steps 1 and 2 (viewing 96 postures sequentially) and a single sheet of paper in Step 3 (viewing 15 postures at once) provided obvious practical advantages.

In Step 3, the participants were provided with a single sheet of paper on which the three sets of five postures corresponding to the three emotions were printed in color. Beneath each set of five postures the six basic emotions were written. For each set of pictures, the participants were asked to choose the one of the six emotions that best corresponded to the five pictures. Examining the pictures and deciding on the emotions took around 1–2 minutes per participant. The overall results are given in Table 6. In Table 7, I again give the observed number of votes, the hypothetical case of an equal distribution of votes, and the residuals, as a sample, for the case of anger. Again the rows for "observed votes" and "equal distribution" constitute the cross table for the chi-square test. Similar tables are used for the other emotions.

Table 6 The percentages of the votes of 40 participants considering the six basic emotions for the best five postures corresponding to the emotions anger, happiness, and sadness. The maximum votes are in bold. The significance parameters Chi-square and p are shown for the overall vote distribution for each group of postures. The critical value of chi-square for five degrees of freedom df=5 with the significance level p=0.01 is 15.086
Table 7 Sample table (for the anger posture set) used for the Chi-square test and residual analysis

The significance parameters on the right-hand side of Table 6 indicate that the distributions of the votes for all three sets of pictures were significant (p<0.0001). This means that each set of pictures was significantly associated with at least one emotion. In Fig. 10, I plot the residuals of the votes for each emotion. In each case the intended emotion received the largest share of the votes (the highest residual). Anger was recognized with a success rate of 45 %, happiness with 72.5 %, and sadness with 62.5 %. The average recognition rate for the three sets of postures is 60 %, a considerable improvement over the preliminary evaluation results.

Fig. 10 Residuals of the distribution of the votes of the 40 participants with respect to the hypothetical case of a homogeneous distribution across all emotions

4.1 Comparison of the Results with Those of Other Studies

The difficulty of recognizing emotions from postures arises mostly because such recognition is in fact context dependent. It is mostly the context that narrows the interpretation to a limited number of meanings and excludes many of the emotions. For example, a question such as "do you like the taste or not?" might relate to happiness and disgust while excluding sadness, surprise, fear, and anger. Faced with such a question, one can more easily choose one of the first two emotions for a given posture without even considering the rest. In this study, and in fact in many other similar studies, such context information is excluded from the evaluations (Li and Chignell [5] is one of the exceptions in this regard).

In the evaluation of this study, there was no open or neutral choice for the answers of the participants: they were forced to choose one of the six basic emotions for each video and for each set of pictures. The decision to restrict the answers to the six emotions was made to partially compensate for the absence of a context in the evaluation. As mentioned above, context information restricts the emotions that can plausibly be perceived from a posture to a specific set. In this study, in the absence of a context, the participants were restricted to the set of the six basic emotions without the option of an open answer. The aim was to see what the participants would choose when forced to pick one of the six emotions.

In the study by Zecca et al. [6], the average recognition rate for the emotional robot body postures prepared by students is reported to be 70 %, while that for the postures prepared by cartoonists and photographers is 70.5 %. The average recognition rates for the postures prepared by the three groups are 72.3 % for anger, 47.8 % for happiness, and 77.4 % for sadness. The overall average recognition rate is 65.8 %, slightly higher than the recognition rates in the present study. This can perhaps be attributed to the fact that the KOBIAN robot they used had 65 degrees of freedom, including hand gestures, waist movement, and more degrees of freedom in the arms. The KOBIAN robot therefore allowed for much more expressive body postures.

In the study by Li and Chignell [5] with a teddy bear robot, some context information was used. The success rate of recognition of the emotions was 22 %, 2.2 times the pure chance rate of 10 % for the ten emotions in their case (including the option of neutral). In my study pure chance, in other words an equal distribution of votes among the emotions, would correspond to a rate of about 17 %. The average recognition rate in my case is 60 %, more than 3.5 times the chance rate. The reason why the recognition rate in my study is higher than that of Li and Chignell [5] can again be attributed to the type of robot used. In their work the robot was a simple teddy bear, capable of head and arm movements only. In my case the Nao robot has enough degrees of freedom to allow much richer configurations for the postures.

In the study of Nomura and Nakao [3], movements were implemented on a small humanoid robot for the same three emotions as in the present paper. These movements were tested with university students and elderly people, and the recognition rates were higher for the university students. My participants were all university students; the results in [3] thus indicate that the recognition rates of the present study might decrease if the subjects were chosen from a group other than university students. The participants of [3] could make multiple choices among the six emotions plus a free category of "others", whereas in the present study the participants made a single choice among the six emotions. Therefore the recognition rates in [3] are not directly comparable to those of the present study. For completeness, I note that, in [3], the intended emotions for the movements of anger and sadness were correctly indicated by all students (100 %), and pleasure (happiness) was correctly indicated by 94 % of the students. When we take into account that the participants indicated other emotions besides the correct ones, the overall rate of correct indications is 33 % for anger, 48 % for pleasure, and 43 % for sadness movements.

The residuals in Fig. 10 show that in this study the expression of happiness was confused mostly with surprise. This is in agreement with the results of Coulson [7], Nomura and Nakao [3], and Li and Chignell [5]. In Coulson’s work happiness and surprise were systematically confused with each other.

In the results here, sadness was confused mostly with fear, in agreement with the results of Li and Chignell [5]. In the results of Nomura and Nakao [3] sadness was mostly confused with hate (disgust). In Coulson's study sadness, anger, fear, and disgust were not confused with each other.

In this work anger was confused mostly with fear. In the results of Li and Chignell [5] anger was confused mostly with disgust and then with fear. Similarly, in the results of Nomura and Nakao [3] anger was mostly confused with hate (disgust).

5 Conclusion

In this study I aim at generating emotional postures for anger, sadness, and happiness for the humanoid robot Nao. The approach is based on adapting the postures developed and tested by Coulson [7] on a human body model in a virtual environment. The joint angle values of Coulson's human body model are adapted to the Nao robot considering kinematic, mass-distribution, and reference-frame constraints. The postures are implemented on the robot in such a way that the robot passes sequentially from one posture to another, and a video is recorded for each emotion while the robot passes through the postures.

In the preliminary evaluation the participants watched the overall videos, and they were very much confused about the emotional content of the postures, because many of the postures were not successful in conveying the intended emotion. There was thus a need to select the most successful ones. For this purpose, in a following step, five of the 32 postures for each emotion were selected as the best at conveying the associated emotion. The selection was based on the votes of the participants, who indicated the best postures in the videos.

In the evaluation phase, the participants were asked to associate each set of five postures with one of the six basic emotions (anger, disgust, fear, happiness, sadness, surprise). The distribution of the votes reveals that the participants were much less confused than in the preliminary evaluation. The intended emotion received the largest share of votes in each of the three cases, at a significance level stronger than the p=0.01 threshold. This result reveals that the selection process eliminated much of the negative impact of the unsuccessful postures that had confused the participants in the preliminary evaluation. However, there were still confusions, mostly between the pairs anger/fear, happiness/surprise, and sadness/fear.

The emotional postures in this study were shown to the participants in isolation from any context. It is well known that context information improves the recognition rate. We can therefore expect that the postures developed in this study will be more successful when used in the contextual framework of the envisaged game scenarios.

The participants in the selection and evaluation processes were all university students. The results of [3] indicate differences between university students and elderly people in the recognition of emotions from robot body motions. Based on these results, the emotional content of the five best postures selected in the present study should not be generalized to all age groups. The recognition of emotions from the robot postures by children, for example, might differ from that of university students. The emotional content of the selected postures should therefore be verified for the intended age group before being used. If the success of the selected postures cannot be verified for another age group, it might be necessary to repeat the selection procedure with a representative group from that age range.

The correct recognition rates in this study are comparable to those of similar studies, and slightly higher than some of them. This is partly because the present evaluation offered no open choice for the answers, and partly because the Nao robot has more degrees of freedom than the robots used in some of those studies. On the other hand, the recognition rates in this study are slightly lower than those of a study in which the robot had more degrees of freedom and richer expressive capability. These observations indicate that the expressiveness of the robot and the answer options provided in the evaluation can affect the recognition rates of emotional postures.

This study shows that the Nao robot can convey the intended emotions with a considerable recognition rate across different perceivers. However, the quantitative descriptions of the postures developed for the Nao robot cannot be used directly for another humanoid robot, because every humanoid robot has a different kinematic configuration. On the other hand, the approach and experience presented in this study provide a guide for performing the same kind of adaptation for any other humanoid robot, and the qualitative descriptions of the postures can be used to generate similar-looking postures with other robots.

Most importantly, this study can be considered the last step of a general process for developing emotional postures for robots. This process starts with qualitative descriptions of human behavior (performed in the behavioral science literature), continues with encoding those qualitative descriptions in quantitative terms for a simplified human model (performed by Coulson [7]), and ends with adapting the quantitative values of the human model to a specific robot (performed in this study). The process is based on the idea that emotional behaviors and postures as observed in humans can be applied to robots with some modifications specific to the robot used.