1 Introduction

According to the World Health Organization, the global population of people aged over 60 will exceed 2 billion by 2050 [1]. Social robots may play a significant role in interacting with and providing companionship for elderly and disabled people in the future. In HRI, the more vividly and naturally robots behave, the more comfortable and pleased humans feel. Enabling robots to express their emotions naturally is an effective way to improve their lifelikeness and the quality of interaction. Speech is a direct way of expressing emotions, but humans and robots can also express complex or subtle emotions indirectly through nonverbal means, including facial expressions [2, 3] and body language [4,5,6].

The Facial Action Coding System (FACS) is a widely used standard for categorizing facial emotion expressions. It was first developed by the Swedish anatomist Hjortsjö [7] and was adopted and published by Ekman in 1977 [8]. With the aid of computer vision, FACS has been implemented in systems that track faces, extract facial features and produce temporal coding files automatically, overcoming the original limitations of high time cost and extensive training requirements [9].

Some robots are designed with a human-like appearance to be anthropomorphic and to support facial expressions. In [10], a robot called Albert HUBO was built by combining an Einstein-like head with a modified HUBO body. The head had smooth skin and numerous motors supporting a large range of facial expressions. Hiroshi Ishiguro et al. built the humanoid robot Geminoid F, whose delicate artificial skin and hair are hard to distinguish from a human's [11]. Its rich facial expressions also make communication more engaging. However, according to the uncanny valley theory [12], familiarity drops suddenly once the human likeness of a robot exceeds a certain point, so we cannot simply assume that the quality of HRI improves as a robot becomes more human-like.

Moreover, most humanoid robots at present have fixed faces, so facial expressions are not a suitable channel for them to convey emotions. Body language may therefore be a better approach. In addition, humanoid robots usually have many degrees of freedom, which gives them the advantage of displaying various complicated postures.

Although no research has yet established which posture stands for which specific emotion, body language has been shown to convey emotions. Darwin and Prodger [13] identified a functional link between postural responses and emotions through electromyography (EMG) experiments. The researchers found differences in four muscles when participants felt anger or fear, suggesting that when an emotion is expressed, the corresponding muscles react.

In [14], experiments were designed to compare people's recognition of body language performed by an actor and by artificial agents. A professional actor was asked to perform 10 emotions with body language. The motions were recorded with a motion capture device and then regenerated as animations. The face and hands of both the human and the animation were removed to avoid the uncanny valley effect, and volunteers were asked to choose an emotion from the Geneva Emotion Wheel for each displayed posture. The authors concluded that artificial agents can express emotions through body language and discussed several factors that could affect the expression. They also investigated the effect of head position on emotion expression using the Nao robot.

In this paper, two surveys were conducted to investigate whether Nao can convey emotions with static postures. We established an emotional posture library covering six basic emotions through Internet searching and image processing, and these postures were regenerated by the Nao robot. Volunteers were asked to recognize the emotions expressed by the body language of the human and of the Nao robot in two questionnaires. By analyzing the statistical results, two factors that may affect the robot's emotion expression were identified: the ambiguity of the postures and the joint limits of the robot. They were then verified by hypothesis testing conducted in SPSS, a statistical analysis package.

2 Surveys

First of all, a set of emotional postures was established for the six basic emotions [15]: anger, disgust, fear, happiness, sadness and surprise. We searched for pictures of these six emotions on Google and selected non-repeated postures with evident differences to avoid confusion. 42 postures were finally obtained: 10 for anger, 8 for disgust, 6 for fear, 7 for happiness, 5 for sadness and 6 for surprise.

Since our focus is body language, the face in each picture was removed in the first survey to eliminate the influence of facial expressions. The processed pictures were then placed in random order in the questionnaire, and each posture had 8 options for participants to choose from: the six emotions above, “neutral”, and “other”. Participants were asked to choose as soon as possible after viewing each posture, so that the choices reflected first impressions. 29 participants completed the questionnaire.

In the second survey, Nao's joints were moved to reproduce a posture similar to each one in the pictures described above. The positions and directions of the robot's limbs were set as close as possible to those in the pictures, yielding 42 robot postures. Each posture was recorded in Choregraphe, a simulation platform for Nao, then performed by the robot and photographed. The photos were placed in random order in the questionnaire, each followed by the same 8 options as in the first survey. Because Nao's facial expression remains unchanged, it does not influence the conclusion. An example of the initial picture, the processed picture and the corresponding photo of Nao is shown in Fig. 1.
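For illustration, the following minimal sketch shows how such a static posture could be commanded programmatically through the NAOqi Python SDK rather than through Choregraphe; the robot address and the joint angle values are placeholders, not the settings used for the postures in this paper.

# Sketch: reproducing a static human posture on Nao via the NAOqi Python SDK.
# The address and angle values below are illustrative placeholders.
from naoqi import ALProxy

NAO_IP, NAO_PORT = "192.168.1.10", 9559            # assumed robot address
motion = ALProxy("ALMotion", NAO_IP, NAO_PORT)
posture = ALProxy("ALRobotPosture", NAO_IP, NAO_PORT)

motion.wakeUp()
posture.goToPosture("Stand", 0.5)                  # start from a neutral standing pose

# Approximate the limb directions seen in the reference picture (angles in radians).
names  = ["LShoulderPitch", "LShoulderRoll", "LElbowYaw", "LElbowRoll",
          "RShoulderPitch", "RShoulderRoll", "RElbowYaw", "RElbowRoll", "HeadPitch"]
angles = [-1.0, 0.3, -1.2, -0.5,
          -1.0, -0.3, 1.2, 0.5, -0.2]
motion.setAngles(names, angles, 0.2)               # move at 20% of max speed and hold the pose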

Fig. 1. An example of the initial picture, the processed picture and the corresponding photo of Nao (Posture 36 in survey 1).

The statistical results of the first survey are shown in Table 1(a), in which “s1” denotes the sequence number of the human's posture in the first survey, “a” denotes anger, “d” disgust, “f” fear, “h” happiness, “sa” sadness, “su” surprise, “n” neutral, and “o” other. The remaining columns give the proportion of responses for each option for each posture. Because a small number of participants did not make a choice for some postures, some rows do not sum to 100%. Similarly, the statistical results of the second survey are shown in Table 1(b), in which “s2” denotes the sequence number of the robot's posture in the second survey.
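For reference, ratios of this kind could be computed from the raw answers along the following lines; this is only a sketch, and the file name and column names ("survey1.csv", "posture", "choice") are hypothetical.

# Sketch: computing Table 1-style option ratios from raw questionnaire responses.
import pandas as pd

responses = pd.read_csv("survey1.csv")      # assumed: one row per participant per posture
n_participants = 29

# Proportion of participants selecting each option for each posture;
# rows may sum to less than 100% where a participant skipped a posture.
ratios = (responses.groupby("posture")["choice"]
          .value_counts()
          .unstack(fill_value=0) / n_participants)
print(ratios.round(2))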

Table 1. The results of the surveys (out of sequence).

3 Analysis and Hypothesis

In Table 1(a) and (b), the option with the highest ratio for each posture is taken as the best choice to represent the emotion expressed by that posture. The robot's happiness postures are depicted as an example in Fig. 2. For comparison, Table 2 lists the best emotion choices for the human's postures and the robot's postures together with the Google search keywords of the human's postures. Through this analysis and comparison, we form two hypotheses about factors that influence a humanoid robot's emotion expression through body language.
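Continuing the sketch above, the best choice per posture is simply the option with the highest proportion:

best_choice = ratios.idxmax(axis=1)   # modal emotion label for each posture
best_ratio = ratios.max(axis=1)       # proportion of participants who selected it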

Fig. 2. The chosen robot's postures of happiness.

Table 2. Comparison of emotions expressed by the initial picture, the corresponding body language of the human, and the body language of Nao.

3.1 Hypothesis of Influencing Factor 1: Ambiguity

If the best emotion choice of a human’s posture and that of the corresponding Nao’s posture are consistent, there are two cases to consider.

The first case is that the search keyword is consistent with the best emotion choice for the human's posture. Using the sequence numbers from the first survey as reference: for the first posture, for example, the search keyword, the best choice for the human's posture and the best choice for the corresponding Nao posture are all surprise. There are 22 other postures in the same case, with sequence numbers 2, 3, 10, 11, 12, 14, 17, 19, 20, 22, 23, 26, 27, 30, 32, 33, 34, 37, 39, 40, 41, 42.

In the second case, the keyword differs from the best choice for the human's posture. For example, the 36th human posture is recognized as fear whereas its search keyword is disgust. This suggests that the human and Nao can convey the same emotion with similar body language, but that the body language itself cannot express a particular emotion on its own; evidently, this is due to the missing facial expressions. Without facial expressions, the body language in this case cannot express the emotion properly and adequately, which means the body language is ambiguous. Therefore, ambiguity is hypothesized as an influencing factor.

3.2 Hypothesis of Influencing Factor 2: Joint Limits

If the best emotion choice of a human's posture is consistent with the search keyword, there are likewise two cases.

The first case is that the emotions are consistent across the three corresponding pictures, which has been discussed above. The second is that the best emotion choices differ between the human's and Nao's postures. In this case, the cause of the difference may lie in the robot itself, chiefly the limits of Nao's joints. For example, the keyword and the best choice for the human's 5th posture are both happiness, but the best choice for Nao's posture is surprise. As can be seen from Fig. 3, Nao cannot perform a posture similar to the human's because of its joint limits. There are 9 other postures in the same situation, with sequence numbers 6, 13, 16, 18, 21, 24, 29, 31, 35. Therefore, joint limits are hypothesized as another influencing factor.

Fig. 3. The posture of the robot (Posture 5) is restricted by joint limits.
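As an illustration of how joint limits reshape a retargeted posture, the sketch below clips a target angle taken from the human pose to Nao's reachable range via the NAOqi motion API; the robot address and the target value are assumptions made for the example.

# Sketch: why a retargeted human posture can change on Nao -- target joint angles
# are clipped to the robot's mechanical range. The target value is illustrative.
from naoqi import ALProxy

motion = ALProxy("ALMotion", "192.168.1.10", 9559)     # assumed robot address

joint = "LShoulderRoll"
target = 1.8                                           # radians, taken from the human pose (hypothetical)

min_angle, max_angle = motion.getLimits(joint)[0][:2]  # [minAngle, maxAngle, maxVel, maxTorque]
clipped = max(min_angle, min(target, max_angle))
if clipped != target:
    print("%s limited: %.2f rad requested, %.2f rad reachable" % (joint, target, clipped))
motion.setAngles([joint], [clipped], 0.2)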

4 Hypothesis Testing

The previous section suggested that the emotion expression of Nao's postures may be affected by the inherent ambiguity of the postures and by the robot's joint limits. In this section, these assumptions are verified through two experiments using hypothesis testing.

4.1 Preparation for Verification

In this paper, a posture is considered ambiguous when the number of times its best choice was selected does not exceed twice the number of times its second-best choice was selected, following the general pattern observed in survey 1. For example, 12 participants thought that the 7th posture expressed disgust while 7 chose fear. The selection ratios of these two emotions are relatively close, which means the posture itself cannot convey a particular emotion accurately. The 42 Nao postures can thus be divided into ambiguous and non-ambiguous types according to this definition, and they can likewise be divided into two types according to joint limits.
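This criterion can be written compactly; the sketch below applies it to the vote counts of posture 7 mentioned above (options other than disgust and fear are omitted).

# Sketch: a posture is ambiguous when its best choice was not selected more than
# twice as often as its second-best choice.
def is_ambiguous(counts):
    """counts: dict mapping option label -> number of participants who chose it."""
    first, second = sorted(counts.values(), reverse=True)[:2]
    return first <= 2 * second

print(is_ambiguous({"disgust": 12, "fear": 7}))   # True: 12 <= 2 * 7, so posture 7 is ambiguous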

Therefore, we obtain four groups of postures: 9 postures with ambiguity and without joint limits (Group 1), 12 postures without ambiguity and with joint limits (Group 2), 12 postures with neither (Group 3) and 9 postures with both (Group 4).
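Assuming two boolean lookups, one for ambiguity and one for joint limits (both hypothetical names derived from the survey data and the retargeting step), the grouping amounts to the following sketch.

# Sketch: partitioning the 42 Nao postures into the four groups used below.
groups = {1: [], 2: [], 3: [], 4: []}
for p in range(1, 43):
    a, j = ambiguous[p], joint_limited[p]     # assumed dicts: posture id -> bool
    if a and not j:
        groups[1].append(p)                   # ambiguous, no joint limits (9 postures)
    elif j and not a:
        groups[2].append(p)                   # joint limits, not ambiguous (12 postures)
    elif not a and not j:
        groups[3].append(p)                   # neither (12 postures)
    else:
        groups[4].append(p)                   # both (9 postures)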

4.2 Experiment 1: Verification of Ambiguity

Group 3 and Group 1 were used to verify the effect of posture ambiguity on the robot's emotion expression. Defining the emotion chosen by most participants as the correct emotion, we obtained the recognition accuracy for each Nao posture, which served as the sample for the t-test. The result of the normality test is shown in Table 3; the first row is for Group 3 and the second row for Group 1. The statistics of both groups are normally distributed since Sig. > 0.05, so a t-test could be conducted; its result is shown in Table 4. In Levene's test, Sig. is larger than 0.05, suggesting that the two samples have similar variances. The Sig. in the first line is less than 0.05, so we conclude that the accuracies of the two groups are significantly different. According to the box plot in Fig. 4, the non-ambiguous group generally has higher emotion-expression accuracies than the ambiguous group.
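The same chain of tests can be reproduced outside SPSS; the sketch below uses scipy, with acc3 and acc1 assumed to be the per-posture accuracy lists of Group 3 and Group 1.

# Sketch: normality, equality of variances and independent t-test with scipy.
from scipy import stats

_, p_norm3 = stats.shapiro(acc3)          # normality, Group 3 (p > 0.05 -> normal)
_, p_norm1 = stats.shapiro(acc1)          # normality, Group 1
_, p_levene = stats.levene(acc3, acc1)    # equality of variances
_, p_ttest = stats.ttest_ind(acc3, acc1, equal_var=(p_levene > 0.05))
print("t-test p = %.3f" % p_ttest)        # p < 0.05 -> the accuracies differ significantly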

Table 3. The result of normality test in experiment 1.
Table 4. The result of T-test in experiment 1.
Fig. 4. The box plot for Nao's postures with ambiguity (Group 1, on the right) and without ambiguity (Group 3, on the left).

4.3 Experiment 2: Verification of Joint Limits

Group 3 and Group 2 were used to verify whether joint limits affect the emotion expression of the robot. Before verification, we should confirm that there is no significant difference between the recognition accuracies of the human postures corresponding to these two groups of Nao postures. As in experiment 1, a normality test and a t-test were conducted; the results are shown in Tables 5 and 6. Both groups of samples follow the normal distribution and their accuracies are essentially the same, suggesting that the human postures in the two groups are similarly unambiguous. The corresponding box plot is shown in Fig. 5(a).

Table 5. The result of normality test for human’s postures in experiment 2.
Table 6. The result of T-test for human’s postures in experiment 2.
Fig. 5. Box plots in experiment 2 (Group 3 on the left, Group 2 on the right).

A normality test was then conducted for the accuracies of these two groups of Nao postures; its result in Table 7 shows that the data of the two groups are not normally distributed. Therefore, a nonparametric test was adopted instead of the t-test; in SPSS, the Mann-Whitney U test, one such nonparametric method, was applied. The result in Table 8 shows that the null hypothesis is rejected; in other words, the emotion-expression accuracies of the group with joint limits differ significantly from those of the group without joint limits. According to Fig. 5(b), the accuracies of the group without joint limits are generally higher than those of the group with joint limits.
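An equivalent check outside SPSS might look like the following sketch, with acc3 and acc2 assumed to be the per-posture accuracy lists of Group 3 and Group 2.

# Sketch: fall back to the Mann-Whitney U test when normality is rejected.
from scipy import stats

_, p3 = stats.shapiro(acc3)
_, p2 = stats.shapiro(acc2)
if p3 < 0.05 or p2 < 0.05:                          # not normal -> nonparametric test
    _, p = stats.mannwhitneyu(acc3, acc2, alternative="two-sided")
else:
    _, p = stats.ttest_ind(acc3, acc2)
print("p = %.3f" % p)                               # p < 0.05 -> reject the null hypothesis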

Table 7. The result of normality test for Nao’s postures in experiment 2.
Table 8. The result of Mann-Whitney U test for Nao’s postures in experiment 2.

5 Conclusions

In this paper, questionnaire surveys and hypothesis testing were conducted to investigate which factors may affect the emotion expression of a robot's postures. First, an emotional posture library was established by searching for pictures on the Internet. After removing the faces in the pictures and retaining only the body postures, the Nao robot was controlled to reproduce the same postures, yielding 42 human postures and 42 Nao postures. Two questionnaire surveys were then conducted to investigate volunteers' recognition of the emotions expressed by the human's and Nao's postures. By analyzing the statistical results, we hypothesized that posture ambiguity and the robot's joint limits affect the emotion expression of the robot's postures, and these hypotheses were verified by hypothesis testing. The results show that the hypotheses hold: emotions are conveyed more accurately when the postures are unambiguous and not constrained by joint limits. Hence, these two factors should be avoided in practical applications.