
1 Introduction

Intelligent virtual agents (IVAs) that interact face-to-face with humans are beginning to reach general users, and IVA research is being actively pursued. IVAs require both verbal and nonverbal communication abilities. For nonverbal communication, Ekman classifies gestures into five categories: emblems, illustrators, affect displays, adapters, and regulators [1]. Self-adaptors are non-signaling gestures that are not intended to convey a particular meaning [2]. They are exhibited as movements, typically of the hands, in which one part of the body is applied to another part of the body, such as picking one’s nose, scratching one’s head and face, moistening the lips, or tapping the foot. Many self-adaptors are considered taboo in public; individuals with low emotional stability perform more self-adaptors, and the number of self-adaptors increases with psychological discomfort or anxiety [2,3,4]. According to Caso et al., self-adaptor gestures were used more often when telling the truth than when lying [5].

Because self-adaptors carry little message content and have low relevance to the content of a conversation, they are believed to be actions that are easily ignored during conversation. Consequently, there has been little IVA research on self-adaptors compared with nonverbal behaviors with high message content, such as facial expressions and gaze. Among the few studies that have dealt with IVAs performing self-adaptors, Neff et al. reported that an agent performing self-adaptors (repetitive quick motions combining scratching its face and head, touching its body, rubbing its head, etc.) was perceived as having low emotional stability [6]. Although displaying emotional instability might not be appropriate in some social interactions, their finding suggests that self-adaptors are important in conveying an agent’s personality.

However, self-adaptors are not always a sign of emotional instability or stress. Blacking states that self-adaptors also occur in casual conversations in which the conversants are very relaxed [7]. Chartrand and Bargh showed that mimicry of particular types of self-adaptors (i.e., foot tapping and face scratching) can cause the mimicked person to perceive an interaction as more positive, and may help build rapport between the conversants [8].

In this study, we focus on these “relaxed” self-adaptors performed in casual conversation. If relaxed self-adaptors occur when people talk with a conversant toward whom they feel friendly, then, conversely, a conversant who displays self-adaptors may induce feelings of friendliness. We applied this idea to agent conversants, hypothesized that adding self-adaptors to an agent’s body motions can induce users to feel friendliness toward the agent, and conducted two experiments.

The first experiment evaluated continuous interactions with an agent that exhibits self-adaptors and with one that does not [9]. The results showed that the agent exhibiting relaxed self-adaptors was more likely than the agent without self-adaptors to prevent deterioration in its perceived friendliness. However, when we considered the evaluators’ social skills, we found a dichotomy in impressions of the agents between users with high social skills (HSS hereafter) and those with low social skills (LSS hereafter). Social skills are defined as “skills that are instrumental in conducting smooth personal communication” [10]. People with HSS are able to read the nonverbal behaviors of their conversants and tend to use many nonverbal behaviors themselves in order to make interactions smooth. We focused on this characteristic of social skills and considered that it could have the same effect when applied to the nonverbal behavior of an agent. The results of the first experiment indicated that people with HSS perceived higher friendliness toward agents that exhibited relaxed self-adaptors than people with LSS did. Moreover, the friendliness that HSS users felt toward the agent with self-adaptors increased over time, while LSS users felt higher friendliness toward the agent that did not exhibit self-adaptors. This dichotomy between users’ social skills suggests that users’ sense of friendliness toward IVAs could be continually improved during continued interactions by matching the presence of self-adaptors to the user’s level of social skills.

The second experiment evaluated interactions with agents that exhibit either relaxed self-adaptors or stressful self-adaptors in a desert survival task [11]. The results indicated that exhibiting either type of self-adaptor in interactions that involve exchanging serious opinions, such as the desert survival task, caused deterioration in the agents’ perceived friendliness and empathy, although no such deterioration occurs during casual conversation with an agent that displays self-adaptors. These results suggest that users unconsciously expect agents to behave in a manner appropriate to the topic of conversation, as we do with humans; thus, agents’ nonverbal behaviors should adapt to the conversational topic. Taken together with the results of previous research, these results show that it will be necessary to adapt an agent’s nonverbal behavior, at least its self-adaptors, to the social skills of the interaction partner and to the conversational content.

This paper reports the results of a follow-up experiment on self-adaptors that addresses gender issues. As Cassell points out [12, 13], considering gender effects is essential for successful and comfortable human-computer interaction, and the same holds for human-agent interaction.

2 Related Research on Gender and Virtual Agents

Social psychology studies have documented gender stereotypes and roles: men are regarded as more dominant, influential, and effective leaders than women, while women are regarded as more submissive, supportive, and better listeners than men [14, 15]. Commercially used virtual agents mainly serve as virtual assistant chatbots that help online users. They are often represented as female due to these gender stereotypes, e.g., Aetna’s virtual online assistant Ann (Fig. 1(a)), IKEA’s virtual assistant Anna (Fig. 1(b)), and Alaska Airlines’ virtual assistant Jenn (Fig. 1(c)).

Fig. 1. Female virtual agents used for commercial purposes.

However, whether a female appearance is adequate for all virtual agent applications and domains remains an open question. Zanbaka et al. examined the role of gender in an application where virtual agents act as persuasive speakers, and found cross-gender interactions between the agents’ gender and the participants’ gender: male participants were more persuaded by the female agent than by the male agent, and female participants were more persuaded by the male agent than by the female agent [16].

Our two previous experiments used only a female agent and did not consider the effects of the agent’s gendered appearance. Moreover, because some self-adaptors are gender-specific [17] (e.g., “crossing arms” self-adaptors are found more frequently in males, and “covering mouth” self-adaptors are found mostly in Japanese females), we need to consider the gender of the agent, gender-specific self-adaptors, and the gender of the participants.

Hence, in this experiment we evaluate impressions of agents with male/female appearance performing masculine/feminine self-adaptors, in order to examine whether cross-gender effects similar to those in [16] can be found in our experimental setting. We hypothesize that (1) when the agent’s gender and the gender of its gender-specific self-adaptors are consistent, participants feel higher naturalness; and (2) male participants have a better impression of the agent with female appearance and feminine self-adaptors, while female participants have a better impression of the agent with male appearance and masculine self-adaptors.

3 Video Analysis of Self-adaptors and Implementation of Agent Animation

3.1 Video Analysis of Self-adaptors

We conducted a pre-experiment to examine when and what kinds of self-adaptors are performed, and whether any gender-specific self-adaptors occur, during casual conversation between friends at a Japanese university. We invited ten pairs (5 male pairs and 5 female pairs) of university students who study together and had been friends for more than three years, and recorded 20 min of free conversation for each pair.

The videos of all 20 participants were analyzed in terms of the body parts touched, the frequency of each self-adaptor, and the number of participants who performed each self-adaptor during the conversation. In total, 587 self-adaptors were identified in the 20 min recordings of the 10 male participants, and 617 in the 20 min recordings of the 10 female participants. Figures 2 and 3 show the body parts touched by the male and female participants, respectively. The body part most frequently touched by the Japanese male participants was the head (66%), followed by the upper body (26%). The body part most frequently touched by the Japanese female participants was also the head (50%), followed by the upper body (16%). Tables 1 and 2 list, for the male and female participants respectively, the top five self-adaptor types performed most frequently and by the most participants, giving how and which body part was touched, how many times each self-adaptor occurred, and how many people performed it.
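Because the two groups produced different totals, it can help to express them as a rate per participant per minute of conversation. The following is a minimal sketch of that arithmetic, assuming that each of the 10 participants per gender was observed for the whole 20 min conversation (5 pairs × 2 speakers); the function name is illustrative and not part of the original analysis.

```python
# Normalize the pre-experiment self-adaptor totals to a per-participant,
# per-minute rate so that the male and female groups can be compared.
# Assumption: each of the 10 participants per gender was observed for the
# whole 20-minute conversation (5 pairs x 2 speakers x 20 min).


def rate_per_participant_minute(total: int, participants: int, minutes: float) -> float:
    """Mean number of self-adaptors per participant per minute of conversation."""
    if participants <= 0 or minutes <= 0:
        raise ValueError("participants and minutes must be positive")
    return total / (participants * minutes)


male_rate = rate_per_participant_minute(total=587, participants=10, minutes=20)
female_rate = rate_per_participant_minute(total=617, participants=10, minutes=20)

print(f"male:   {male_rate:.3f} self-adaptors per participant-minute")
print(f"female: {female_rate:.3f} self-adaptors per participant-minute")
```

Under that assumption the rates are 587 / 200 = 2.935 and 617 / 200 = 3.085 self-adaptors per participant-minute, so the two groups produced self-adaptors at broadly similar rates.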

Fig. 2. Ratio of body parts touched by the male participants.

Fig. 3. Ratio of body parts touched by the female participants.

Table 1. Number of self-adaptors performed by male participants in video recordings (left: number of times, right: number of people).
Table 2. Number of self-adaptors performed by female participants in video recordings (left: number of times, right: number of people).

We identified the following gender-specific self-adaptors from the recordings of conversations among Japanese university students. Three types of self-adaptors occurred most frequently and were performed by most of the male participants: “touching nose,” “touching chin,” and “scratching head.” We refer to these as “masculine self-adaptors” hereafter. The self-adaptors performed most frequently by most of the female participants were “touching nose,” “stroking hair,” and “touching mouth (covering mouth).” We refer to these as “feminine self-adaptors” hereafter. Figure 4 shows typical masculine self-adaptors performed by Japanese male students in the video recordings, and Fig. 5 shows typical feminine self-adaptors performed by Japanese female students.

Fig. 4. Male participants performing the three masculine self-adaptors (from left: “touching chin,” “scratching head,” and “touching nose”).

Fig. 5. Female participants performing the three feminine self-adaptors (from left: “touching nose,” “stroking hair,” and “touching lips (covering mouth)”).

We implemented these masculine and feminine self-adaptors in our conversational agents for the experiment. Regarding timing, 50% of the self-adaptors in the video recordings occurred at the beginning of utterances.

3.2 Agent Character and Animation Implementation

The agent characters (male and female) and the animations of the six types of self-adaptors were created using Poser. Figures 6 and 7 show the agents performing the three masculine self-adaptors and the three feminine self-adaptors, respectively. To examine combinations of character gender and self-adaptor gender, we created four types of animations: “male agent performs masculine self-adaptors,” “male agent performs feminine self-adaptors,” “female agent performs masculine self-adaptors,” and “female agent performs feminine self-adaptors.”

Fig. 6. Male agent performing the three masculine self-adaptors (from left: “touching chin,” “scratching head,” and “touching nose”).

Fig. 7. Female agent performing the three feminine self-adaptors (from left: “touching nose,” “stroking hair,” and “touching lips (covering mouth)”).

Because we found no literature that explicitly describes the form of these movements (e.g., how the nose is touched, in which way, and with which part of the hand), we mimicked the form of the movements of the participants in the video recordings. We aligned the self-adaptor animations with the beginning of the agent’s utterances, as observed in the video recordings.
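The timing rule above (trigger a self-adaptor at the onset of an agent utterance) can be sketched as a small scheduling function. This is a hypothetical illustration, not the authors’ implementation: the clip names, the utterance onset times, and the choice to spread the three clips evenly across the agent’s utterances are all assumptions; the source only states that the animations were aligned with utterance onsets.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnimationCue:
    clip: str       # hypothetical self-adaptor clip name
    start_s: float  # trigger time in seconds = onset of the chosen utterance


def schedule_at_utterance_onsets(onsets_s: List[float], clips: List[str]) -> List[AnimationCue]:
    """Trigger each self-adaptor clip once, at the onset of an agent utterance.

    Assumption: the clips are spread evenly over the utterances so that all
    of them appear within one interaction.
    """
    n, m = len(onsets_s), len(clips)
    if m == 0:
        return []
    if n < m:
        raise ValueError("need at least one utterance per self-adaptor clip")
    # Evenly spaced utterance indices, e.g. 6 utterances and 3 clips -> 0, 2, 4.
    indices = [(j * n) // m for j in range(m)]
    return [AnimationCue(clip=c, start_s=onsets_s[i]) for c, i in zip(clips, indices)]


# Hypothetical clip names for the three feminine self-adaptors.
FEMININE_CLIPS = ["touch_nose", "stroke_hair", "cover_mouth"]
onsets = [0.0, 6.5, 12.0, 19.5, 25.0, 31.5]  # illustrative utterance onsets (s)
cues = schedule_at_utterance_onsets(onsets, FEMININE_CLIPS)
for cue in cues:
    print(cue)
```

Attaching cues to utterance onsets, rather than to fixed clock times, keeps the self-adaptors synchronized with speech even when synthesized utterances vary in length.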

Besides these self-adaptors, we created animations of the agent making the gestures of “greeting” and “placing its hand against its chest.” The agent performed these gestures at appropriate times according to the content of the conversation, regardless of the experimental condition, so that the self-adaptors would not stand out during the conversation.

4 Experiment

4.1 Experimental System

The agent’s conversation system was developed in C++ using Microsoft Visual Studio 2008. The agent’s male and female voices were synthesized using the Japanese speech synthesis package AITalk. Conversation scenarios, composed of questions from the agent and response choices, were created beforehand, and animations of the agent reflecting each conversation scenario were prepared. Figure 8 shows the components of the experimental system. By connecting animated sequences according to the content of the user’s responses, the system realized a pseudo-conversation with the user. The conversation system had two states. In the first, the agent speech state, an animated sequence of the agent speaking and asking the user questions was shown. In the second, the user selection standby state, the user chose a response from options displayed on the screen above the agent. When the user entered a response from the keyboard, the animated agent movie that followed the conversation scenario was played back in the speech state.
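The two-state design described above can be illustrated as a small state machine over a scripted scenario graph. This is a Python sketch under stated assumptions, not the authors’ C++ system: the class and field names, the movie file names, and the example scenario content are hypothetical; only the two states and the “play movie, wait for key, choose next movie” cycle come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional


class State(Enum):
    AGENT_SPEECH = auto()    # a pre-rendered movie of the agent speaking is played
    USER_SELECTION = auto()  # response options are shown; waiting for a key press


@dataclass
class Node:
    movie: str          # hypothetical movie file for this agent turn
    options: List[str]  # response choices shown above the agent
    next_node: Dict[int, str] = field(default_factory=dict)  # choice index -> next node id


class PseudoConversation:
    def __init__(self, scenario: Dict[str, Node], start: str) -> None:
        self.scenario = scenario
        self.current: Optional[str] = start
        self.state = State.AGENT_SPEECH
        self.played: List[str] = []

    @property
    def finished(self) -> bool:
        return self.current is None

    def play_agent_turn(self) -> List[str]:
        """Speech state: play the current movie and return the options to display."""
        if self.state is not State.AGENT_SPEECH or self.current is None:
            raise RuntimeError("no agent turn to play")
        node = self.scenario[self.current]
        self.played.append(node.movie)
        if node.options:
            self.state = State.USER_SELECTION
        else:
            self.current = None  # a turn without options ends the scenario
        return node.options

    def select(self, choice: int) -> None:
        """Selection state: map the user's key press to the next scripted movie."""
        if self.state is not State.USER_SELECTION or self.current is None:
            raise RuntimeError("not waiting for a selection")
        node = self.scenario[self.current]
        self.current = node.next_node[choice]
        self.state = State.AGENT_SPEECH


# Illustrative scenario (topic from the paper, content invented for the sketch).
scenario = {
    "route": Node("route.avi", ["By train", "By bus"], {0: "train", 1: "bus"}),
    "train": Node("train.avi", []),
    "bus": Node("bus.avi", []),
}
conv = PseudoConversation(scenario, start="route")
options = conv.play_agent_turn()  # options "By train" / "By bus" are displayed
conv.select(1)                    # the user presses the key for "By bus"
conv.play_agent_turn()            # bus.avi is played, then the scenario ends
```

Because every agent turn is a pre-rendered movie, branching reduces to looking up the next node for the selected option, which is what allows the system to avoid speech recognition entirely.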

Fig. 8. Dialogue and agent animation control system.

4.2 Experimental Procedure

The interactions with the agents were presented as pseudo-conversations as follows: (1) the agent asks the participant a question; (2) possible answers are displayed on the screen, and the participant selects one of them using the keyboard; (3) the agent comments on the participant’s answer and asks the next question. The conversation topics were casual (the route to school, residential area, favorite food, etc.). We adopted the pseudo-conversation method so that the accuracy of speech recognition, which would otherwise be needed to process the participants’ spoken answers, would not affect their impressions of the agent.

The participants in the experiment were 29 Japanese undergraduate students (19 male and 10 female), aged 20–23 years, none of whom took part in the video recording pre-experiment. The experiment used a 2 × 2 × 2 factorial design with three factors: participants’ gender (male/female), agent’s gender (male/female), and gender of self-adaptor (masculine/feminine). Participants’ gender varied between participants, while agent’s gender and gender of self-adaptor varied within participants: each participant interacted with all four types of agents (male agent performing masculine self-adaptors, male agent performing feminine self-adaptors, female agent performing masculine self-adaptors, and female agent performing feminine self-adaptors) in random order. Thus, each participant had four conversation sessions, each with a different combination of agent and self-adaptors. The conversation topic differed for each interaction, and the topics were randomized. In every interaction, the agent performed all three self-adaptors of the assigned gender, as well as the “greeting” and “placing its hand against its chest” gestures.
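The within-participant part of the design (2 agent genders × 2 self-adaptor genders, presented in random order with randomized topics) can be sketched as follows. This is an illustrative sketch, not the procedure the authors used: the seeding scheme and the placeholder topic labels are assumptions, and the paper does not state how the random order was generated.

```python
import itertools
import random
from typing import Dict, List, Tuple

AGENT_GENDERS = ("male", "female")
ADAPTOR_GENDERS = ("masculine", "feminine")
# Placeholder topic labels; the paper mentions casual topics such as the
# route to school, residential area, and favorite food.
TOPICS = ("topic A", "topic B", "topic C", "topic D")

# Within-participant conditions: 2 agent genders x 2 self-adaptor genders = 4.
CONDITIONS: List[Tuple[str, str]] = list(itertools.product(AGENT_GENDERS, ADAPTOR_GENDERS))


def session_plan(participant_id: int, seed: int = 2024) -> List[Dict[str, str]]:
    """Random order of the four agent conditions, each paired with a random topic.

    Participant gender is a between-participants factor, so it is not
    randomized here; it is simply recorded with the participant.
    """
    rng = random.Random(seed * 1000 + participant_id)  # reproducible per participant
    conditions = list(CONDITIONS)
    topics = list(TOPICS)
    rng.shuffle(conditions)
    rng.shuffle(topics)
    return [
        {"agent": agent, "self_adaptors": adaptors, "topic": topic}
        for (agent, adaptors), topic in zip(conditions, topics)
    ]


plan = session_plan(participant_id=1)
for session in plan:
    print(session)
```

Shuffling conditions and topics independently keeps any particular topic from being systematically paired with a particular agent condition across participants.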

After each interaction, the participants rated their impressions of the agent using the semantic differential method on a scale from 1 to 6. A total of 27 adjective pairs were used for evaluation: 20 pairs from the Adjective Check List (ACL) for Interpersonal Cognition for Japanese [10] and seven original pairs (concerning the agent’s “humanness,” “naturalness,” “annoyingness,” “masculinity,” etc.). The list of adjectives is shown in Table 3 in Sect. 5. At the end of the experiment, a post-experiment survey was conducted to evaluate the participants’ subjective impressions of the agents’ overall qualities, such as the naturalness of their movements and synthesized voices, and whether the participants had noticed differences in the gestures.

Table 3. Four factors and adjectives for interpersonal impressions.

5 Results

5.1 Results of Factor Analysis

Factor analysis (FA hereafter) was conducted on the agent impression ratings obtained in the experiment in order to extract the factors that compose interpersonal impressions of the agents. FA using the principal factor method extracted four factors (shown in Table 3). The first factor was named the “Tolerance factor” (composed of adjectives such as calm, broad-minded, kind, soft, and sophisticated), the second the “Sociability factor” (composed of adjectives such as active, cheerful, confident, and social), the third the “Gender factor” (composed of adjectives such as lovable, feminine, and delicate), and the fourth the “Naturalness factor” (composed of adjectives such as natural and humanlike).
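The principal factor method (principal axis factoring) replaces the 1s on the diagonal of the correlation matrix with communality estimates, extracts the leading eigenvectors, and re-estimates the communalities from the resulting loadings until they stabilize. The sketch below shows that iteration with NumPy. It returns unrotated loadings; the paper does not state whether or how the four-factor solution was rotated, and the one-factor toy matrix is illustrative, not the study’s data.

```python
import numpy as np


def principal_axis_factoring(R, n_factors, max_iter=1000, tol=1e-10):
    """Unrotated principal (axis) factor loadings from a correlation matrix R.

    Communalities start at the squared multiple correlations (SMC) and are
    re-estimated from the loadings until they stop changing.
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # initial communalities (SMC)
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # replace the 1s by communality estimates
        eigvals, eigvecs = np.linalg.eigh(reduced)  # eigh: ascending eigenvalues
        top = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings


# Toy check: a correlation matrix generated by one factor with known loadings.
true_loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)
L = principal_axis_factoring(R, n_factors=1)
print(np.round(np.abs(L[:, 0]), 3))  # loading signs are arbitrary, so compare magnitudes
```

In practice, a statistics package would typically be used to obtain rotated loadings and factor scores; the loop above only makes the “principal factor” step concrete.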

Cronbach’s alpha coefficients for the factors were 0.84 for the “Tolerance factor,” 0.79 for the “Sociability factor,” 0.67 for the “Gender factor,” and 0.62 for the “Naturalness factor,” which we regarded as sufficient internal consistency for the extracted factors, although the values for the last two factors are modest. The results of the factor analysis indicate that these four factors strongly shape how the participants perceive the agents interpersonally and rate their impressions. We therefore use these factors and their factor scores in the following analyses of gender effects.
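For reference, Cronbach’s alpha for a factor with k items is α = k / (k − 1) · (1 − Σ σ²ᵢ / σ²ₓ), where σ²ᵢ is the variance of item i and σ²ₓ is the variance of the summed score. A minimal NumPy sketch follows, assuming a respondents × items matrix of ratings (for this study, the adjective-pair ratings loading on one factor would form the columns; any reverse-keyed pairs would need recoding first). The toy matrices only check the two extreme cases and are not the study’s data.

```python
import numpy as np


def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) matrix of ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    X = np.asarray(scores, dtype=float)
    if X.ndim != 2 or X.shape[1] < 2:
        raise ValueError("expected a 2-D matrix with at least two items")
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)      # sample variance of each item
    total_var = X.sum(axis=1).var(ddof=1)  # sample variance of the summed scale
    return float(k / (k - 1) * (1.0 - item_vars.sum() / total_var))


# Toy checks (not the study's data):
identical = [[1, 1], [2, 2], [3, 3]]          # items agree perfectly -> alpha = 1
unrelated = [[1, 1], [1, 2], [2, 1], [2, 2]]  # uncorrelated items -> alpha = 0
print(cronbach_alpha(identical), cronbach_alpha(unrelated))
```

Alpha rises with the average inter-item correlation and with the number of items, which is one reason short factors such as the three-adjective Gender factor tend to show lower values.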

5.2 Analysis of Tolerance Factor and Sociability Factor

We performed a three-way ANOVA (repeated measures) with the factors “participant gender,” “agent gender,” and “gender of self-adaptor.” The dependent variable was the total factor score of each factor.

The results showed no main effects of participants’ gender, agent’s gender, or gender of self-adaptor on the “Tolerance factor” or the “Sociability factor.” There was a significant second-order interaction in the “Tolerance factor” (p ≤ 0.05) between participants’ gender and agents’ gender. Figure 9 shows the tolerance factor score of each condition. The male participants rated the female agent performing feminine self-adaptors significantly higher than the same agent performing masculine self-adaptors (F = 4.58, p ≤ 0.05), whereas the female participants showed a non-significant tendency to rate the female agent performing masculine self-adaptors higher (F = 2.55, p = 0.122). There was no difference in tolerance scores when the participants evaluated the male agent. In the case of the female agent, by contrast, tolerance scores were higher when the agent performed self-adaptors of the gender opposite to the participant’s own. No significant main effects or second-order interactions were found in the “Sociability factor” (shown in Fig. 10).

Fig. 9. Tolerance factor score of four conditions compared by participants’ gender.

Fig. 10. Sociability factor score of four conditions.

5.3 Analysis of Gender Factor

We performed a three-way ANOVA on the total factor scores of the gender factor. Figure 11 shows the gender factor scores of the four conditions. A main effect of the agent’s gender on the gender factor was found (p ≤ 0.01), whereas no significant second-order interactions were found. These results mean that the agents’ appearance made a significant difference in the perceived gender: participants of both genders perceived the male agent as more masculine than the female agent, and the female agent as more feminine than the male agent, regardless of the gender of the self-adaptors.

Fig. 11. Gender factor scores of four conditions compared by participants’ gender.

However, when we focus on the gender factor scores of the female agent, a significant difference depending on participants’ gender was found. As shown in Fig. 12, the male participants perceived significantly higher femininity in the female agent performing feminine self-adaptors than in the same agent performing masculine ones (F = 4.88, p < 0.05), whereas the female participants showed no difference in gender scores between these two conditions. Notably, only one of the 29 participants, a female participant, noticed the differences between conditions and identified the masculine and feminine self-adaptors.

Fig. 12. Gender factor scores of female agent conditions compared by participants’ gender.

5.4 Analysis of Naturalness Factor

We performed a three-way ANOVA on the total factor scores of the “Naturalness factor.” Figure 13 shows the naturalness factor scores of the four conditions. No significant main effects or second-order interactions were found in the naturalness factor, which suggests that the participants perceived the agents in all conditions as similarly natural.

Fig. 13. Naturalness factor score of four conditions.

6 Discussion and Future Directions

As the above results show, we found no deterioration in the perceived naturalness of the agents when the agents’ appearance and the gender of their self-adaptors did not match. Thus, our hypothesis 1, “when the participant’s gender, agent’s gender, and gender of the gender-specific self-adaptors are consistent, participants feel higher naturalness than with any other combination,” was not supported.

In the case of the female agent, there were interactions between the participants’ gender and the gender of self-adaptors in the tolerance factor. Specifically, the female participants had a lower impression of the feminine self-adaptors performed by the female agent. Thus, our hypothesis 2, “male participants have a better impression of the agent with female appearance and feminine self-adaptors, while female participants have a better impression of the agent with male appearance and masculine self-adaptors,” was not fully supported. We discuss why below.

When the participants evaluated their impressions of the agents used in the experiment, four factors formed the overall impression of the agent: tolerance, sociability, gender, and naturalness. The analysis of the gender factor showed that participants of both genders correctly perceived the gender of the agent. Only the male participants perceived the female agent performing feminine self-adaptors as the most feminine; no such differentiated perception occurred for the female participants, for the male agent, or for the masculine self-adaptors. On the other hand, the agents in all four conditions were perceived as similarly natural, even when the gender of the agent and the gender of its self-adaptors did not match. In terms of perceived tolerance, the female agent performing feminine self-adaptors produced opposite impressions in the male and female participants: the male participants perceived it as the most tolerant, while the female participants rated it as the least tolerant of all conditions. No such cross-gender interaction was found for the male agent with either masculine or feminine self-adaptors.

These results suggest interesting cross-gender interactions in the perception of feminine self-adaptors. The Japanese male participants favored the feminine self-adaptors, while the Japanese female participants formed a more severe impression of them when they were performed by the female agent, apparently without noticing the difference, since all conditions were rated as similarly natural. This suggests a dichotomy between participants’ genders in the perception of the combination of self-adaptor and agent gender. Thus, hypothesis 2 is only partially supported, namely in the case of the female agent.

This research is still at an early stage and thus has several limitations. First, we need to conduct a finer-grained study of self-adaptors in human-human interactions. The self-adaptors were extracted from the video recordings of only 20 participants, all undergraduate students in Japan, and the self-adaptor-performing agents were evaluated by 29 Japanese undergraduate students (different from those who were videotaped). Given the large inter-individual variability in gesture use, we need to observe the form and movements of self-adaptors closely in larger samples covering a wider range of ages and cultures.

Second, although this experiment compared only masculine and feminine self-adaptors, we need to compare impressions with a condition without self-adaptors in order to isolate the effect of the masculinity or femininity of the self-adaptors themselves.

Third, the results of this study are limited to the virtual agents used in our experiment and may not generalize to other types of virtual agents. Further research should use a wider variety of virtual agent appearances.

Finally, future work should also consider cultural diversity in expressing and perceiving self-adaptors. There are culturally defined preferences in bodily expressions [18,19,20,21] and facial expressions [22, 23], and the acceptable level of nonverbal expression is culture-dependent. In our observations, Japanese males tended to perform self-adaptors around the nose and chin more frequently than people from other cultures, and Japanese females tended to cover their mouths while talking, which is considered a typical Japanese female self-adaptor. We will investigate culture-specific self-adaptors in video recordings of human-human interactions from other cultures, implement them in agents, and conduct a cross-cultural evaluation study.

7 Conclusion

The contributions of this study are (1) the identification of gender-specific self-adaptors in Japanese male and female university students, and (2) evidence suggesting significant cross-gender interactions between the gender of agents and the participants’ gender in the case of the female agent. Our evaluation of interactions with agents exhibiting self-adaptors typical of Japanese males and females indicated a dichotomy between participants’ genders in their impressions of the agents. Japanese male participants had more favorable impressions of the female agent when it displayed feminine rather than masculine self-adaptors, while Japanese female participants had more severe impressions of the feminine self-adaptors.

Although we need to investigate the perception of agents with a wider variety of appearances, gestures, and cultures, the results imply that combining a male appearance with masculine gestures might be “safer” for eliciting neutral impressions and avoiding cross-gender interactions arising from the gender of human users. Designers of virtual agents should consider the gender of agents’ appearance and gesture animations, and make them customizable according to the user’s gender, preferences, social skills, conversational content, and culture. Designers can take advantage of the flexibility of virtual agents, which can readily be customized to suit various conditions.