Introduction

Non-player characters (NPCs) that players encounter in games can have very different roles. Depending on the game, they can act as obstacles, such as enemies, or as allies that help the player reach his/her objectives. Interacting with an NPC may be important for the progression of the player. To avoid blocking the player's progress with an unexpected situation, the NPCs' behavior is usually scripted, meaning that they follow a precise predefined scenario. Therefore, they usually act as emotionless robots that merely obey the rules of the game; they do not adapt their behavior to the current game situation, giving no sense of engagement in their interaction with the player. In order to create more compelling experiences, one can consider developing an emotional connection with the players [77]. One step toward this goal is to model NPCs with socio-emotional behaviors adapted to the game. In this chapter, we focus on research on emotion and attitude modeling for non-player characters in order to enhance this connection.

Some games successfully convey emotional themes by proposing a very cinematographic experience and by including non-playable sequences and rich dialogues in which the characters (including the avatar of the player) can show powerful emotional behaviors, as in the critically acclaimed video game The Last of Us (Sony Computer Entertainment 2013). Games with a more narrative experience, for example Heavy Rain (Sony Computer Entertainment 2010) or The Walking Dead (Telltale Games 2012), can elicit emotional connections by creating characters that express different emotional reactions during the non-playable sequences of the game depending on the choices of the player during the interactive sequences. However, these systems still use a scripted scenario, and even though the developers created a very large tree of possibilities with rich emotional expressions, the NPCs' reactions during the interactive phases do not show variability depending on the social bond the player could develop with them. The video game The Sims 4 (Electronic Arts 2014) is a recent example of a game that makes use of the emotions of the virtual characters to trigger various reactions of the NPCs during the gameplay phases. However, to our knowledge, very few games endow NPCs with enough autonomous capabilities to trigger adaptive emotional behaviors.

Autonomous virtual characters are capable of reacting in a human-like way in any situation. In particular, they are able to reason and make decisions in response to an event [45]. When interacting with humans, they can respond emotionally and show their engagement [12], creating compelling and engaging narrative experiences. In particular, Embodied Conversational Agents [17] have been endowed with the capacity to display believable emotional and social reactions. To build such agents, computational models of emotion and social behaviors for virtual characters have been developed for more than a decade now [35]. In this chapter, we present the current state of these computational models. In the next section, we detail works on emotion modeling, first presenting different emotion theories from the literature in Human and Social sciences and then describing computational models of emotion; we also present models of emotion expression in virtual characters. Then, in section “Attitude Modeling”, we turn our attention to social behaviors such as interpersonal attitudes. Finally, in section “Conclusion”, we review what has been discussed and present perspectives for game developers using these technologies.

Emotion Modeling

Different computational models have been proposed to model the mechanisms that trigger and express emotions, taking inspiration from human behaviors.

In this section, we present the theories on which virtual human researchers most commonly base their models, and we review some existing works on emotional virtual humans.

Theory on Emotions

The literature in Human and Social sciences contains different representations of emotions and of how they are triggered. For instance, Kleinginna and Kleinginna listed 92 different definitions that can be grouped into 11 categories [40]. In this section, we present some of the most popular theories of emotion.

Basic Emotions

Scholars have proposed that some emotions are defined by a fixed neuromotor program; they are biologically predetermined [26]. This theory supports Darwin's hypothesis that there is a limited number of basic emotions. These emotions are innate, serve an essentially communicative function, and are thus universally recognized [22]. Ekman, for instance, defines six basic emotions corresponding to particular and distinct facial expressions: anger, disgust, fear, sadness, happiness and surprise [25]. However, recent work tends to deny the universality of basic emotions. In [41], the authors show that facial expressions might be interpreted differently depending on the culture and the gaze direction of the one expressing the emotion. By combining these basic emotions, secondary ones might arise. In his “wheel of emotions”, Plutchik compares emotions to colors [61]. Hence, two basic emotions can be mixed together to obtain a third one; love is considered a mix of ecstasy and admiration. To complete the analogy, Plutchik also considers intensity in determining emotional labels: as the color fades, terror turns into fear, then apprehension.
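To make the wheel's mechanics concrete, the following sketch encodes its two mechanisms, pairwise mixing and intensity gradation, as simple lookup tables. It is a toy illustration in Python; the dyads and intensity ladders shown are standard examples from the wheel, not an implementation from the cited work.

```python
# A toy encoding of Plutchik's wheel: primary emotions mixed pairwise
# into dyads, and intensity ladders that relabel an emotion as it fades.
# Illustrative data structures only, not from the cited work.

DYADS = {
    frozenset(["joy", "trust"]): "love",
    frozenset(["fear", "surprise"]): "awe",
    frozenset(["joy", "anticipation"]): "optimism",
}

# Ladders ordered from most intense to least intense.
INTENSITY_LADDERS = {
    "fear": ["terror", "fear", "apprehension"],
    "joy": ["ecstasy", "joy", "serenity"],
}

def mix(a: str, b: str) -> str:
    """Return the secondary emotion obtained by mixing two basic ones."""
    return DYADS.get(frozenset([a, b]), "unknown")

def fade(emotion: str, intensity: float) -> str:
    """Relabel an emotion family according to its intensity in [0, 1]."""
    ladder = INTENSITY_LADDERS[emotion]
    index = min(int((1.0 - intensity) * len(ladder)), len(ladder) - 1)
    return ladder[index]

print(mix("joy", "trust"))  # love
print(fade("fear", 0.9))    # terror
print(fade("fear", 0.1))    # apprehension
```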

Multidimensional Models

As opposed to the discrete approach, where emotions are labeled, the continuous approach uses a multidimensional space in which each emotion is represented as a particular point. The main dimension used to distinguish emotions is related to pleasure or pain [68]: this axis of valence allows one to distinguish a pleasant emotion (e.g. joy) from an unpleasant one (e.g. sadness). However, it seems difficult to differentiate emotions such as fear, anger or boredom using only this single dimension. A more accurate representation therefore requires the addition of one or more axes on top of the valence one. In [67], Russell advocates a two-axis model called the “Circumplex Model of Affect”, obtained by adding the dimension of arousal to the dimension of valence. This model was later adopted by Reisenzein [65], but remains criticized, especially in [34], where the authors not only refute the circular hypothesis but also replace the arousal dimension with a dominance dimension. One of the most widely used dimensional models is the PAD emotional model [51], which combines Pleasure, Arousal and Dominance to obtain a better and more precise description of the different emotional states. A recent study also confirms the universality of these three axes, while adding a fourth dimension of unpredictability [29].
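As an illustration of how such a space can be used computationally, the sketch below represents emotions as points in the PAD cube and labels an observed affective state with its nearest prototype. The prototype coordinates are invented placeholders, not values from the cited studies.

```python
import math

# A minimal sketch of the PAD (Pleasure, Arousal, Dominance) space:
# emotions are points in [-1, 1]^3 and an observed affective state is
# labeled with its nearest prototype. Coordinates are illustrative.

PROTOTYPES = {
    "joy":     ( 0.8,  0.5,  0.4),
    "sadness": (-0.6, -0.4, -0.3),
    "anger":   (-0.5,  0.7,  0.6),
    "fear":    (-0.6,  0.7, -0.7),
    "boredom": (-0.4, -0.7, -0.2),
}

def label(state):
    """Return the prototype emotion closest to a PAD state."""
    return min(PROTOTYPES, key=lambda e: math.dist(state, PROTOTYPES[e]))

# Fear and anger share a negative valence but differ on dominance.
print(label((-0.55, 0.7, 0.5)))   # anger
print(label((-0.55, 0.7, -0.5)))  # fear
```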

Appraisal Theory

The most recent theory in the domain of emotions is the appraisal theory, supported, for example, by Scherer [69]. According to this theory, emotions arise from a continuous evaluation of ongoing events combined with our mental state. The cognitive process can be divided into four evaluation phases: (1) checking the relevance of the event, i.e. whether it affects me or my group, (2) evaluating the impact of the event on my beliefs and goals, (3) determining my coping potential for facing the situation and (4) assessing the significance of the event with respect to my social norms and standards. This representation is thus powerful enough to describe how different persons engaged in the same situation might express different emotions.

One of the most widespread theories in the field of affective computing is the OCC theory [58], named after its authors Ortony, Clore and Collins. In [58], 22 emotions are categorized into 6 separate groups based on the conditions that trigger them. Emotions can arise as reactions to (1) events impacting the goals of the individual, (2) events affecting the standards and norms of the individual and (3) events related to the attractiveness of a particular object. The strength of this theory is that it is easily understandable and implementable. It also takes into account the valence of emotions: every emotion is considered either pleasant or unpleasant, which avoids ambiguities such as surprise, which can be positive, negative or even neutral. In [6], the author explains how OCC can be integrated in virtual agents.
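A minimal sketch of this kind of appraisal is shown below, assuming a toy Event type and a single appraisal variable (desirability with respect to goals); the field and function names are illustrative, not taken from the cited works.

```python
from dataclasses import dataclass

# A toy OCC-style appraisal covering four of the event-based emotions.
# Field and function names are invented for illustration.

@dataclass
class Event:
    description: str
    desirability: float  # impact on the agent's goals, in [-1, 1]
    prospective: bool    # is the event still expected, or has it occurred?

def appraise(event: Event) -> str:
    """Map an event to one of four OCC event-based emotions."""
    if event.prospective:
        return "hope" if event.desirability > 0 else "fear"
    return "joy" if event.desirability > 0 else "distress"

print(appraise(Event("promotion announced", 0.8, prospective=True)))    # hope
print(appraise(Event("quest item destroyed", -0.6, prospective=False))) # distress
```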

Computational Model of Emotions

According to the “Affective Loop” [73], virtual agents have to generate and express congruent emotions to allow for a powerful experience. In this section, we present some works that have tried to model the complex mechanisms of emotions for virtual humans. Marsella and colleagues [48] provide an overview of the different computational models of emotion according to the theory they are based on. As stated by the authors, most of these models are rooted in the appraisal theory presented in the previous section. Moreover, they propose an idealized computational appraisal architecture and decompose the appraisal process into different modules. The appraisal derivation model transforms an event into different appraisal variables; these variables differ according to the theory underlying the model. The affect derivation model then maps the appraisal variables onto a particular affective state and specifies the appropriate emotional reaction.
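The sketch below illustrates this decomposition, under the assumption of three hypothetical appraisal variables: one function plays the role of the appraisal derivation model and another that of the affect derivation model. The mapping rules are invented for illustration.

```python
from typing import Dict

def appraisal_derivation(event: Dict) -> Dict[str, float]:
    """Derive appraisal variables from an event (theory-dependent)."""
    return {
        "relevance": 1.0 if event["affects_agent"] else 0.0,
        "desirability": event["goal_impact"],          # in [-1, 1]
        "coping_potential": event["controllability"],  # in [0, 1]
    }

def affect_derivation(appraisal: Dict[str, float]) -> str:
    """Map appraisal variables onto an affective state."""
    if appraisal["relevance"] == 0.0:
        return "neutral"
    if appraisal["desirability"] >= 0:
        return "joy"
    # Undesirable event: anger if the agent can act on it, fear otherwise.
    return "anger" if appraisal["coping_potential"] > 0.5 else "fear"

event = {"affects_agent": True, "goal_impact": -0.7, "controllability": 0.2}
print(affect_derivation(appraisal_derivation(event)))  # fear
```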

FAtiMA [23] follows this blueprint and offers a generic appraisal framework that can be used to compare the different appraisal theories. The framework implements a core layer (containing the appraisal derivation and affect derivation models) on which additional components can be added. The cultural component, for instance, allows the agent to determine the praiseworthiness of an event according to its cultural norms and values. The motivational component introduces basic human drives that are used to determine an event's desirability: an event raising the agent's drives will be appraised as desirable, while an event lowering its drives will be appraised as undesirable. EMA [47] is a computational model also based on appraisal theory. In EMA, the virtual agent interprets events using a set of appraisal variables that are stored in a structure called an appraisal frame. Each event is thus represented by an appraisal frame, leading to a particular emotion. Every time the agent evaluates an event, the corresponding frame is updated, possibly changing the agent's emotional state. Since many frames can be activated at the same time, the final affective state corresponds to the most recently updated frame. The model also implements two different coping strategies, altering the way the agent will evaluate new events. Some serious games for children using the FAtiMA framework were developed to propose an emotion-based experience: FearNot [4] presented the dangers of bullying, ORIENT [28] aimed at teaching how to develop intercultural empathy and My Dream Theater helped children learn how to resolve conflicts [16].

Some computational models also map the emotions calculated by their appraisal component into the three-dimensional PAD (Pleasure, Arousal, Dominance) space [51]. ALMA [33] is one of the first models that computes an emotion based on the OCC theory and converts it into a three-dimensional vector. ALMA also introduces a more lasting affective state called mood. Once an emotion is triggered, it will “push” or “pull” the mood of the agent according to the positions of the mood and the emotion in the 3D cube. If the computed mood and the computed emotion do not belong to the same cube octant, the mood will be pulled towards the emotion. Conversely, if the emotion lies between the current mood and the center of the cube, the mood will be pushed towards the edge of the cube. More recent works, such as WASABI [7] and GAMYGDALA [62], also map OCC emotions into the PAD space. WASABI additionally models the mutual influence of emotions and mood over time: emotions of positive or negative valence respectively raise or lower the mood value, and agents in a positive mood are more disposed to experience positive emotions. Following Damasio's theory [21], the model also differentiates primary emotions (represented as points in the 3D space) from secondary emotions (represented as regions in the same space, whose computation implies a more complex cognitive process). GAMYGDALA provides a simpler generic engine that can be used to compute emotions for NPCs in any kind of video game. To do so, game developers only have to define the NPCs' goals and annotate events happening in the game with their relations to these goals. Following the OCC model, GAMYGDALA computes the related emotion for the NPC and maps it into the PAD space.
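The push/pull mood dynamics can be sketched as follows, assuming mood and emotion are PAD vectors in [-1, 1]^3. The update rule and rate constant are simplifications invented for illustration, not ALMA's actual equations.

```python
import numpy as np

RATE = 0.1  # hypothetical step size of the mood update

def same_octant(a, b):
    """True if two PAD vectors lie in the same octant of the cube."""
    return bool(np.all(np.sign(a) == np.sign(b)))

def update_mood(mood, emotion):
    """One step of the push/pull mood dynamics described above."""
    if not same_octant(mood, emotion):
        # Pull: the mood moves towards the active emotion.
        direction = emotion - mood
    elif np.linalg.norm(emotion) < np.linalg.norm(mood):
        # Push: the emotion lies between the mood and the center,
        # so the mood is pushed further towards the edge of the cube.
        direction = mood
    else:
        direction = emotion - mood
    return np.clip(mood + RATE * direction, -1.0, 1.0)

mood = np.array([0.2, 0.1, 0.1])       # slightly positive mood
emotion = np.array([-0.6, 0.5, -0.4])  # a negative-valence emotion
print(update_mood(mood, emotion))      # mood is pulled towards the emotion
```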

Emotions can also be represented formally as a combination of logical concepts such as beliefs, desires and intentions. In [1], the authors provide a logical formalization of the emotion-triggering process described by the OCC theory, using the agent's beliefs and desires. For instance, an agent experiences hope if its desire matches its expectation (e.g. the agent desires to be hired by a company and expects that this will happen). The agent will then feel satisfied if the expected event actually happens, or disappointed if it does not. Meyer proposes a different formalization for four distinct emotions, namely happiness, sadness, anger and fear [52]. In this work, emotions are driven by the status of the agent's intentions. Thus, sadness is elicited if the agent believes that it cannot fulfill one of the sub-goals needed to reach its intention. The author also provides strategies to cope with the elicited emotions. However, the intensity of the different emotions is not represented in these models. An answer can be found in [57], where the authors propose different variables used to compute emotion intensity: the degree of certainty concerning the achievement of the intention, the effort invested in trying to complete the intention, the importance of the intention and the potential to cope in case the intention fails.
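The hope/satisfaction/disappointment pattern can be sketched with a few rules over an agent's desires and expectations, as below. The propositional atoms and class layout are hypothetical, not the logical formalization of [1].

```python
from dataclasses import dataclass, field

# A toy belief-desire sketch: an agent hopes for a desired, expected
# outcome, then feels satisfied or disappointed once the outcome is known.

@dataclass
class Agent:
    desires: set = field(default_factory=set)
    expectations: set = field(default_factory=set)

    def prospect(self, outcome: str) -> str:
        """Prospective emotion about a future outcome."""
        if outcome in self.desires and outcome in self.expectations:
            return "hope"
        return "none"

    def resolve(self, outcome: str, happened: bool) -> str:
        """Emotion once the outcome is confirmed or disconfirmed."""
        if self.prospect(outcome) == "hope":
            return "satisfaction" if happened else "disappointment"
        return "none"

agent = Agent(desires={"hired"}, expectations={"hired"})
print(agent.prospect("hired"))                 # hope
print(agent.resolve("hired", happened=False))  # disappointment
```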

Finally, some works integrate a computational model of emotions into a more general cognitive architecture, making emotions part of a complex cognitive process. EMA, introduced above, has been coupled with the SOAR cognitive architecture [42]. The appraisal process presented in LIDA [31] does not really differ from the one presented in [48]: LIDA relies on Scherer's appraisal variables (relevance, implications, coping potential and normative significance) to assign an emotion to a particular appraised situation. The elicited emotion then improves learning and facilitates later action selection by increasing the likelihood that an action will be selected. PEACTIDM [46] is another attempt to unify cognitive behavior and emotions. Based on Scherer's sequential checks, PEACTIDM adds several layers to the SOAR cognitive process. Contrary to many other computational models, emotions in PEACTIDM are represented neither by a label nor by a multidimensional vector, but by the entire appraisal frame.

Expression of Emotions

Having seen how to represent and compute emotional reactions, we now turn to works on computing the multimodal behaviors through which a virtual character displays its emotional state. Defining natural and believable expressions of emotions has been one of the main topics of the ECA community over the past decade [60].

One solution is the use of motion capture performances that are reproduced, as captured, directly on the virtual characters [27]. This is the solution adopted by most game developers. While this approach has the advantage of being very realistic for a specific context, it lacks adaptability and variability. The ECA community, on the other hand, works towards the creation of computational models for the real-time synthesis of emotional expressions. To build these computational models, researchers usually rely on two distinct approaches. The first is based on the collection of data on human behaviors and the identification of the features of emotional expression within these data. The second is based on the literature in Human and Social sciences, using its findings to create rule-based systems. We now present some relevant works to illustrate the differences between these two approaches.

Data-Driven Models

Some researchers choose to use databases of expressive behaviors from which characteristics of emotional behaviors can be automatically identified and extracted.

Researchers usually use motion capture to build these databases. For instance, the Emilya database [30] is a collection of whole-body motion capture data from actors performing simple tasks in various emotional contexts. The MMLI database [54] was built from motion capture data of people laughing in interaction. Machine learning techniques can then be applied to these databases to identify and extract features of the emotion expressions and to build computational models of emotional behaviors. For instance, in [24], the authors applied machine learning techniques to a database of people laughing. The computational model learned the relationship between body movements, facial expressions and acoustic features of laughter such as its energy and pseudo-phonemes. A different approach consists in learning directly from a corpus of animations for a virtual agent, without captures of human actors. For instance, in [56], the authors used a crowd-sourcing method to collect a database of descriptions of different virtual agent smiles (polite, amused or embarrassed). Each description consists of values for the different parameters of the smile (e.g. degree of mouth aperture or mouth extension). From this corpus, they built a decision tree capable of choosing the values of the parameters depending on the desired type of smile.
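A sketch of this idea is shown below: a multi-output regression tree maps a (one-hot encoded) smile type to parameter values. The parameter names and the tiny dataset are invented; the cited work used its own corpus and smile parameters.

```python
from sklearn.tree import DecisionTreeRegressor

# A toy version of learning smile parameters from crowd-sourced
# descriptions, in the spirit of [56]. Data and features are invented.

SMILE_TYPES = ["polite", "amused", "embarrassed"]

def one_hot(smile_type):
    return [1.0 if smile_type == t else 0.0 for t in SMILE_TYPES]

# Each description: smile type -> (aperture, extension, duration in s)
corpus = [
    ("polite",      [0.1, 0.3, 1.0]),
    ("polite",      [0.2, 0.4, 1.2]),
    ("amused",      [0.8, 0.9, 2.5]),
    ("amused",      [0.7, 0.8, 2.0]),
    ("embarrassed", [0.3, 0.2, 0.6]),
    ("embarrassed", [0.2, 0.1, 0.5]),
]

X = [one_hot(t) for t, _ in corpus]
Y = [params for _, params in corpus]

tree = DecisionTreeRegressor().fit(X, Y)
print(tree.predict([one_hot("amused")]))  # e.g. [[0.75 0.85 2.25]]
```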

Data-driven models suffer from the need for a large amount of data. Such data can be difficult and costly to gather, but this approach offers the advantage of an adaptable, generic model that can evolve with new data.

Literature-Based Models

The literature in Human and Social sciences provides different theories on how humans express emotions [75]. For instance, Ekman proposed a model describing how facial expression works [26]. His system, called the Facial Action Coding System (FACS), describes facial expressions at the muscular level and is often used to code facial expressions of emotions.

Some researchers, when building a computational model of emotional expression, choose to derive computational rules from the findings of the literature in Human and Social sciences. For instance, in [74], the authors compute the expression associated with an emotion as a linear combination of known emotion expressions positioned in a 3D space. In [53], the authors present a system inspired by Scherer's appraisal theory [71] that generates sequences of multimodal signals conveying emotions. In [43], the authors use the dimensional Pleasure-Arousal-Dominance (PAD) model of emotions to ground the different emotional contexts in which head and body movements vary during gaze shifts. Another dimensional representation (Valence-Arousal-Dominance) is used in [2], where the authors attempt to create better emotional expressions using asymmetric facial expressions. In [49], the authors present a system that generates the nonverbal behavior of a virtual character depending on its speech: the speech is analyzed to extract characteristics and, using rules derived from the literature, the appropriate behaviors are selected.
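The linear-combination idea of [74] can be sketched as follows: expression prototypes are anchored in a 3D emotion space and a target affective state is rendered by distance-weighted blending of their (hypothetical) action-unit activations. All coordinates and activations are invented for illustration.

```python
import numpy as np

# PAD anchor -> facial action unit activations (e.g. AU6, AU12, AU4).
# Both the anchors and the activations are illustrative placeholders.
PROTOTYPES = {
    "joy":     (np.array([0.8, 0.5, 0.4]),   np.array([0.9, 1.0, 0.0])),
    "sadness": (np.array([-0.6, -0.4, -0.3]), np.array([0.0, 0.0, 0.8])),
    "anger":   (np.array([-0.5, 0.7, 0.6]),  np.array([0.1, 0.0, 1.0])),
}

def blend(target):
    """Weight each prototype by inverse distance to the target point."""
    weights = {
        name: 1.0 / (np.linalg.norm(target - pad) + 1e-6)
        for name, (pad, _) in PROTOTYPES.items()
    }
    total = sum(weights.values())
    return sum(
        (weights[name] / total) * aus
        for name, (_, aus) in PROTOTYPES.items()
    )

# An affective state between sadness and anger.
print(blend(np.array([-0.55, 0.2, 0.2])))
```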

These systems are less costly to produce, as they do not require data to power them. Moreover, the derived rules can be customized to fit particular needs (scenario-, culture- or gender-specific behaviors, for instance) and to obtain a rich repertoire of multimodal behaviors. However, they lack the adaptability and variability of data-driven models.

Attitude Modeling

As with emotions, modeling the attitudes of virtual humans requires the development of models capable of both computing and expressing attitudes.

Research on attitudes is quite new in the ECA community compared to research on emotions. However, a few systems already exist; they rely on different theories from the literature in Human and Social sciences. In this section, we present some theories about the representation and expression of attitudes, and we review some works on social virtual humans.

Theory on Attitudes

First, it is important to define what an attitude is. A review of the relevant literature is proposed in [19], where the authors present different definitions. One of them, commonly used in the ECA community, is Scherer's definition of interpersonal stances [70]. Scherer explains that an attitude is “an affective style used naturally or strategically in an interaction with a person or a group of persons”. In other words, within an interaction, one might use different attitudes depending on one's interlocutor: one might act nicely with a friend but bossily with a subordinate. These attitudes are expressed using verbal and nonverbal cues, as explained in [19].

In order to replicate these attitudes within virtual humans, it is necessary to choose a representation, and different ones have been proposed over the years. Schutz's representation consists of three dimensions: Inclusion, Control and Appreciation [72]. Later, Burgoon and Hale proposed a 12-dimensional representation to characterize different styles of interaction [13]. But the most widely used representation in the ECA community is Argyle's, which consists of a Status dimension and an Affiliation dimension. Using these axes, Wiggins proposed an alternative circular representation called the Interpersonal Circumplex [76]. These two-dimensional representations are easy to manipulate, and some researchers on human behavior have used them to describe how attitudes influence the nonverbal behavior of a person. Mehrabian, for instance, described in [50] how posture, distance and gaze can convey information about Status or Affiliation.

Computational Model of Attitudes

In this section, we present some works that model the complex mechanisms of attitudes within virtual humans.

As explained in [70], a social attitude is a combination of both a spontaneous appraisal of the situation and strategic intentions. However, most computational models focus on the spontaneous appraisal, where agents simply display what they feel: if an agent feels that it has power over another one, it may show dominance; likewise, if an agent really likes another one, it may express friendliness. Thus, to know which social attitude an agent should display toward another agent, we first have to compute its social relation.

One approach to modeling the dynamics of an agent's social relations, and thus the attitude it expresses, is based on the emotions felt by the agent; that is, the agent displays its emotional state as a sign of its social relations toward its interlocutors. In SCREAM [64], emotions felt by the agent play an important role, changing the relationship according to their valence and intensity. A positive emotion elicited by another agent will raise the liking value towards it, while a negative emotion will have the opposite effect. The authors also add the notion of familiarity, which also changes according to emotions but evolves monotonically: only positive emotions are taken into account. Similar dynamics can be found in [55], where the authors describe the influence of particular emotions on liking, dominance and solidarity. For instance, an agent A feeling an emotion of pride elicited by another agent B will raise A's values of dominance and liking toward B. These values are initially defined by the role of the agent. Finally, in EVA [38], the relation between the agent and the user is represented by two values of friendliness and dominance. As in [55], these values evolve according to four emotions felt by the agent: gratitude, anger, joy and distress.
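A minimal sketch of such emotion-driven relation dynamics, in the spirit of SCREAM, is given below: liking moves with the valence and intensity of elicited emotions, while familiarity grows only with positive ones. The update rates and value ranges are invented.

```python
# Toy relation dynamics: liking follows emotion valence and intensity,
# familiarity evolves monotonically. Rates and ranges are illustrative.

def clamp(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))

class Relation:
    def __init__(self):
        self.liking = 0.0       # in [-1, 1]
        self.familiarity = 0.0  # in [0, 1], monotonically increasing

    def on_emotion(self, valence: float, intensity: float):
        """valence in {-1, +1}, intensity in [0, 1]."""
        self.liking = clamp(self.liking + 0.2 * valence * intensity)
        if valence > 0:
            self.familiarity = clamp(self.familiarity + 0.1 * intensity,
                                     lo=0.0)

rel = Relation()
rel.on_emotion(valence=+1, intensity=0.8)  # e.g. gratitude toward B
rel.on_emotion(valence=-1, intensity=0.3)  # e.g. mild anger at B
print(round(rel.liking, 2), round(rel.familiarity, 2))  # 0.1 0.08
```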

In SGD [63], the authors team up humans with a group of synthetic characters, using a formal representation of liking and dominance. However, the evolution of these two dimensions does not rely on emotions, but on the content of the interactions between the agents. Socio-emotional actions, such as encouraging or disagreeing with an agent, will have an impact (respectively positive and negative) on its liking value. Instrumental actions, such as enhancing an agent's competence or obstructing the resolution of one of its problems, will have an impact on its dominance. Callejas et al. also rely on a circumplex representation to build a computational model of social attitudes for a virtual recruiter [15]. In this work, the social attitude of the recruiter is dynamically computed according to the difficulty level of the interview and the anxiety level of the user. The recruiter will be friendly at lower difficulty levels, but might change its attitude as the difficulty increases. Here, the attitude is expressed strategically, in order to comfort or to challenge the user.

Although all the works presented above use a multidimensional representation of social attitudes, some other works only model one dimension. For example, Castelfranchi [18] formalizes the different patterns of dependence that can happen in a relationship. Basically, an agent is dependent on another one if the latter can help the former to achieve one of its goals. The dependence level may vary if the dependent agent finds alternative solutions, or manages to induce a mutual or reciprocal dependence. Hexmoor et al. [37] address autonomy, power and dependence from another perspective. In this work, the agent’s power is characterized as a difference between personal weights and liberties of preferences. The weights influence the agent towards individual or social behavior. The liberties represent internal or external processes that influence the agent’s preferences of choice.

Avatar Arena focuses on appreciation in a scenario in which a user must negotiate the scheduling of an appointment with several agents [66]. Before each session, the appreciation level between agents is fixed (low, medium, high), as well as their assumptions about the other agents' preferences. According to the Congruity Theory described by Osgood and Tannenbaum [59], when an agent discovers a mismatch between its assumption about another agent's preference and what this preference actually is, this might trigger a change in the appreciation level toward the other agent. Finally, some works rely on stage models to implement the notion of intimacy in their agents. This is the case for Laura, who encourages users to exercise on a daily basis [11]. Laura's behavior is driven by an intimacy level that evolves during the interactions: the more the user interacts with Laura, the more familiarly it behaves. Another example of a relational agent is Rea, who adapts its dialog strategy according to the principle of trust [10]. Playing the role of a real-estate agent, Rea uses small talk to enhance the confidence of the user. Once the user becomes more confident with Rea, task-oriented dialog can take place.

Expression of Attitude

This section presents different systems aimed at computing the behaviors that express an attitude.

In the Demeanour project, virtual characters were used as avatars by users improvising a story [5]. The users could define their avatar's interpersonal attitude, and posture and gaze behaviors were then automatically generated for the avatar. For instance, a friendly avatar would orient itself more towards other avatars.

Fukayama et al. have proposed a gaze model that can express attitudes [32]. They proposed a two-state Markov model (one state where the gaze is directed at the interlocutor and one where it is averted), the parameters of which (total amount of gaze directed at the interlocutor, mean duration of gaze, direction of gaze aversion) were defined using values from the literature on gaze behavior [3]. They found that a very low (25 %) or very high (100 %) amount of gaze directed at the interlocutor conveys hostility, and that dominance is linearly correlated with the amount of gaze directed at the interlocutor. Downward gaze aversions are perceived as less dominant, and sideways gaze aversions as less friendly.
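Such a two-state Markov gaze model can be sketched as below: the agent alternates between direct and averted gaze according to transition probabilities, and the resulting proportion of direct gaze is the parameter that conveys attitude. The transition probabilities are invented, not the values used in [32].

```python
import random

# P(next_state | current_state); states: "direct", "averted".
# Tuning these probabilities changes the total amount of direct gaze
# and hence the conveyed attitude.
TRANSITIONS = {
    "direct":  {"direct": 0.7, "averted": 0.3},
    "averted": {"direct": 0.5, "averted": 0.5},
}

def simulate(steps: int, seed: int = 0) -> float:
    """Return the proportion of time spent gazing at the interlocutor."""
    rng = random.Random(seed)
    state, direct_time = "direct", 0
    for _ in range(steps):
        direct_time += state == "direct"
        p_direct = TRANSITIONS[state]["direct"]
        state = "direct" if rng.random() < p_direct else "averted"
    return direct_time / steps

print(f"{simulate(10_000):.0%} direct gaze")  # about 62%
```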

The Laura ECA was developed to engage with users in long-term relationships [11]; its goal was to motivate users to start a physical activity. Two versions of Laura were compared in a longitudinal study with users: a neutral version and a version designed to appear friendlier throughout the interactions. The friendly version produced more gestures, head movements and facial expressions of emotions (e.g. displays of empathy towards the user), and appeared physically closer to the user on the display screen. The friendly agent received higher scores on a variety of measures, including trust, respect and affiliation.

Bee et al. have studied the dominance expressed by a virtual agent in a series of experiments [8, 9]. In the first study [8], they investigated the impact of various expressions of emotions, combined with different head positions and gaze directions, on the expression of dominance. In their second study [9], they investigated the combination of a dialogue model expressing different personalities (introverted vs. extraverted, agreeable vs. disagreeable) with a gaze model. They found that the expressed attitude is more easily identified when the two models are used together.

Lee and Marsella have proposed a model for agents in different conversational roles, based on Argyle's attitude dimensions [44]. They collected data on the behaviors of bystanders and side-participants by having participants act in an improvisation scenario in which the characters had different attitudes towards one another (e.g. Rio is dominant towards Harmony). Using these data, they proposed a set of rules for the behavior of side-participants, depending on the attitudes of the characters.

Cafaro et al. proposed a model for the expression of attitudes during greetings [14]. They use previous works on proxemics [36] and greetings [39] to define which behaviors an agent should display, and at which distances, during a greeting phase so as to appear more or less friendly.

Ravenet et al. used a crowdsourcing method to build a computational model for the expression of attitudes and communicative intentions. They designed an online interface where users chose the behaviors of an agent according to an instruction (e.g. “Which behaviors should the agent display to ask a question while appearing friendly?”) and collected almost 1000 answers from participants. Using these data, they built a Bayesian network representing the probability of occurrence of the considered behaviors given an attitude and a communicative intention. This network can be used to generate several combinations of non-verbal signals to communicate an intention with a given attitude, increasing the variability of the agent's behaviors.
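A simplified sketch of generating behavior combinations from such learned conditional probabilities is given below. The probability table and signal names are invented, and the signals are treated as independent given the attitude and intention, which is a simplification of the cited Bayesian network.

```python
import random

# P(signal present | attitude, intention), assuming independent signals.
P_SIGNAL = {
    ("friendly", "ask_question"): {"smile": 0.8, "head_nod": 0.6, "frown": 0.05},
    ("hostile",  "ask_question"): {"smile": 0.1, "head_nod": 0.2, "frown": 0.7},
}

def sample_behaviors(attitude: str, intention: str, seed=None) -> list:
    """Sample one combination of non-verbal signals for an utterance."""
    rng = random.Random(seed)
    table = P_SIGNAL[(attitude, intention)]
    return [s for s, p in table.items() if rng.random() < p]

# Two samples for the same attitude and intention can differ,
# which increases the behavioral variability of the agent.
print(sample_behaviors("friendly", "ask_question", seed=1))
print(sample_behaviors("friendly", "ask_question", seed=2))
```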

Chollet et al. proposed a behavior planning model for expressing attitude that plans entire sequences of non-verbal signals instead of independent signals [20]. Their Sequential Behavior Planner takes as input an utterance to be said by the agent, augmented with information on the communicative intentions and the attitude the agent wants to convey. The technique relies on a dataset of non-verbal signal sequences annotated as carrying an attitude, extracted from a multi-modal corpus using a sequential pattern mining technique. An evaluation showed that the model manages to convey friendliness, hostility and dominance, but fails to express submissiveness.

Conclusion

In this chapter, we presented various works aimed at giving virtual agents the capacity to express believable emotions and attitudes.

Researchers working on Embodied Conversational Agents have created various computational models that autonomously trigger emotional and social reactions within virtual characters. They have also developed solutions for the expression of the associated behaviors (e.g. gestures, facial expressions and speech). Their models are based either on the findings of the literature in Human and Social sciences or on collected data. These researchers ran experimental studies to verify that the emotional and social behaviors exhibited by the virtual characters are understood by users and correspond to what was expected. Nowadays, games still massively use scripted characters, but as the level of realism of the narrative experiences proposed by developers constantly increases, so does the need for a higher level of believability of their worlds. Whereas NPCs can show powerful emotional behaviors during cinematographic sequences, they usually lack autonomy during interactive phases. The tools presented in this chapter can be useful for game developers who want to go a step further in creating highly immersive experiences. A player who is convinced by the behavior exhibited by an NPC, and who comes to consider it more than a simple robot, might think carefully about his/her actions in the game, as they would impact his/her experience on an emotional level [77]. The player would be able to build his/her own experience depending on how s/he chooses to interact and bond with the NPCs. Game developers can benefit from these models and, moreover, can provide the research community with valuable feedback on how the models perform in very rich applications.