1 Introduction

In the unremitting complexity of social life, emotions play a key role in defining and regulating our relationships with others and, more generally, with the environment surrounding us. Our emotional reactions to other people influence how those others react to us and, to a certain extent, how future encounters will develop. At the same time, our own emotional behaviour is shaped by others’ thoughts and deeds. Although emotions undeniably have personal and subjective aspects, they are usually experienced in a social context and acquire their significance in relation to this context (see also the “Socially Situated Affective Systems” chapter in Part V).

1.1 Emotions as Communication Tools

Social processes affect the way in which our emotions unfold at a number of different levels. At the cultural level, what we define as an emotion, and what kind of emotion is appropriate in a specific situation, depends on the set of established values, norms, and customs within one’s own society. Narrowing the focus of our analysis, at the group level our emotions towards other individuals and other groups are affected by our belonging to an in-group with whose members we share a common social identity. Finally, at the interpersonal level, other individuals’ emotions affect our emotions and our emotions affect them in an ongoing process of mutual exchange (Parkinson et al., 2005).

Emotion and its components (e.g. appraisals, bodily changes, action tendencies, expression, regulation, and subjective feeling), therefore, play an important role in social interactions, social comparison, and social influence processes. For example, Harré and Gillett (1994) emphasise the communicative function of emotion feelings and displays in the events of everyday life by treating them as psychologically equivalent to verbal statements. From this point of view, emotional expression cannot be reduced to a simple direct manifestation of an internal emotional state, but has to be considered as emerging from an interactive context. There is, for instance, evidence that facial displays are affected by both emotional and social factors (Parkinson, 2005). Indeed, several authors have argued that facial displays can serve both emotion-expressive and social communicative functions (e.g. Cacioppo et al., 1992; Hess et al., 1995). Regardless of whether we consider facial displays as depending directly on emotions or on social motives, it is clear that facial behaviour has a great impact on social interactions. Effective communication, for instance, requires not only that we properly interpret other people’s expressions, but also that we correctly assess the extent to which others can read our expressions (Kenny and De Paulo, 1993). Indeed, our own misappraisal of the emotional expression that we have shown to another during an interaction could have important consequences for our interpretation of that person’s responses to us, and thus for the course of the interaction itself (Muttiallu, The effects of display rules on the “illusion of transparency”: moderating factors of partner identity and positivity of emotion, unpublished).

Successful emotional communication, though, is not restricted to top-down processes like the ones just illustrated, but necessitates an ongoing reciprocal adjustment between interactants that can often happen at an implicit level. That is, emotional responses are also constructed in a bottom-up way during an interaction in which we do not always register the emotional significance of specific indicators of emotion.

1.2 Explicit and Implicit Responses to Emotions

Some authors have focused on the existence of two modes of interpersonal response to emotions: explicit and implicit (Hsee et al., 1992; Parkinson and Lea, in press). At the explicit level, people seem to gain knowledge about one another’s emotional states by consciously processing relevant information. Thus, verbal and non-verbal reactions during an interaction are categorised as signs, signals, or symptoms of emotion. Furthermore, the nature of our response during an emotional interaction is influenced by the cultural norms and scripts invoked by attributing emotion to another on the basis of perceived responses. On the other hand, people often react to aspects of another person’s emotion at an implicit level without registering the emotional meaning of response components. This is the case, for example, for mimicking others’ expressions on-line (Hatfield et al., 1994) and for automatically adjusting our own rhythms of movement to match those of our interaction partners (Bernieri et al., 1988). However, the distinction between implicit and explicit processes does not necessarily imply that they have to be considered as separate and independent systems. Indeed, most of our everyday interactions involve a complicated combination of implicit and explicit responses to emotions. Each interactant can explicitly transmit emotional information or implicitly convey it, while the other person’s reaction can be at the same level or at a different one (e.g. an implicit “symptom” of one person’s emotion may be registered explicitly by the other, and vice versa).

Therefore, it seems possible that we can regulate our emotional reactions both deliberately and automatically, sharing others’ emotional states or adopting complementary or contrasting ones, sometimes without ever registering the emotion that they are experiencing. Automatic on-line mutual adjustments during an emotional interaction are a vital part of how we respond. From this point of view, any intelligent system programmed to respond only to the explicit, categorical meaning of emotion signals as indications of the emotion itself is likely to miss out on potentially important parts of the usual interpersonal process. Indeed, it is recognised that the existence of multiple communication channels is critical. Early embodied conversational agents (ECAs) tried to convey emotion using analyses of static faces showing full-blown emotions. The results were recognisable, but perplexing and somewhat forced and artificial (see Schröder and Cowie, 2006). So, users are unlikely to perceive an embodied agent that works on the basis of emotion decoding as a properly responsive interactant. They may not explicitly know what is missing, but some of the sense of engagement will be lacking.

In the following section we will describe Scherer’s (2001) multicomponent model of emotions and Fogel’s (1993; Fogel et al., 1992) dynamic systems approach to emotions and communication as examples of how emotions may develop during interactions. We will then illustrate some specific effects emerging during emotional encounters and regulating our responses to such interactions. Finally, we will make a comparison between humans and artificial agents in emotional interactions.

2 Theoretical Explanations of Emotions as Ongoing Processes

2.1 Multicomponent Theory of Emotion

Extending the traditional stimulus–appraisal–response model, Scherer (2001) emphasises the multicomponent nature of emotion and proposes a dynamic sequential model in which emotion is considered as an evolved, continually developing mechanism, allowing for flexibility and adaptation to the changing environment. Its primary adaptive role is to focus attention onto situations or events of importance to an organism’s well-being.

The emotional episode itself is viewed as a sequence of synchronised changes in several organismic subsystems following the identification of a significant event. A series of stimulus evaluation checks (SECs) provide the information necessary for an adaptive response. A novelty check scans the environment for changes; if and only if a change is detected, further checks evaluate the nature of the event. Such checks include assessing whether the stimulus should be avoided or approached, whether it is consistent with current goals, the coping potential needed to deal with the stimulus, and whether any action taken would conform to social conventions or expectations.

According to this model, different emotional states (e.g. anger, fear, or joy) emerge from the cumulative process of changes in the various subsystems. The emotional process is thus seen as a continually fluctuating pattern of change brought about by the particular pattern of outputs from SECs. Consequently, emotional states are the result of multiple components each serving particular adaptive functions. Table 1 shows the relationship between the various subsystems and the components, which serve the adaptive functions of emotion. In contrast to theories emphasising cognitive processing as the antecedent of emotion, Scherer stresses that many of the appraisal mechanisms thought to be responsible for various emotional states may rely on automatic processing conducted by hard-wired or innate mechanisms.

Table 1 Relationships between organismic subsystems and the functions and components of emotion (after Scherer, 2001)

An emotional episode is conceptualised as starting with the evaluation of an event as significant to the organism. All aspects of the appraisal process are highly interrelated, with each SEC depending on the result of prior checks; the process is seen as a sequence in which the SECs are performed in a fixed order. For example, it seems logical to assume that the nature of the stimulus needs to be determined before an assessment of coping potential can be made. However, the ensuing appraisal process is not a one-shot matter but an evolving cycle triggering reappraisals which are fed back into the system continually throughout the emotional episode. As in most appraisal theories, the emotional episode is seen as depending on the evaluation of a stimulus and requires some form of transformation or meaning extraction to be performed. Intuitively one can imagine how emotional episodes may arise in such a way, especially when the event encountered has adaptive significance. However, it may also be possible for subjective feelings to arise when the triggering event is less easy to identify (e.g. generalised anxiety).
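
To make the sequential-check idea concrete, the sketch below implements a toy appraisal cycle loosely modelled on Scherer’s SECs: a novelty check gates all further processing, the remaining checks run in a fixed order, and the same evolving event is reappraised over time. The numeric scales, thresholds, and the crude mapping from check outcomes to an emotion-like label are illustrative assumptions on our part, not part of Scherer’s model.

```python
# Toy sequential appraisal loop, loosely after Scherer's stimulus evaluation
# checks (SECs). Scales, thresholds, and the outcome-to-label mapping are
# illustrative assumptions only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    novelty: float             # 0..1, how unexpected the event is
    pleasantness: float        # -1..1, intrinsic (un)pleasantness
    goal_conduciveness: float  # -1..1, helps or obstructs current goals
    coping_potential: float    # 0..1, perceived ability to deal with the event
    norm_compatibility: float  # 0..1, fit with social norms and expectations


def appraise(event: Event, novelty_threshold: float = 0.3) -> Optional[dict]:
    """Run the checks in a fixed order; later checks presuppose earlier ones."""
    # Novelty check first: if nothing significant has changed, stop here.
    if event.novelty < novelty_threshold:
        return None
    outcome = {
        "pleasantness": event.pleasantness,
        "goal_conduciveness": event.goal_conduciveness,
        "coping_potential": event.coping_potential,
        "norm_compatibility": event.norm_compatibility,
    }
    # Crude mapping from the cumulative pattern of check results to a label.
    if event.goal_conduciveness < 0 and event.coping_potential < 0.4:
        outcome["label"] = "fear-like"
    elif event.goal_conduciveness < 0:
        outcome["label"] = "anger-like"
    else:
        outcome["label"] = "joy-like"
    return outcome


# Reappraisal cycle: the same event is re-evaluated as it evolves, and each
# result feeds back into the ongoing episode (here, coping potential grows).
episode = [appraise(Event(0.9, -0.4, -0.6, coping, 0.7))
           for coping in (0.2, 0.5, 0.8)]
print([e["label"] for e in episode if e])
```

Run with the values above, the label shifts from a fear-like to an anger-like pattern as appraised coping potential increases, echoing the idea that the emotional state emerges from the cumulative pattern of check outputs rather than from any single evaluation.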

One advantage of the multicomponent account is that it does not overestimate the need for cognitive involvement in emotional causation and acknowledges that emotions can be generated with little cognitive effort. Similarly, Parkinson (2001) shares the view that appraisal is an unfolding emotional process in which emotional meaning could be reached in an implicit way. Both accounts agree that low-level processes may be responsible for eliciting emotion and allow for minimal cognitive involvement in emotional causation.

While Scherer’s account focuses on appraisal processes within the intrapsychic arena, Parkinson argues that appraisal may be interpersonally distributed, with each party contributing pieces of explicit and implicit information towards the ongoing social dialogue (see also Lewis, 1993). Rather than viewing emotional processes as occurring solely within the individual, responsibility for appraisal is taken away from the separate resources of each individual and is shared throughout the interaction. No single person needs to register the emotional significance of what is happening because the ultimate meaning is dictated by, and emerges from, the social process itself. Holding an interpersonally distributed emotional representation further reduces the need for cognitive assessments to be performed by a particular individual. This account introduces the possibility of emotional meaning being achieved through a shared process of appraisal in which the overall emotional impact depends on the particular characteristics of the ongoing social interaction.

While the multicomponent approach emphasises the evaluations that occur within an individual in response to a significant event, it does acknowledge that situational and contextual factors affect the evaluation process. Scherer states that the cultural and social context, the nature of the situation, and in particular the nature of the interpersonal relationships within that situation will all influence the appraisal process. Like Parkinson, Scherer takes the view that emotions are not static manifestations, and he stresses the dynamic nature of emotion as a process of the continually changing states of its components.

The theories presented in this section agree that attention to the ongoing nature of emotional episodes and accommodation of implicit mechanisms that feed into these processes are both important features of any workable conceptualisation of emotion processes. Both Scherer and Parkinson share the view that emotional episodes are not static manifestations resulting from high-level cognitive calculations, but are ongoing, evolving multicomponent processes. They also suggest that any intelligent system capable of interacting emotionally with humans would ideally need to understand the multicomponent, implicit/explicit nature of emotional elicitation and experience. It would also need to be sensitive to, and capable of being regulated by, the resulting, ongoing emotional interaction itself. In summary, the multicomponent description of emotional episodes suggests that it would be computationally challenging to implement virtual software agents capable of managing human-like social interaction.

2.2 Social Process Theory of Emotion and Continuous Process Model of Communication

Another model that takes the active nature of emotions into account is Fogel’s (1993; see also Fogel et al., 1992) dynamic systems approach to emotions and communication. Fogel presents a theoretical framework in which emotion is regarded not only as a multicomponent construct, but also as an ongoing process whose boundaries are not rigidly defined as single, discrete occurrences. In fact, the capacity of particular events to trigger specific emotions is said to vary in accordance with the dynamics of the emotion in relation to social context; for example, an action (e.g. bursting into laughter) may elicit pleasant emotions on one occasion but not on another, depending on the appropriateness of the action to the situation in question (e.g. a dinner vs. a funeral), the interactants’ prior emotional state (e.g. happiness vs. anger or sadness), and the structure of the physical environment (e.g. a noisy crowd vs. a silent assembly).

The starting point of Fogel’s social process theory of emotion is to consider emotions not as states but as self-organising dynamic processes closely tied to the course of an individual’s activity in a social and physical context. From this point of view, the subcomponents of the unfolding emotional process create a self-organising system by interacting with, influencing, and imposing constraints on each other. However, none of the single subcomponents determine the others in an a priori predictable way, nor are any of these subcomponents individually sufficient to cause emotion, or fully delineated and shaped in the absence of interaction. Moreover, integration and coordination of the emotional components do not necessarily depend on an internal system, but can instead arise from co-regulated interchange between persons, and between persons and their environment (bottom-up, rather than top-down control processes). Such co-regulated transaction, in turn, can happen at both an explicit and an implicit level.

Fogel (1993) explains co-regulation as “a social process by which individuals dynamically alter their actions with respect to the ongoing and anticipated actions of their partners” (p. 34). In this sense, the outcomes of co-regulated interactions are not the result of an explicit plan or scheme inside one individual, nor of an exchange of messages produced by discrete communication signals, but instead emerge from the dynamics of the interaction and the constraints on the communication system.

Unlike discrete state communication systems models, Fogel’s theory does not allocate the distinct roles of sender and receiver to interactants engaged in information exchange. Moreover, considering communication as a dynamic process, it is hard to isolate specific signals carrying definite messages. In everyday communication, the same information can be understood in different ways by distinct interactants and can also have different meanings to the same person in different situations.

In emotional communication, in particular, interactants are constantly exchanging explicit and implicit information, modifying each other’s emotional responses as they occur. Teasing a friend, for instance, may have different outcomes on our emotions depending on our friend’s reactions. If the reaction is laughter, our enjoyment may be increased, but if the response is an expression of disapproval or sadness, our initial amusement may turn to guilt. More generally, during ordinary social interactions one person continuously gives feedback to another, who in turn reacts by adjusting his or her own flow of actions, thus creating a consensual, negotiated interactional frame. Emotions, therefore, can hardly be defined as discrete units within the individual; they develop in a shared social context and as such are continuous and never static.

3 Regulating Processes in a Dynamic Emotional World

The flow of interpersonal exchange during social interaction is often regulated by both explicit and implicit processes. Correspondingly, our dynamic emotional experience can be simultaneously influenced by both top-down and bottom-up systems. In this section, we focus on synchrony, dissynchrony, mimicry, and emotional contagion as examples of such phenomena, before using the example of video-mediated interactions to illustrate how the different components of affect communication can interact.

3.1 Synchrony and Dissynchrony

Condon and Ogston (1966) first introduced the concept of behavioural entrainment, or synchrony, to describe an individual’s adjustment of behaviour to coordinate with the rhythms of his or her interactants. Analysing patterns of change within and between the behaviour of individuals, these authors found that in normal behavioural patterns harmonious configurations of change between body movements and speech arise, both at an intra-individual (self-synchrony) and at an interactional (interactional synchrony) level.

Bernieri et al. (1988) characterised interactional synchrony as involving (a) direct imitation or mirroring of others’ movements, affects, and attitudes; (b) congruence between behavioural cycles of two or more people; and (c) perception of a new meaningful, coordinated whole event created by the unification of concurrent behavioural factors. Although this synchronisation of behaviour is observable in principle, it is not usually attended to. One can perceive another person’s engagement or disengagement without explicitly knowing which aspects of his or her behaviour triggered such awareness. Several authors (e.g. Coy, 2001; Wallbott, 1995) consider synchrony, along with motor mimicry and emotional contagion, as a mechanism that helps people to reach interpersonal mutuality in an immediate and unconscious way.
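
Interactional synchrony in the studies above was assessed by human raters, but a rough computational proxy can be obtained by correlating two interactants’ movement time series at a range of temporal lags. The sketch below implements such a proxy under our own assumptions about the input (e.g. per-frame movement energy extracted from video); it is illustrative only and not the procedure used by Condon and Ogston or Bernieri et al.

```python
# Rough proxy for interactional synchrony: the lag at which two movement
# time series (e.g. per-frame movement energy from video) correlate most.
# An illustrative measure, not the rater-based procedures cited above.
import numpy as np


def peak_lagged_correlation(a: np.ndarray, b: np.ndarray, max_lag: int = 25) -> dict:
    """Return the lag (in samples) giving the highest Pearson correlation."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    best_lag, best_r = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:            # negative lag: b trails a under this indexing
            r = np.corrcoef(a[:lag], b[-lag:])[0, 1]
        elif lag > 0:
            r = np.corrcoef(a[lag:], b[:-lag])[0, 1]
        else:
            r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return {"lag": best_lag, "correlation": round(float(best_r), 3)}


# Synthetic example: person B loosely follows person A about 10 samples later.
rng = np.random.default_rng(0)
person_a = rng.normal(size=500).cumsum()
person_b = np.roll(person_a, 10) + rng.normal(scale=0.5, size=500)
print(peak_lagged_correlation(person_a, person_b))
```

A high correlation at a non-zero lag would indicate one partner consistently following the other, whereas deliberately mismatched (out-of-phase) behaviour of the kind discussed below would show up as low correlation at every lag.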

Despite the emphasis that is usually put on implicit regulation of behavioural synchronisation in social interactions, some authors have called attention to the role of more overt acts of information transmission. In a study on the development of mutuality and the subsequent achievement of intersubjectivity, Tronick et al. (1977) suggest that interactants may modify their affective and attentional displays to match or clash with those of their interaction partners, in order to communicate more or less desire to be involved in the specific social interaction. In this case, synchrony would be a way to communicate interest and approval (see also Kendon, 1970), while dissynchrony, the opposite effect, would be a means of interrupting or modifying the current interaction (cf. Tiedens and Fragale, 2003). Support for this communicative interpretation of synchrony and dissynchrony comes from a study by Bernieri and colleagues (1988). These authors point out that low levels of rated synchrony between interactants do not always reflect an absence of synchrony (i.e. asynchrony), but may instead depend upon affects and actions being out of phase as a result of deliberate mismatching by one or more interaction partners.

Behavioural entrainment, therefore, can be seen as influenced by both automatic, unconscious processes and explicit communicative actions deliberately intended to modify the emotional interaction. While communicating, individuals can instinctively adjust to each other’s movements and affect reaching a high degree of engagement; yet, during the same interaction, one of the partners may intentionally try to redefine or recalibrate the frame of the communication by varying the degree of synchrony with the other. Again, this intentional regulation of entrainment might be registered implicitly or explicitly by the other interactant, leading to different consequences on the unfolding emotional interaction.

3.2 Imitation, Mimicry, and the Chameleon Effect

While synchrony involves interactants jointly constructing a coordinated pattern of behaviours using a kind of bodily dialogue, mimicry refers more specifically to the direct imitation of another’s behaviour. At times, we might mimic someone on purpose to ingratiate ourselves with them. For example, training courses for salespeople (and in relationship skills more generally) often explicitly encourage the use of mimicry and imitation as tools for creating smoother interactions and for enhancing interpersonal impressions. Further, imitation may be used to draw other people’s attention to specific features of their behaviour, thereby recalibrating the ongoing interaction.

However, imitation effects often seem to be unintentional. Chartrand and Bargh (1999) refer to the non-conscious mimicry of facial expressions, speech patterns, postures, mannerisms, and other behaviours of one’s interaction partner as the “chameleon effect”. These authors propose that perception and interpretation of another person’s behaviour automatically activate corresponding behavioural representations in the self, which in turn increase our own tendency to behave in a congruent manner (perception–behaviour link). Thus, mimicry can be the involuntary behavioural consequence of perceiving another’s behaviour; moreover, perception of a similar behaviour by the other strengthens the interaction, creating feelings of empathy. In an interesting study, Bailenson and Yee (2005) report unconscious mimicry and a subsequent increase in rapport even when people interact with an embodied artificial intelligence agent. Mimicking agents were viewed by participants as more persuasive and pleasant than non-mimicking ones, even though participants apparently failed to notice mimicry at an explicit level.
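
A very simple way to give an artificial agent this kind of non-conscious mimicry is to replay the user’s tracked behaviour back to them after a short delay, broadly in the spirit of the mimicking agent studied by Bailenson and Yee (2005). The delay length, the head-pose representation, and the tracking and rendering hooks in the sketch below are illustrative assumptions rather than details of their system.

```python
# Minimal delayed-mimicry loop for an embodied agent: the agent replays the
# user's tracked head pose after a fixed delay. Delay length, pose format,
# and the tracking/rendering hooks are illustrative assumptions.
from collections import deque
from typing import Optional, Tuple

Pose = Tuple[float, float, float]  # (pitch, yaw, roll) in degrees


class MimicryController:
    def __init__(self, delay_s: float = 4.0, frame_rate: int = 30):
        # Buffer sized to hold roughly delay_s seconds of poses.
        self._buffer = deque(maxlen=int(delay_s * frame_rate))

    def update(self, user_pose: Pose) -> Optional[Pose]:
        """Store the latest user pose; once the buffer has filled, return the
        pose the user held about delay_s seconds ago for the agent to adopt."""
        self._buffer.append(user_pose)
        if len(self._buffer) == self._buffer.maxlen:
            return self._buffer[0]      # oldest buffered pose
        return None                     # not enough history yet: keep idle pose


# Per-frame loop; the head tracker and renderer calls are stand-ins.
controller = MimicryController(delay_s=4.0, frame_rate=30)
agent_pose = None
for frame in range(300):
    user_pose: Pose = (0.0, 10.0 if frame % 60 < 30 else -10.0, 0.0)  # fake tracker
    agent_pose = controller.update(user_pose)
    # if agent_pose is not None: render_agent_head(agent_pose) would drive the ECA
print(agent_pose)
```

The point of the delay is precisely that the imitation stays below the threshold of explicit detection while still being registered implicitly, which is consistent with the rapport effects reported above.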

Investigators have also studied the influence of mood and emotions on mimicry. For example, Van Baaren et al. (2006) found that good moods lead people to greater automatic imitation of other people’s behaviour than bad moods. The authors account for these findings by reference to the contrasting informational implications of positive and negative affect (e.g. Schwarz, 1990) as safety and danger signals, respectively. Thus, good moods lead people to process information in a more holistic and spontaneous way whereas bad moods lead people to process information more analytically (e.g. Mackie and Worth, 1991; Schwarz and Bless, 1991). Similarly, good moods may lead people to be less reflective and more spontaneous in regulating their behaviour, leaving them more susceptible to automatic influences.

Niedenthal et al. (2001), on the other hand, explored the role of mimicry in understanding facial behaviour in emotional interactions. Their work has found that individuals in a particular emotional state detect changes in another’s emotion expression better if this expression is congruent with their own emotion than if it is incongruent. This seems to be due to the fact that people mimic emotion-congruent expressions more easily than emotion-incongruent ones. As a result, individuals detect changes in an emotion-congruent expression because these changes produce a noticeable alteration in their own facial behaviour.

Finally, mimicry, as well as synchrony, seems to have an impact on the development of shared emotions during interactions, a phenomenon known as emotional contagion. This phenomenon will be discussed in the next section.

3.3 Emotional Contagion

The pervasive automatic tendency towards mimicry and behavioural coordination may generate emotional episodes by inducing a corresponding emotional state in the mimicker. Hatfield and colleagues (1994) coined the term primitive emotional contagion to describe the process whereby individuals catch the emotions of those around them. While emotional contagion is sometimes thought to involve conscious perceptions and evaluations, Hsee et al. (1992) argue that “generally the process by which people feel others’ emotional states is fairly non-conscious, primitive, and automatic” (p. 2). During the course of social interaction the natural tendency to mimic others results in a synchrony of facial, postural, and vocal expressions. The synchrony of affective behaviours manifests in a shared emotional state among the interactants. Specifically, subjective emotional states are affected moment to moment by the activation and/or feedback from such mimicry. Emotional states may be influenced by either central nervous system commands which direct mimicry, afferent feedback from facial, verbal, and postural movements, or conscious attribution of affect based on self-perception of expressive behaviour. Emotional contagion is hence conceptualised as a two-stage process in which mimicry and the activation of feedback from mimicry result in a corresponding subjective emotional state among the interactants. As other theorists have also argued (e.g. Öhman, 1988; Posner and Snyder, 1975; Shiffrin and Schneider, 1977), Hatfield et al. (1994) propose that much of the processing of emotional information occurs outside of conversant awareness; the theory draws on the subtlety with which this implicit information affects behaviour. From this perspective, subjective emotional experience is heavily influenced by the subconscious monitoring of, and reaction to, implicit emotional information presented throughout the social dialogue. While Hatfield and colleagues maintain that the emotional contagion effect is predominantly the result of automatic responses occurring outside awareness, it is likely that explicit processes also regulate contagion mechanisms.

Presumably emotional contagion is a reciprocal process in which one person’s emotions affect another’s and vice versa. If contagion were to operate at an entirely unconscious and automatic level during social interaction, one would expect to see emotions rapidly intensifying between people and potentially escalating to extreme levels. For example, one person’s fear would induce a corresponding emotional state in the other, whose reaction would in turn influence the first person’s, and so on; in this scenario it would not take too long for a state of emotional hysteria to develop. While Hatfield et al. (1994) cite several examples throughout history where this has in fact happened (e.g. the dancing manias of the Middle Ages, the Great Fear of 1789, and the New York City riots of 1863), instances of hysterical contagion of this kind are relatively few and far between. It is likely that conscious or unconscious control processes often override any natural tendency to mimic and so regulate contagion.
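
The contrast between unregulated and regulated contagion can be made concrete with a toy reciprocal update in which each interactant repeatedly catches a fraction of the other’s emotional intensity, followed by a damping step standing in for explicit or implicit regulation. The update rule and parameter values below are our own illustrative assumptions, not a model proposed by Hatfield and colleagues.

```python
# Toy reciprocal contagion between two interactants. Stage 1: each catches a
# fraction of the other's emotional intensity (mimicry plus feedback).
# Stage 2: a regulation factor damps the resulting states. The update rule
# and parameters are illustrative assumptions only.

def simulate_contagion(a: float, b: float, catch_rate: float = 0.3,
                       regulation: float = 1.0, steps: int = 20) -> tuple:
    """Return the two intensities after `steps` rounds of mutual influence."""
    for _ in range(steps):
        a, b = a + catch_rate * b, b + catch_rate * a   # mimicry/feedback stage
        a, b = regulation * a, regulation * b           # regulation stage
    return round(a, 2), round(b, 2)


# Without regulation the exchange escalates towards hysteria-like extremes;
# with even modest regulation it settles back down.
print(simulate_contagion(0.4, 0.6, regulation=1.0))   # intensities blow up
print(simulate_contagion(0.4, 0.6, regulation=0.7))   # exchange dies away
```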

Rather than being purely a primitive reflex-type reaction, it seems highly possible that mimicry is sometimes employed as an adaptive communication tool and as such is influenced by the social context. For example, the nature of the relationship between interacting individuals can either facilitate or thwart mimicry. Hatfield et al. (1994) argued that liking and closeness encourage mimicry, although there is no clear evidence concerning whether emotional contagion results from liking, or liking increases mimicry which in turn causes emotional convergence (or both). However, Bucy and Bradley (2004) have shown that counter-mimicry and counter-empathetic emotional responses are evoked by emotional expressions deemed inappropriate to the situation. This would suggest that the mechanisms underlying mimicry are subject to some form of assessment in relation to the social context.

Perhaps the least convincing aspect of the emotional contagion hypothesis is the postulated role of autonomic feedback from facial/postural movements in the overall emotional experience. Available evidence in fact suggests that effect sizes from facial feedback are likely to be small (e.g. Tourangeau and Ellsworth, 1979). It is questionable whether sensory feedback is in fact the main mechanism by which emotional convergence occurs. It is more likely that individuals make appraisals of other people’s emotional reactions in order to make sense of the situation and ultimately to decide how they will respond emotionally (e.g. social referencing, Sorce et al., 1985). Thus, emotions are partly determined by interpersonal rather than internally generated feedback. In sum, the mechanisms postulated are likely to involve the interaction of both explicit and implicit processes which create and regulate contagion effects.

3.4 Video-Mediated Interactions

Before introducing a general comparison between emotionally competent humans and emotionally competent artificial agents in the next section, we want to consider video-mediated communication as a specific kind of non-face-to-face interaction between humans which can show the implications of explicit and implicit signals, and of their absence, during emotional encounters (see also Parkinson, 2008).

In a study on the impact of audio-visual technology on informal communication in workplaces, Heath and Luff (1992) pointed out how some features of such technology may transform the impact of visual and vocal conduct and introduce asymmetries in interpersonal interactions. In particular, face-to-face interactants may respond automatically to cues arising from gaze direction or physical gestures that are only registered in peripheral vision, whereas users communicating via video-conferencing may completely fail to pick up these signals. This might be due, for example, to a two-dimensional vs. three-dimensional representation of faces, to the physical arrangement of screens and cameras, or to the size and flatness of the monitor. In any case, the result is an unbalanced coordination of the interaction with a restricted possibility for regulation of the communication, even when explicit attempts to adjust the conversation are made by one of the interactants.

Parkinson and Lea (in press) specifically investigated the impact of the constraints associated with video-mediated communication on the transmission and regulation of emotions. Introducing transmission delays in a video-conferencing system, the authors showed that such gaps in the flow of the conversation can interfere with the establishment of rapport between individuals. The reason for this might be found in a lack of temporal synchrony or of temporal complementarity (i.e. limitations in immediate feedback).

Limitations in communication technology such as videos and cameras can therefore decrease shared positive affect and may even lead to the development of negative emotions such as frustration between interactants. As illustrated in the next section, this kind of reaction can also apply when humans are interacting with artificial agents whose emotional interactional skills are often limited to recognition of full-blown, explicitly categorised emotions.

4 Comparison Between Humans and Artificial Agents in Emotional Interactions

The required level of sophistication of an agent’s affective architecture depends directly on the purpose of its design. That is, not every application needs to provide an elaborate veridical simulation of the emotional aspects of human behaviour. However, if software entities such as ECAs are intended to permit face-to-face interaction with human beings, the underlying components and processes of their affective architecture are likely to require sufficiently elaborated behavioural and expressive mechanisms to allow believable engagement with humans (Pelachaud, 2006). Despite recent significant developments, most affective ECAs can only occasionally fulfil their purposes without inciting abusive behaviour or negative emotions from users. The breakdown of user empathy towards an artificial entity caused by the lack of human-like appearance, emotional expression, and motion displays during an interaction can contribute both to user dissatisfaction and to reluctance to engage in further communicative attempts (Angeli and Carpenter, 2005).

In the context of human–computer (HC) interaction, agent technologies clearly require additional development, especially in terms of better integrating cognitive and emotional features that can appropriately resemble at least a minimum set of adaptive features found in human social behaviour (Ventura et al., 2005). These include, for example, implicit and explicit regulatory responses as described earlier in this chapter together with ECA design recommendations discussed in chapters “Coordinating the Generation of Signs in Multiple Modalities in an Affective Agent” and “Generating Listening Behaviour” of Part IV. Good practice obviously requires that the escalation of user frustration while interacting with artificial entities is either actively avoided or quickly corrected. However, to date, most of the available agent affective architectures provide at best limited explicit and efficient means to automatically deal with or learn from these situations (Barkhuysen et al., 2005). If emotional interactions with artificial agents are to be truly comparable to those with humans, it is necessary to overcome various computational challenges that currently undermine positive perceptions of the technologies used to compose artificial social entities. In order to enable more successful synthesis of agent behaviour, research endeavours will probably involve not only the enhancement of architectural processes and data structures, but also more realistic adaptation of emotional facial expressions and other multimodal communication capabilities such as the ability to process natural language via sound or text interfaces. Examples include agents that simultaneously mimic human facial expressions accompanied with gestures (Caridakis et al., 2007), and guidance protocols for the collection of multimodal emotional data during human–human interactions (Zara et al., 2007).

Increasing the anthropomorphism of the agent interface is known to affect the user’s experience and its evaluation. This is mainly due to the resulting higher user expectations regarding agent believability and possible intelligent traits (Fong et al., 2002). Numerous examples could be cited of how subtle visual or behavioural imperfections can transform positive perceptions of artificial entities into generally uncomfortable experiences that are sustained by cognitive dissonance (Masahiro, 2005). For example, users sometimes report unsettling experiences whilst interacting with real android-like robots, usually triggered by incongruous behaviour. On the other hand, if an entity clearly does not present human-like behaviour or expressivity, even simpler traits resembling acceptable cognitive-affective responses are more likely to be noticed and may stimulate empathy and social engagement from the users. Animated cartoon characters are probably the best example of the latter observation. Therefore, pragmatic understanding of the impact of graphical agents and their internal processing structures is directly relevant to the design of ECAs. Nevertheless, the precise way of addressing such issues is not always completely clear, as it depends heavily on the application purpose (and also sometimes on good understanding of the cultural context in question).

Several different artificial intelligence approaches are currently available to model affective agents, but most rely on agent architectures focused only on symbolic (logic-based) or sub-symbolic (connectionist) data representations and processing mechanisms. Each has its specific strengths and, although a limited number of hybrid systems exist, there is no consensus on how to integrate such benefits in a single computational system (Schröder and Cowie, 2006). A number of authors have proposed similar component requirements for affective architectures in order to create entities capable of successful human–agent emotional interactions. Examples include appraisal and interpretation mechanisms influenced by action tendencies, management of emotional personality profiles for processing contextual human information, believable interaction capabilities, and supervised or unsupervised machine-learning techniques that can take advantage of past interaction experiences (Payr, 2001). To facilitate design, it is necessary to take into account the expected functionalities of an agent and the ways in which the user interface might assist the interaction, based on context-dependent requirements.
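
One way to read the component requirements listed above is as the outline of an interface: appraisal and interpretation biased by a personality profile, believable multimodal expression, and a learning step that exploits past interactions. The skeleton below groups these into a single class as a sketch; all names, signatures, and numeric biases are illustrative assumptions rather than any published architecture.

```python
# Skeleton of an affective-agent architecture combining the commonly proposed
# components discussed above: appraisal/interpretation, a personality profile,
# believable expression, and learning from past interactions. All names,
# signatures, and numbers are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersonalityProfile:
    warmth: float = 0.6        # biases how positively user signals are read
    reactivity: float = 0.2    # biases how strongly arousal is amplified


@dataclass
class AffectiveAgent:
    personality: PersonalityProfile = field(default_factory=PersonalityProfile)
    memory: List[Dict] = field(default_factory=list)   # past interaction episodes

    def appraise(self, user_signal: Dict[str, float]) -> Dict[str, float]:
        """Interpret estimated user affect, biased by the personality profile."""
        valence = user_signal.get("valence", 0.0) * (1 + self.personality.warmth)
        arousal = user_signal.get("arousal", 0.0) * (1 + self.personality.reactivity)
        return {"valence": valence, "arousal": arousal}

    def select_expression(self, appraisal: Dict[str, float]) -> Dict:
        """Map the appraised state onto coordinated output channels."""
        smile = max(0.0, min(1.0, appraisal["valence"]))
        return {"face": {"smile": smile},
                "gesture": "nod" if smile > 0.3 else "idle",
                "speech_tone": "warm" if smile > 0.3 else "neutral"}

    def learn(self, episode: Dict) -> None:
        """Store the episode; a stand-in for the supervised or unsupervised
        learning step that could later adjust appraisal or expression choices."""
        self.memory.append(episode)


agent = AffectiveAgent()
state = agent.appraise({"valence": 0.4, "arousal": 0.2})
output = agent.select_expression(state)
agent.learn({"input": {"valence": 0.4, "arousal": 0.2}, "output": output})
print(output)
```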

The ability to communicate emotions is often regarded as crucial for the usability of socially interactive agents such as affective ECAs and certain types of robots, because users will probably prefer, and be more accepting of, entities that can provide a sense of comfort and usefulness during their interactive sessions (Masahiro, 2005). Emotions need not be visible in order for a system to fulfil its designated purposes; for example, an entertaining entity will probably have a different design and different operational principles from a software entity intended to improve decision-making. In this sense, clear specification and analysis of relevant user-centred metrics can help to understand the impact of different systems and facilitate their subsequent comparison. These metrics may include, for instance, rating criteria for classifying interactions that are aimed at conveying emotions in real time during the generation of speech and facial expressions, attention and appraisal skills, degrees of freedom for adapting body postures (e.g. gaze and gestures), social competence (e.g. conformity to social norms), and suggestive use of colours on the interface available to the human user. Conversational agents processing text to assist the accomplishment of specific tasks pose different design and implementation issues than those that, for instance, would sense physical proximity in game-like educational software. Social coping mechanisms are vital because data collected during social interactions are often incomplete or ambiguous, and the handling of such events can influence user perceptions of how much their task was facilitated by the agent’s performance. Recommendations to ECA designers depend heavily on which HC interfaces are available to implement architectures that will interact with users or help them to execute their tasks. It is important to take into account the extent to which users will rely on ready-to-use interactive features (i.e. the relevance of text, sound, or image expressive modalities). Whilst user interface preferences vary according to individual criteria, design effectiveness can improve substantially simply by focusing on concrete contributions to the user experience with regard to what the system ought to facilitate.
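
As an illustration of what a clear specification of user-centred metrics might look like in practice, the sketch below bundles the kinds of criteria listed above into a small per-session rating schema that makes two agent versions directly comparable. The particular scales and field names are illustrative assumptions, not a standard instrument.

```python
# A small user-centred rating schema for human-agent interaction sessions,
# covering the kinds of criteria listed above. Scales and field names are
# illustrative assumptions, not a standardised instrument.
from dataclasses import dataclass, asdict
from statistics import mean


@dataclass
class SessionRating:
    emotion_conveyance: int      # 1-7: emotions conveyed clearly in real time
    attention_appraisal: int     # 1-7: perceived attention and appraisal skills
    posture_adaptation: int      # 1-7: use of gaze, gesture, and body posture
    social_competence: int       # 1-7: conformity to social norms
    interface_presentation: int  # 1-7: e.g. suggestive use of colour and layout
    task_support: int            # 1-7: how much the agent facilitated the task

    def overall(self) -> float:
        return mean(asdict(self).values())


# Comparing two hypothetical agent versions on the same task.
baseline = SessionRating(4, 3, 3, 4, 5, 4)
mimicking = SessionRating(5, 5, 4, 5, 5, 5)
print(baseline.overall(), mimicking.overall())
```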

5 Conclusion

In our everyday face-to-face encounters, as well as in any video-mediated conversations or dealings with artificial agents, the way we respond to each other, whether consciously or unconsciously, is essential to the creation of understanding and rapport, which in turn can increase our enthusiasm for continuing the interaction. Throughout this chapter we have illustrated how emotions cannot be reduced to fixed entities, and how emotional encounters are based on ongoing processes that are typically embedded in a dynamic social context. Moreover, we have seen how both explicit responses to emotions (i.e. conscious processing and categorisation of emotional information) and implicit ones (i.e. registration of emotion response components) can influence the unfolding interpersonal interactions. In order to better understand emotional communication and to develop agents that can naturally interact with humans, we conclude that attention should be focused on the development of dynamic models of social interaction that do not consider emotions as static units but as multicomponent constructs that are continuously regulated by the interaction itself and by the social and cultural context in which it unfolds.