2.1 Introduction

The human ability to merge information from different sensory systems enables faster and more accurate responses to environmental stimuli [1]. The integration of different signals into a unique percept is especially valuable in noisy environments with corrupted and degraded signals [20]. Research in neuroscience has shown that audiovisual, visuo-tactile, and audio-somatosensory inputs are constantly synchronized and combined into a coherent percept [3, 33]. In speech, for example, the influence of visual signals on auditory perception is demonstrated by the McGurk effect [28]. Recent investigations on the emotional labelling of audio-only, video-only (mute), and combined audio/video stimuli have shown that human processing of emotional information is strongly affected by context, to the extent that, depending on the participant’s language, culture, and knowledge of a given foreign language, accuracy on an emotion recognition task may become significantly worse when the visual signal (emotional facial expressions) is added to the audio (emotional vocal expressions) [9, 11–13, 16].

Facial expressions and head, body, and arm movements (grouped under the name of gestures) all potentially contribute information to the communicative act, supporting the interactional exchange and allowing interactants to add a rich variety of contextual information to their messages, including (but not limited to) their psychological states and attitudes. Gestures have been shown to vary in size, redundancy, and complexity, depending on the grounding status of the information encoded and/or the meaning attributed to the message. Gestures act in partnership with speech, building up shared knowledge and meanings when the interactional exchange is successful [10, 15, 23].

Psycholinguistic studies have confirmed the partnership between verbal and nonverbal signals in human interaction, demonstrating that the understanding of a message results from the integration of multisensory features appropriately distributed throughout the interaction [30].

In addition, it has been suggested that interactive communication is emotionally driven [8] and that the encoding and decoding procedures humans exploit to express and/or read emotions are fundamental to securing the social quality and the cognitive functioning of a successful interactional exchange.

Another crucial aspect of multimodal communication is the relationship between paralinguistic and extra-linguistic information (such as speech pauses and head nodding). Psycholinguistic studies have shown that there exists a set of non-lexical expressions carrying specific communicative values (expressing, for example, turn-taking and feedback regulation mechanisms), such as empty and filled pauses and other hesitation phenomena [14, 15, 17]. It has also been shown that pauses (holds) in gestures carry similar communicative values and synchronize with paralinguistic information [10].

It can be concluded that the verbal and nonverbal communication modes jointly cooperate in assigning semantic and pragmatic content to the conveyed message, unraveling the participants’ cognitive and emotional states and exploiting this information to tailor the interactional process. These modes exploit multimodal social signals and are tailored to the contextual instance in which they are deployed.

The huge demand for complex autonomous systems able to assist people with a variety of needs has produced a considerable number of EU- and overseas-funded projects, such as (a) ERICA (www.jst.go.jp/erato/ishiguro/en/), a conversational android with a human appearance aiming to interact with humans through multimodal social signals such as the face, speech, and body movements; (b) MUMMER, a humanoid robot (based on Aldebaran’s Pepper platform, http://www.dcs.gla.ac.uk/vincia/?page_id=116) engaging people in dynamic environments such as shopping malls; (c) TESLA, an Adaptive Trust-based e-assessment System for Learning (www.open.ac.uk/iet/main/research-innovation/research-projects/adaptive-trust-based-e-assessment-system-learning-tesla); (d) ALIZ-E (http://www.aliz-e.org/), engaging diabetic children in a series of (partially WOZ-simulated) real-world situations, implemented using the NAO robot (http://www.aldebaran-robotics.com/en) [25, 26]; and several more, all exemplifying the huge research effort invested in implementing socially believable assistive technologies. However, these projects have paid scant attention to end-users’ actual requirements and expectations for what qualifies as “socially behaving” agents providing “social/physical/psychological/assistive” ICT services. In particular, the term “social robotics” envisions a “natural” interaction of such devices with humans, where “natural” is interpreted as the ability of such agents to enter “the social and communicative space ordinarily occupied by living creatures” [7, p. 2]. In addition, very little has been done to account for “how the interaction between the sensory-motor systems and the inhabited environment (that includes people as well as objects) dynamically affects/enhances human reactions/actions, social perception and meaning-making practices” [6, p. 6].

Addressing the above-mentioned problems requires an all-embracing perspective that pushes the designer to contemplate the system’s behavior and appearance (in order for the system to be social) and the trust the user puts in it (in order for it to be empathic), taking into account the contextual instance (the scenario) and the system functionalities required in each situation, as well as the individual’s social rules and cognitive competencies [5].

In the field of Human-Robot Interaction, such an approach will require investigation of the cognitive architectures and cognitive integration needed to account for human behavior across different domains, and hence for the behavior humans engage in with a system that, however complex and autonomous it may be, can offer only a sub-optimal interaction process (see the key activities of the topic group Natural Interaction with Social Robots, http://homepages.stca.herts.ac.uk/~comqkd/TG-NaturalInteractionWithSocialRobots.html, [6, 7, 19]).

To date, there have been relatively few efforts to assess human interactional exchanges in context in order to develop complex autonomous systems able to detect a user’s trust and mood, raise empathic feelings, and take actions to provide help. In addition, there are no standards for the development of more ‘satisfying’ complex autonomous systems that account for users’ expectations and requirements in a structured manner. Although there have been efforts to suggest potential solutions [2, 18, 32], this issue is still at the research stage [7, 19].

Generally, the development and assessment of complex autonomous systems is tackled using two different approaches: users’ self-reports and performance-based measures. Self-reports can be criticized because users, being generally technologically naïve and suspicious, find it difficult to accurately describe their expectations and requirements. Performance-based measures can be considered more reliable, since they require the execution of specific tasks assessed by a trained evaluator. Nevertheless, these tasks are generally carried out under artificial conditions and require extensive equipment, a well-defined environmental context, and time-consuming evaluation procedures that do not capture spontaneous daily activity, producing biases in the collected measures. Moreover, the resulting systems are generally unable to be context-aware, to adapt to users’ preferences and very distinct needs, and to correctly interpret all of the users’ actions.

The research papers collected in this book investigate the features that are at the core of human interactions and attempt to model the cognitive and emotional processes involved, in order to design and develop complex autonomous system prototypes able to simulate the human ability to decode and encode social cues while interacting.

2.2 Content of the Book

The research objectives proposed in this book can be interpreted as a meta-methodology aimed at investigating features that are at the core of human interactional exchanges and at modeling the cognitive and emotional processes involved in interactions, in order to design and develop complex autonomous system prototypes able to behave socially in (at least) specific scenarios. The attention is focused on the analysis and modeling of social behavioral features and of the human ability to decode and encode social cues while interacting. Behavioral data (speech, body movements, and facial, vocal, and gestural emotional expressions) are gathered from healthy participants and from participants with communicative or social impairments. This requires the definition of behavioral tasks that serve to detect changes in both healthy and impaired perception of social cues. Specific scenarios are proposed for these tasks, intended to assess users’ attitude, acceptance, and trust toward a robotic system, considering its empathic and social competencies as well as its appearance. The collected data are used to gain knowledge of how behavioral and interactional features are affected by individual characteristics and personalities, contextual instances, and environmental perceptual features. Hopefully, these investigations will provide guidance on which humanlike social characteristics and appearance (physically embodied or virtual intelligent agent?) a complex autonomous system should exhibit to gain users’ trust and acceptance as a socially behaving agent.

To this aim, the book includes nine investigations on the mathematical modeling of social signals and of the context embedded in interactional exchanges. The second chapter, by Maldonato and Dell’Orco [27], addresses one of the most debated issues in artificial intelligence: “the possibility of reproducing in an artificial agent (based on formal algorithms) some typically human capacities (based on natural logic algorithms) such as consciousness, the ability to deliberate and make moral judgments” [p. 1]. The authors are quite iconoclastic, to the point of asserting that “clarifying consciousness mechanisms of artificial organisms could help us to discover what we still ignore about neurobiological consciousness” [p. 2]. This extreme in the quest to equip machines with human-level intelligence, consciousness, and intuition leaves us with the question of whether we really want conscious and intuitive artificial agents. An answer is given by the contribution of Gnjatović and Borovac [21], who propose the implementation of conscious-like conversational agents, implicitly answering the question with a discussion of the limitations on implementing consciousness features in mathematical prototypes. These investigations clearly suggest that the aim of research on social robotics is to implement natural interactions with such social agents. The contribution of Vogel [35] covers one aspect of this sociability, proposing an investigation aimed at the “understanding of natural dialogue” in order to “fully inform the construction of believable artificial systems that are intended to engage in dialogue with a manner close to human interaction in dialogue” [p. 1]. Harrington et al. [22] follow a similar vision, considering “the relevance of context and experience for the operation of historical sound changes” [p. 1]. The contribution of Clavel et al. [4] is an original survey of the competencies an artificial agent must exploit to maintain users’ engagement when interacting, focusing on both users’ attentional and emotional involvement. The contribution of op den Akker et al. [31] proposes “Kristina, a personal digital coaching system built to support and motivate users to live a balanced and healthy lifestyle” [p. 14]; the authors are aware of the dangers of, and objections that can be raised against, modeling interactional persuasive features, and provide a very interesting discussion of these aspects. The contribution of Vinciarelli [34] discusses “endowing machines with social perception”, in particular “by providing a simple conceptual model of social perception and by showing a few examples related to Automatic Personality Perception, the task of predicting how people perceive the personality of others” [p. 1]. Finally, the last two contributions go beyond dyadic interactional features, modeling either multimodal, multiparty interactions in educational settings, as in the work of Koutsombogera et al. [24], or abnormal behavioral patterns in crowd scenarios, as in the contribution of Mousavi et al. [29].

2.3 Conclusions

The readers of this book will get a taste of the major research areas in modeling social signals and the contextual instances of interactional exchanges in different scenarios for implementing socially believable robotic systems. This research should result in a series of theoretical and practical advances in the fields of cognitive and social psychology, such as: (1) Repertoires of social signals better illustrating the cognitive, semantic, emotional, and semiotic mechanisms essential for successful human-machine interactional exchanges; (2) Models for representing data, reasoning, learning, planning, and decision making, as well as individual/group behavior analysis models in multilingual and cross-cultural contexts; (3) The identification of new interactional persuasive and affective strategies and of the contextual instances calling for their use. Considering technological issues, this research must lead to (1) New computational approaches and departures from existing cognitive frameworks and existing algorithmic solutions such as dynamic Bayesian networks, long short-term memory networks, and fuzzy models of computation; (2) The implementation of socially behaving ICT systems of public utility, profitable for a living technology that simplifies user access to future, remote, and nearby social services, overcoming language barriers and cultural specificities; (3) Market applications such as context-aware avatars replacing humans in high-risk tasks, companion agents for elderly and impaired people, socially believable robots interacting with humans in extreme, stressful, time-critical conditions, future smart environments, ambient assisted living technologies, computational intelligence in games/storytelling, embodied conversational avatars, and automatic healthcare and education services.
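To make the first technological point slightly more concrete, the following is a minimal, purely illustrative sketch (in Python, using PyTorch) of one of the algorithmic solutions named above: a long short-term memory network classifying sequences of social-signal features. The feature dimension, the number of classes, and all hyperparameters are hypothetical assumptions for illustration, not values drawn from the chapters of this book.

import torch
import torch.nn as nn

class SocialSignalLSTM(nn.Module):
    """Illustrative LSTM classifier for sequences of social-signal features
    (e.g., per-frame prosodic descriptors of an utterance)."""
    def __init__(self, n_features=26, hidden_size=64, n_classes=6):
        super().__init__()
        # One LSTM layer reads a (batch, time, n_features) sequence.
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        # A linear head maps the final hidden state to class logits
        # (e.g., hypothetical emotion or behavior categories).
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)   # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1])    # logits: (batch, n_classes)

# Usage: two hypothetical utterances, each 100 frames of 26 features.
model = SocialSignalLSTM()
logits = model(torch.randn(2, 100, 26))
print(logits.shape)  # torch.Size([2, 6])

The sketch only shows the general shape of such a model; any deployed system of the kind discussed in this book would additionally need feature extraction, training data, and context-dependent decision logic.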