
1 Introduction

The term “presence” entered the wider scientific debate in 1992, when Sheridan and Furness used it in the title of a new journal dedicated to the study of virtual reality systems and teleoperations (Coelho et al. 2006): Presence, Teleoperators and Virtual Environments. In the first issue of the journal, Sheridan (1992) describes presence as “the effect felt when controlling real world objects remotely” as well as “the effect people feel when they interact with and immerse themselves in virtual environments” (pp. 123–124).

Following this approach, the term “presence” has been used to describe a widely reported sensation experienced during the use of virtual reality. However, as Biocca (1997) notes, and most researchers in the area agree, “while the design of virtual reality technology has brought the theoretical issue of presence to the fore, few theorists argue that the experience of presence suddenly emerged with the arrival of virtual reality.” Rather, as suggested by Loomis (1992), presence may be described as a basic state of consciousness: the attribution of sensation to some distal stimulus, or more broadly to some environment.

Due to the complexity of the topic, and the interest in this concept, different attempts to define presence and to explain its role are available in the literature. In general, as underlined by Lombard and Jones (2006): “the first and most basic distinction among definitions of presence concerns the issue of technology.” (p. 25).

One group of researchers describes the sense of presence as “Media Presence”, a function of our experience of a given medium (IJsselsteijn et al. 2000; Lombard and Ditton 1997; Loomis 1992; Marsh et al. 2001; Sadowski and Stanney 2002; Schloerb 1995; Sheridan 1992, 1996). The main result of this approach is a family of definitions of presence, such as the “perceptual illusion of non-mediation” (Lombard and Ditton 1997), which is produced when the medium disappears from the conscious attention of the subject. The main advantage of this approach is its predictive value: the level of presence is reduced by the experience of mediation during the action. The main limitation of this vision is what is not said (Waterworth et al. 2012). What is presence for? Is it a specific cognitive process? What is its role in our daily experience? It is important to note that these questions are unanswered even for the relationship between presence and media. As underlined by Lee (2004b) “Presence scholars, may find it surprising and even disturbing that there have been limited attempts to explain the fundamental reason why human beings can feel presence when they use media and/or simulation technologies.” (p. 496).

To answer these questions, a second group of researchers considers presence as “Inner Presence”, a broad psychological phenomenon, not necessarily linked to the experience of a medium, whose goal is the control of individual and social activity (Baños et al. 1999, 2000; Lee 2004a, b; Mantovani and Riva 1999; Marsh et al. 2001; Moore et al. 2002; Riva 2011; Riva and Davide 2001; Riva et al. 2003a, b, 2014; Riva and Mantovani 2012a, b; Riva and Waterworth 2014; Schubert et al. 2001; Spagnolli and Gamberini 2002; Spagnolli et al. 2003; Waterworth and Waterworth 2001, 2003; Waterworth et al. 2012; Zahoric and Jenison 1998). In this chapter we support this second vision, starting from the following broad statements:

  • The psychology of presence is related to human action and its organization in the environment (Mantovani and Riva 1999; Marsh 2003; Riva et al. 2003a). As suggested by Zahoric and Jenison (1998), “Presence is tantamount to successfully supported action in the environment… Successfully supported action in the environment is a necessary and sufficient condition for presence.” (pp. 79–80).

  • The psychology of presence is related to the body and to the embodiment process (Biocca 1997; Biocca and Nowak 2001; Riva 2006; Riva et al. 2014). As expressed by Biocca (1997) “before paper, wires, and silicon, the primordial communication medium is the body. At the center of all communication rests the body, the fleshy gateway to the mind… Thinking of the body as an information channel, a display device, or a communication device, we emerge with the metaphor of the body as a kind of simulator for the mind.” (Online: http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.1997.tb00070.x/full).

  • Presence is an evolved process related to the understanding and management of the causal texture of both the physical and social worlds (Lee 2004a, b; Riva and Waterworth 2014). As underlined by Lee “the knowledge of the causal texture of both the physical and social worlds should be innate, or at least developed very rapidly after birth (probably within the first 3 or 4 years). The lack of innate or very rapidly acquired knowledge of the causal structure of both the physical and social worlds poses an enormous survival threat to humans” (p. 498).

In this chapter we attempt to provide a more elaborate – and probably controversial – account of the fundamental presence-enabling mechanisms. Recent research in neuroscience has tried to understand human action from two different but converging perspectives: the cognitive and the volitional. On one side, cognitive studies analyze how action is planned and controlled in response to environmental conditions. On the other side, volitional studies analyze how action is planned and controlled by the subject’s needs, motives and goals. In this chapter we suggest that presence is the missing link between these two approaches. Specifically, we consider presence as a neuropsychological phenomenon, evolved from the interplay of our biological and cultural inheritance, whose goal is the control of agency and social interaction through the unconscious separation of both “internal” and “external”, and “self” and “other” (Inghilleri et al. 2015; Riva 2007, 2009; Riva et al. 2014).

2 The Theoretical Background

2.1 Evolution and Presence

Several recent authors, perhaps most influentially the neurologist Antonio Damasio, the philosopher Daniel Dennett, and the cognitive psychologist Steven Pinker, discuss in detail how human psychological characteristics, including emotional responses to various situations, have come to be shaped by evolutionary forces. An integral part of this contemporary psychological stance is to assume that the mind is not (in most respects) a computer-like disembodied processor of information. Rather, the modern mind reflects the evolutionary history of humankind, a long heritage of embodied organisms striving to survive in competitive physical environments.

According to Bereczkei (2000), the evolutionary approach to psychological phenomena entails recognizing certain features of human behavior that have been designed by natural selection to be useful for survival and reproduction in the environments and situations in which humankind evolved. Using this approach, we can explain a wide variety of seemingly different behaviors and support a new kind of understanding of human nature. Within this vision, an evolved psychological mechanism can be described (Buss 1995) as a set of processes inside an organism that:

  • Exists in the form it does because it (or other mechanisms that reliably produce it) solved a specific problem of individual survival or reproduction recurrently over human evolutionary history.

  • Takes only certain classes of information or input, where the input (a) can be either external or internal, (b) can be actively extracted from the environment or passively received from it, and (c) specifies to the organism the particular adaptive problem it is facing.

  • Transforms that information into output through a procedure (e.g., a decision rule) in which output (a) regulates physiological activity, provides information to other psychological mechanisms, or produces manifest action and (b) solves a particular adaptive problem.

While many researchers have no problem in accepting that some key psychological features are the result of an evolutionary process, most are less ready to accept the application of the same approach to presence (Biocca 1992; Lee 2004b). As suggested by Crook (1980), humans evolved specific psychic processes, defined as awareness of the external world and awareness of one's own internal state. The symbolic representations of the external world and of individuals themselves were formalized by means of descriptions and behavioral rules stored in the individual’s central nervous system (intrasomatic level) and in material tools, books, and artistic and religious artifacts (extrasomatic level).

Within this vision, we suggest that the ability to feel “present” in a virtual reality system – an artifact – basically does not differ from the ability to feel “present” in the real world. One of the main ideas expressed in this chapter is the link between presence and its evolutionary role. In more detail, we suppose that presence is an evolved psychological mechanism, created by the evolution of the central nervous system, whose goal is the enaction of the volition of the subject.

Varela and colleagues (1991) define “enaction” in terms of two intertwined and reciprocal factors: first, the historical transformations which generate emergent regularities in the actor’s embodiment; second, the influence of an actor’s embodiment in determining the trajectory of behaviors. As suggested by Whitaker (1995) these two aspects reflect two different usages of the English verb “enact”. On one side is “to enact” in the sense of “to specify, to legislate, to bring forth something new and determining of the future”, as in a government enacting a new law. On the other side is “to enact” in the sense of “to portray, to bring forth something already given and determinant of the present”, as in a stage actor enacting a role (online: http://www.enolagaia.com/RW-ACM95-Main.html).

In line with these two meanings, presence has a dual role:

  • First, presence “locates” the Self in an external physical and/or cultural space: the Self is “present” in a space if he/she can act in it.

  • Second, presence provides feedback to the Self about the status of its activity: the Self perceives the variations in presence and tunes its activity accordingly.

In the following paragraphs we will flesh out these claims.

2.2 Embodied Cognition: Linking Action and Perception

The Embodied Cognition paradigm takes as its starting point the idea that cognition occurs in specific environments, and for specific ends (Clark 1997, 2001; Haugeland 1998). Moreover, the Embodied Cognition approach underlines the central role of the body in shaping the mind (Clark 2001, 2003; Gallagher 2005; Gallese and Lakoff 2005; Garbarini and Adenzato 2004; Lakoff and Johnson 1980; Ziemke 2003). Specifically, the mind has to be understood in the context of its relationship to a physical body that interacts with the world. Hence human cognition, rather than being centralized, abstract, and sharply distinct from peripheral input and output modules, has instead deep roots in sensorimotor processing. This approach has been applied to the design of interactive systems in recent years, under the rubric of Experiential Design (e.g. Waterworth et al. 2003).

An emerging trend within embodied cognition is the analysis of the link between action and perception. According to this approach, action and perception are more closely linked than has traditionally been assumed: perception is a means to action and action is a means to perception. Specifically, for the Common Coding Theory (Hommel et al. 2001), the cognitive representations for perceived events (perception) and intended or to-be generated events (action) are formed by a common representational domain: actions are coded in terms of the perceivable effects they should generate. For this reason, when an effect is intended (intention), the movement that produces this effect as perceptual input is automatically activated, because actions and their effects are stored in a common representational domain.

This theory has received strong empirical support from neurological data. Several studies have shown that cortical premotor areas contain neurons that respond to visual, somatosensory, and auditory stimuli (Gallese 2000a, 2005; Rizzolatti et al. 1997). Further, the pre-motor and parietal areas, rather than having separate and independent functions, are neurally integrated not only to control action, but also to serve the function of building an integrated representation. In particular, as underlined by Gallese (2000b) “the so-called ‘motor functions’ of the nervous system not only provide the means to control and execute action but also to represent it.” (p. 23).

This conclusion – which is very close to the claims of Common Coding Theory – is the outcome of a long series of single-neuron recording experiments in the premotor cortex of behaving monkeys (Rizzolatti et al. 1996, 1998). In particular, Rizzolatti and colleagues discovered that a functional cluster of premotor neurons (F5ab-AIP) contains “canonical neurons”, a class of neurons that are selectively activated by the presentation of an object as a function of its shape, size, and spatial orientation (Gallese 2000a, 2005; Rizzolatti et al. 1997). Specifically, these neurons fire during the observation of objects whose features – such as size and shape – are strictly related to the type of action that the very same neurons motorically code. Further, the canonical neurons are activated not only by observing the same object, but also by observing a group of objects that have the same characteristics, in terms of the type of interaction they allow. Two aspects of these neurons are important (Gallese and Lakoff 2005; Rizzolatti et al. 2000). First, what correlates with their discharge is not simply a movement (e.g. opening the mouth), but an action, that is, a movement executed to achieve a purpose (e.g. tearing apart an object, bringing it to the mouth). Second, the critical feature for the discharge is the purpose of the action, and not the dynamic details defining it, such as force or movement direction.

In a different cluster (F4-VIP) Rizzolatti and colleagues (Fogassi et al. 1996; Rizzolatti et al. 1997) identified a class of neurons that are selectively activated when a monkey hears or sees stimuli being moved in its peri-personal space. The same neurons discharge when the monkey turns its head toward a given location in peri-personal space. A possible explanation of this dual activation is that these neurons simulate the action (head-turning) in the presence of a possible target of action seen or heard at the same location (Gallese and Lakoff 2005). The existence of these functional clusters of neurons suggests that a constitutive part of the representation of an object is the type of interaction that is established with the object itself. In other words, different objects can be represented as a function of the same type of interaction allowed by them.

These experimental data match well with the Convergence Zone Theory proposed by Damasio (1989), which makes two main claims. First, when any physical entity is experienced, it activates feature detectors in the relevant sensory-motor areas. During visual processing of an apple, for example, some neurons fire for edges and planar surfaces, whereas others fire for color, configural properties, and movement. Similar patterns of activation in feature maps for other modalities represent how the entity might sound and feel, and also the actions performed on it. Second, when a pattern becomes active in a feature system, clusters of conjunctive neurons (convergence zones) in association areas capture the pattern for later cognitive use. As shown also by the data collected by Rizzolatti, a cluster of conjunctive neurons codes the pattern, with each individual neuron participating in the coding of many different patterns.

Another consequence of the link between perception and action is that observing actions or action effects produced by another individual may also activate a representation of one’s own actions. This assumption, too, has been confirmed by single-neuron recording experiments in the premotor cortex of behaving monkeys (Rizzolatti et al. 1996, 1998). Specifically, Rizzolatti and colleagues discovered that a functional cluster of premotor neurons (F5c-PF) contains “mirror neurons”, a class of neurons that are activated both during the execution of purposeful, goal-related hand actions, and during the observation of similar actions performed by another individual (Gallese et al. 1996; Rizzolatti and Arbib 1998; Rizzolatti et al. 1996). Several brain-imaging experiments have demonstrated the existence in humans of a mirror system in the premotor and parietal areas – similar to that observed in monkeys – matching action observation and execution (Buccino et al. 2001; Decety and Grèzes 1999; Iacoboni et al. 1999). Further, a study showed that a similar process happens with emotions: observing an emotion activates the neural representation of that emotion (Wickham 1994). In the experiment, a group of male subjects observed video clips showing the emotional facial expression of disgust. Both observing such faces and feeling disgust activated the same sites in the anterior insula and, to a lesser extent, in the anterior cingulate cortex. Finally, the results of three studies by Keyser and colleagues (2004) showed that the first-person subjective experience of being touched on one’s body activates the same neural networks in the secondary somatosensory cortices that are activated by observing the body of someone else being touched.

The general framework outlined by the above results suggests that the sensory-motor integration supported by the mirror matching system instantiates neural activations utilized not only to generate and control goal-related behaviors, but also to map the goals and purposes of others’ actions (Barsalou 2003; Gallese 2004, 2005; Gallese and Lakoff 2005). This process establishes a direct link between one’s being and other beings, in that both are mapped in a neutral fashion: the observer uses her/his own resources to directly experience the world of the other by means of an unconscious process of motor resonance.

2.3 From Cognitive to Volitional: The Activity Theory Perspective

As we have suggested earlier, cognitive studies analyze how action is planned and controlled in response to environmental conditions, whereas volitional studies analyze how action is planned and controlled by the subject’s needs, motives and goals. How can the two be integrated?

One of the most interesting answers to this question comes from the work of the Russian psychologists Vygotsky and Leontjev. According to these authors – usually labeled as Activity theorists – consciousness is not a set of discrete disembodied cognitive acts (decision making, classification, remembering…) and certainly it is not the brain; rather consciousness is located in everyday practice: you are what you do (Nardi 1996). Within this framework, any action is strictly related to the general and specific goals of the subject (intentionality). As underlined by Ryder (1998): “In its simplest terms, an activity is defined as the engagement of a subject toward a certain goal or objective. In nature, an activity is typically unmediated. Picking a berry from a bush and eating it is a simple, unmediated activity that involves direct action between the subject and object. In most human contexts our activities are mediated through the use of culturally established instruments, including language, artifacts, and established procedures. Picking mushrooms in the forest and eating them is an activity that is ill-advised without some form of mediation. Our subject would prudently appropriate some prior knowledge – a field guide, prior education in mycology, the direct advice of an experienced mushroom forager, or some other embodiment of human experience with mushrooms. Some means is necessary to bring the prior experience of history into the current activity. Animals have only one world, the world of direct objects and situations, mediated only through instinct. Humans have the vicarious worlds of other humans that they can invoke into the present through the use of language and artifacts” (http://carbon.cudenver.edu/~mryder/iscrat_99.html).

According to this view, any activity is undertaken by a subject (actor) – who is oriented towards a specific intention (object) – and it is always mediated by physical and social tools (artifacts). Activity Theory goes further in analyzing the action process. In particular, Leontjev (1981) distinguished, within the general activity of the subject, three different levels.

Activity is the highest level: it directly addresses a specific objective of the subject. The activity of the subject moves toward the object of a specific need and terminates when that need is satisfied. Specifically, an objective is a process characterizing the activity as a whole. For example, in reference to Fig. 5.1, the activity is to obtain a Ph.D. in Psychology. Any objective is closely related to a need/motive – e.g. helping others to solve their problems – and both have to be considered in the analysis of activity.

Fig. 5.1 The three activity levels and their link with the intentional chain

Each activity is then translated into reality through a specific action or a set of actions. Each action is a process performed with conscious thought and effort, planned and directed towards achieving a goal. In reference to Fig. 5.1, the activity – obtain a Ph.D. – is translated into a set of actions: going to the library to search for the sources, preparing an index, discussing it with the tutor, etc. Each action can then be split into sub-actions, each related to a sub-goal: searching for the books about psychology of media, writing the structure of the first chapter, etc.

Actions and sub-actions are developed through operations: if actions are connected to conscious goals, operations are related to behaviors performed automatically. In reference to Fig. 5.1, the operation of typing when preparing the index of the dissertation is done automatically, without a conscious focus on the movement of the fingers. All the operations, however, are oriented by some conditions: specific constraints and affordances related to the characteristics of a given tool – such as the position of the keys on the keyboard – that influence the outcome of the operation.

Conscious awareness of the conditions of a given tool is what distinguishes actions from operations. When we learn how to use a new tool, its conditions are addressed with deliberate and conscious attention: they require actions. For instance, the first time one types, one has to consciously check the position of the letters on the keyboard. When the activity becomes well practiced and experienced, actions do not need to be planned but are performed without conscious thought or effort: actions become operations. The opposite process is also possible: operations become actions when the original conditions are violated. For instance, if something breaks down – pressing the key does not display the given letter on the screen – and/or impedes execution, the subject has to consciously address the new situation through an action with its own goal.

The next step of the analysis offered by Activity Theory is related to the link between the user and the tool. Mastering a tool has two effects for the user (Kaptelinin 1996). First, the tool becomes transparent to the activity of the user: its conditions are handled automatically by the operations. Second, the tool is experienced as a property of the user: it complements or supports the user’s abilities improving the efficacy of the activity. Marsh (2003) provides the following example to clarify this point: “For example, a builder uses a saw to cut wood, a hammer fixes nails and joins wood, etc. In normal use, the saw and hammer become an extension of the builder rather than belonging to the external world. Consequently, the builder is able to focus on cutting the wood or driving the nail and not on the operations of (or reflect on) the saw and hammer in use.” (p. 88). The main limitation of the Activity Theory is in its descriptive focus. As noted by Nardi (1996): “Activity theory is a powerful and clarifying descriptive tool rather than a strongly predictive theory” (p. 6).

2.4 From Volitional to Cognitive: The Dynamic Theory of Intentions

Both the Embodied Cognition approach and Activity Theory include intentional states in their models. But what is an intention? What is it to do something intentionally? How can we read the intentions underlying the behaviour of others? If we check the literature on this topic we can find two different definitions of intention (Malle et al. 2001):

  • intention as a property of all mental states. In such a perspective any subjective, conscious experience – no matter how minimal – is an experience of something.

  • intention as an act concerning and directed at some state of affairs in the world. In this sense, individuals deliberately perform an action in order to reach a goal.

The link between these two definitions is the idea that a mental representation has been formed to accomplish a task or direct behavior to achieve some desired state in the world (Sebanz and Prinz 2006). This view corresponds to the folk psychology definition of intention: given an agent performing an action, the intention is his/her specific purpose in doing so. However, the latest cognitive studies clearly show that any action is the result of a complex intentional chain that cannot be analyzed at a single level (Pacherie 2006, 2008; Searle 1983).

The Dynamic Theory of Intentions presented by Pacherie (2006, 2008; see also Castelfranchi 2014) identifies three different “levels” or “forms” of intentions, characterized by different roles and contents: distal intentions (D-intentions), proximal intentions (P-intentions) and motor intentions (M-intentions):

  • D-intentions (Future-directed intentions). These high-level intentions act both as intra- and interpersonal coordinators, and as prompters of practical reasoning about means and plans: in the activity “obtaining a Ph.D. in psychology” described in Fig. 5.1, “helping anorexic girls” is a D-intention, the object that drives the activity of the subject.

  • P-intentions (Present-directed intentions). These intentions are responsible for high-level (conscious) forms of guidance and monitoring. They have to ensure that the imagined actions become current through situational control of their unfolding: in the activity described in Fig. 5.1, “preparing the dissertation” is a P-intention.

  • M-intentions (Motor intentions). These intentions are responsible for low-level (unconscious) forms of guidance and monitoring: we may not be aware of them and have only partial access to their content. Further, their contents are not propositional: in the activity described in Fig. 5.1, the motor representations required to move the pen are M-intentions.

Each intentional level has its own role: the rational (D-intentions), situational (P-intentions) and motor (M-intentions) guidance and control of action. Together they form an intentional cascade (Pacherie 2006, 2008) in which higher intentions generate lower intentions.

Moreover, recent cognitive studies on our representation of external space have demonstrated that tool-mediated actions modify the multisensory coding of near peripersonal space (the space within reach of any limb of an individual): the active use of a tool to physically and effectively interact with objects in the distant space appears to produce a spatial extension of the multisensory peri-hand space corresponding to the whole length of the tool (Farné et al. 2007; Gamberini et al. 2008; Riva and Mantovani 2012b). In other words, through the successful enaction of his/her intentions using the tool, the subject becomes physically present in the tool (Riva and Mantovani 2012b; Riva et al. 2014).

3 Our Theoretical Stance

3.1 From Intentions to Presence

If we compare our short descriptions of the volitional (paragraph 2.3) and cognitive (paragraph 2.4) approaches to action and intentions, we can find some interesting similarities. Both analyze agency through a three-level chain of objects/intentions in which higher levels generate lower ones (see Fig. 5.1). Both evaluate an action as successful through the comparison of the objects/intentions driving the action with its outcome. And both consider the mastering of a tool as the way to make it transparent (directly present) to the subject. However, neither of them identifies a specific cognitive process addressing the complex task of comparing, in real time and unconsciously, the objects/intentions driving the action with its outcomes.

Nevertheless, recent research by Haggard and colleagues (Haggard and Clark 2003; Haggard et al. 2002) on voluntary and involuntary movements provides direct support for the existence of a specific cognitive process binding intentions with actions. In their words (Haggard et al. 2002): “Taken as a whole, these results suggest that the brain contains a specific cognitive module that binds intentional actions to their effects to construct a coherent conscious experience of our own agency.” (p. 385).

According to the view proposed in this chapter, this role is played by presence. As indicated earlier, we consider presence as a neuropsychological phenomenon, evolved from the interplay of our biological and cultural inheritance, whose goal is to produce a sense of agency and control: subjects are “present” if they are able to enact their intentions in an external world (Riva 2007, 2009). As suggested by Zahoric and Jenison (1998): “presence is tantamount to successfully supported action in the environment” (p. 87, italics in the original).

In other words, presence can be described as a sophisticated but unconscious form of monitoring of action and experience, transparent to the self but critical for its existence (Riva et al. 2008). The main experiential outcome of this process is the sense of agency: we feel that we are both the author and the owner of our own action. For this reason, the feeling of presence is not separate from the experience of the subject but is related to the quality of our agency. It corresponds to what Heidegger (1959) defined as “the interrupted moment of our habitual standard, comfortable being-in-the-world”. In fact, a higher level of presence is experienced by the self as a better quality of action and experience: the more the subject is able to enact his/her intentions in a successful action, the more he/she is present.

Here we also argue that it is the feeling of presence that provides the self with feedback about the status of its activity: the self perceives the variations in the feeling of presence and tunes its activity accordingly.

From a computational viewpoint, the experience of Presence is achieved through a forward-inverse model (Fig. 5.2):

Fig. 5.2 The forward-inverse model of presence

  1. First, the agent produces the motor command for achieving a desired state given the current state of the system and the current state of the environment;

  2. Second, an efference copy of the motor command is fed to a forward dynamic model that generates a prediction of the consequences of performing this motor command;

  3. Third, the predicted state is compared with the actual sensory feedback. Errors derived from the difference between the desired state and the actual state can be used to update the model and improve performance.
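The three steps above can be illustrated with a minimal numerical sketch. It is not the model of Fig. 5.2 itself but a toy instance under strong assumptions: a one-dimensional state, a linear “environment”, and a single learned gain shared by the inverse and forward models. All names (`ForwardModel`, `inverse_model`, `environment`, `run_trials`), the learning rate and the gain values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ForwardModel:
    """Toy forward model: predicts the next state from a motor command."""
    gain: float = 0.5  # assumed initial (wrong) belief about how commands move the state

    def predict(self, state: float, command: float) -> float:
        return state + self.gain * command

    def update(self, command: float, prediction_error: float, rate: float = 0.1) -> None:
        # Nudge the believed gain so that the next prediction is closer to the feedback.
        self.gain += rate * prediction_error * command


def inverse_model(state: float, desired: float, gain: float) -> float:
    """Step 1: choose the command expected to bring `state` to `desired`."""
    return (desired - state) / gain


def environment(state: float, command: float, true_gain: float = 1.0) -> float:
    """Hypothetical stand-in for body and world: returns the actual next state."""
    return state + true_gain * command


def run_trials(desired: float = 1.0, trials: int = 30) -> list[float]:
    """Repeat the same action and return |desired - actual| for each trial."""
    model = ForwardModel()
    errors = []
    for _ in range(trials):
        state = 0.0
        command = inverse_model(state, desired, model.gain)  # step 1: motor command
        predicted = model.predict(state, command)            # step 2: efference copy -> prediction
        actual = environment(state, command)                 # actual sensory feedback
        prediction_error = actual - predicted                # step 3: comparison
        model.update(command, prediction_error)              # use the error to improve the model
        errors.append(abs(desired - actual))
    return errors


if __name__ == "__main__":
    errors = run_trials()
    print(f"first trial error: {errors[0]:.3f}, last trial error: {errors[-1]:.3f}")
```

In this sketch the inverse model always selects the command whose predicted outcome equals the desired state, so the prediction error and the desired-versus-actual error coincide. As the gain estimate improves over trials, the gap between intention and outcome shrinks, which is the sense in which successful enaction of an intention is monitored.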

In sum, presence provides the agent with feedback about the status of its activity: the agent perceives the variations in presence and tunes its activity accordingly.

Why do we consciously track presence variations? Our hypothesis is that they are a sophisticated evolutionary tool used to control the quality of behaviour. Specifically, the subject tries to overcome any breakdown in its activity and searches for engaging and rewarding activities (optimal experiences). It provides both the motivation and the guiding principle for successful action. According to Csikszentmihalyi (1975, 1990), individuals preferentially engage in opportunities for action associated with a positive, complex and rewarding state of consciousness, defined by him as “optimal experience” or “flow”. There are exceptional situations in which the activity of the subject is characterized by a higher level of presence than in most others. In these situations the subject experiences a full sense of control and experiential immersion. When this experience is associated with a positive emotional state, it constitutes a flow state. An example of flow is the case where a professional athlete is playing exceptionally well (positive emotion) and achieves a state of mind where nothing else is attended to but the game (high level of presence). A corollary of the proposed vision is important for our goals: it is possible to design mediated situations that elicit a state of flow by activating a high level of presence (maximal presence) (Morganti and Riva 2004; Riva 2004; Waterworth et al. 2003).

3.2 The Layers of Presence

Even if presence is a unitary feeling, recent neuropsychological research has shown that, on the process side, it can be divided into three different layers/subprocesses (for a broader and more in-depth description see Riva and Waterworth 2014; Riva et al. 2004). These layers, described in Table 5.1, are phylogenetically different and strictly related to the evolution of Self (Damasio 1999). Here, we consider the development of Self in relation both to its intentional abilities and to the Other, whereas Waterworth, Waterworth, Riva, and Mantovani (Chap. 3, this volume) present this same basic model of presence from the perspective of the individual organism.

Table 5.1 The layers of presence

More precisely, we can define “proto presence” as the process of internal/external separation related to the level of perception-action coupling (self vs. non-self). The more the organism is able to correctly couple perceptions and movements, the more it differentiates itself from the external world, thus increasing its probability of surviving. Proto presence is based on proprioception and other ways of knowing bodily orientation in the world. In a virtual world this is sometimes known as “spatial presence” and requires the tracking of body parts and appropriate updating of displays.

“Core presence” can be described as the activity of selective attention made by the self on perceptions (self vs. present external world): the more the organism is able to focus on its sensorial experience by leaving in the background the remaining neural processes, the more it is able to identify the present moment and its current tasks, increasing its probability of surviving. Core presence in media is based largely on vividness of perceptible displays. This is equivalent to “sensory presence” (e.g. in non-immersive VR) and requires good quality, preferably stereographic, graphics and other displays.

The role of “extended presence” is to verify the significance to the self of experienced events in the external world (self relative to the present external world). The more the self is present in significant experiences, the more it will be able to reach its goals, increasing the possibility of surviving. Extended presence requires intellectually and/or emotionally significant content. So, reality judgment influences the level of extended presence – a real event is more relevant than a fictitious one – and then the level of presence-as-feeling.

It is interesting to note that these three levels of presence correspond to the three levels of intentions identified by Pacherie in her Dynamic Theory of Intentions (Pacherie 2006): Motor Intentions (M-Intentions), Present Intentions (P-Intentions) and Future Intentions (F-Intentions). These three levels also correspond to the different levels of activity identified by Activity Theory: operation, action and activity. This suggests that the more complex the level of activity, the more all three layers of presence are required. We discuss this point further below.

3.3 From Presence to Social Presence

The previous section connected action and intentions to Presence. Recent studies suggest that a similar link exists in Social Presence, the ability to recognize others in an external environment (Biocca et al. 2003). Specifically, it is through the recognition of the Other’s intentions that he/she becomes present to us (Riva 2006).

There is a large body of evidence suggesting that infants, even in the first months of life, show a special sensitivity to communication and participate in emotional sharing with their caregivers (Legerstee 2005). Trevarthen (2001; Trevarthen and Aitken 2001) argues that an infant is conscious, from birth, of others’ subjectivity: he/she is conscious of others’ mental states and reacts in communicative, emotional ways so as to link each other’s subjectivity. Meltzoff goes further (Meltzoff 1999; Meltzoff and Decety 2003; Meltzoff and Moore 1977; Meltzoff et al. 2002), proposing the existence of a biological mechanism allowing infants to perceive others “like them” at birth.

This ability can be defined as “Social Presence”: the non mediated (prereflexive) perception of an enacting other within an external world (Riva 2008).

How does a subject learn to recognize and explain the full intentional chain of the other? Following Csibra and Gergely (2006), this process can be considered a predictive one: it emulates the action needed to achieve a hypothesized goal. From the computational viewpoint, it follows the same approach used by Presence (Fig. 5.3):

Fig. 5.3 The forward-inverse model of social presence (adapted with permission from Riva 2008)

  1. First, the agent recognizes a motor intention, and identifies the actor as another intentional self (Other);

  2. Second, an efference copy of the motor commands (intentional chain) is fed to a forward dynamic model that generates a prediction of the consequences of performing it;

  3. Third, the predicted state is compared with the actual sensory feedback. Errors derived from the difference between the predicted state and the actual state (break) can be used to update the model and improve performance.
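As a rough illustration of these steps, the following sketch lets an observer reuse its own forward model to guess which goal an observed actor is pursuing. It is a toy example under stated assumptions, not the model of Fig. 5.3: positions are one-dimensional, the actor is assumed to move with the same dynamics as the observer, and intentions are reduced to named goal positions. The names `predict_next`, `simulate_actor`, `infer_intention` and `break_threshold` are hypothetical.

```python
def predict_next(position: float, goal: float, step: float = 0.25) -> float:
    # Observer's own forward model: move a fixed fraction of the way to the goal.
    return position + step * (goal - position)


def simulate_actor(start: float, goal: float, n_steps: int = 3) -> list[float]:
    # Assumption: the observed Other moves with the same dynamics as the observer.
    trajectory = [start]
    for _ in range(n_steps):
        trajectory.append(predict_next(trajectory[-1], goal))
    return trajectory


def infer_intention(trajectory, repertoire, break_threshold: float = 0.05):
    """Return (best_goal_name or None, mean prediction error per hypothesis)."""
    errors = {}
    for name, goal in repertoire.items():
        total = 0.0
        # For each hypothesized goal, predict the next observed position and
        # compare the prediction with what was actually observed.
        for before, after in zip(trajectory, trajectory[1:]):
            total += abs(predict_next(before, goal) - after)
        errors[name] = total / (len(trajectory) - 1)
    best = min(errors, key=errors.get)
    if errors[best] > break_threshold:
        # No hypothesis in the observer's repertoire predicts the movement:
        # this is treated here as a "break".
        return None, errors
    return best, errors
```

For example, with `repertoire = {"reach_cup": 1.0, "reach_door": -2.0}`, a trajectory produced by `simulate_actor(0.0, 1.0)` is attributed to `"reach_cup"`, while a movement toward a goal that is not in the repertoire returns `None`. The threshold value is arbitrary and would need to be tuned in any realistic setting.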

Supporting this vision, Oztop et al. (2005) showed that the motor modules of the observer can be used in a “predictive mode” to infer the mental state of the actor. According to their model, mirror neurons (Rizzolatti et al. 1998, 2000) can be involved in the sensory forward prediction of goal-directed movements, which are activated both for motor prediction during action observation and for feedback-delay compensation during movement.

From an evolutionary viewpoint this approach has two strengths. First, it can be seen as the brain’s attempt to minimize the free energy induced by a stimulus by encoding its most likely cause (Kilner et al. 2007). Second, the recognition of others’ intentions using a forward model allows interpretation without prior experience since, as long as an intentional movement or behavior is in the repertoire of the Self, it will be interpretable without any training.

If Social Presence is the result of predicting the Other’s intentions through an internal simulation, it is not separate from the experience of the subject but is related to the quality of his/her social interactions. In fact, the subject reflexively experiences the feeling of Social Presence only when the quality of his/her experience is modified during a social interaction: according to the level of Social Presence experienced, subjects will experience intentional opacity on one side (break in Social Presence), and communicative attuning and synchrony (optimal social experiences) on the other side (Anolli et al. 2002).

3.4 The Layers of Social Presence

It is important to note, however, that social presence evolves in time and it is related to the intentional skills of the subject: a subject can recognize only the intentions that he/she is able to enact. As underlined by Meltzoff and Brooks (2001): “Evidently, infants construe human acts in goal-directed ways. But when does it start? We favor the hypothesis that it begins at birth… The hypothesis is not that neonates represent goal directedness in the same way as adults do. In fact, neonates probably begin by coding the goals of pure body acts and only later enrich the notion of goals to encompass object directed acts” (p. 188).

Specifically, the study of infants and the analysis of their ability to understand and interact with people suggest that social presence, too, includes on the process side three different layers/subprocesses (see Table 5.2), phylogenetically different but mutually inclusive (Riva 2008):

Table 5.2 The layers of social presence
  • Proto Social Presence (there is an Other);

  • Interactive Social Presence (the intention of the Other is toward the Self);

  • Shared Social Presence (the Self and the Other share the same intention).

More precisely, we can define “Proto Social Presence” as the process allowing the identification of other intentional selves in the phenomenological world (there is an other intentional Self). In fact, newborns are able to detect intentionality (there is an Other) – they recognize that an M-intention is being enacted by another self – but they cannot detect higher level intentions – they do not recognize D-intentions and P-intentions – nor identify the motives of motor behaviors – they do not recognize why the specific M-intention is being enacted. However, this simple ability has a critical role for the newborn: the more he/she is able to identify other selves, the greater the possibility of starting an interaction, thus increasing his/her probability of surviving. Proto Social Presence allows the recognition of M-Intentions only.

The next step in the development of social presence is “Interactive Social Presence”, allowing the identification of communicative intentions in other selves (the intention of the Other is toward the Self). The more the infant is able to identify a communicative intention in other selves, the greater the possibility of starting an interaction, thus increasing its probability of surviving. This skill requires the ability to enact P-intentions and usually appears 4–9 months after birth. Interactive Social Presence allows the recognition of M-Intentions and P-Intentions only.

The highest level of Social Presence is “Shared Social Presence”, the identification of intentional congruence and attunement in other selves (the Self and the Other share the same D-intention). The more the self is able to identify intentional attunement in other selves, the greater the possibility of conducting an interaction, thus increasing its probability of surviving.
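The relation between the three layers and the intentions they allow the Self to recognize can be summarized as a small lookup. This is only a compact restatement of the description above under one assumption: since the layers are described as mutually inclusive, Shared Social Presence is taken to cover M-, P- and D-intentions. The names `Intention`, `SocialPresenceLayer` and `recognizable_intentions` are hypothetical.

```python
from enum import Enum


class Intention(Enum):
    M = 1  # motor intention
    P = 2  # present (proximal) intention
    D = 3  # distal intention


class SocialPresenceLayer(Enum):
    PROTO = 1        # there is an Other
    INTERACTIVE = 2  # the intention of the Other is toward the Self
    SHARED = 3       # the Self and the Other share the same intention


def recognizable_intentions(layer: SocialPresenceLayer) -> set[Intention]:
    # Assumption: layers are cumulative, so each layer recognizes every
    # intention level up to and including its own rank.
    return {i for i in Intention if i.value <= layer.value}
```

Calling `recognizable_intentions(SocialPresenceLayer.INTERACTIVE)` returns the M- and P-intentions, matching the statement that Interactive Social Presence allows the recognition of M-Intentions and P-Intentions only.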

3.5 Intentions, Presence and Self

A key assumption of the model we just presented is a strict link between intentions, Self and Presence. Here we try to add a final claim (Riva 2008): Presence and Social Presence evolve in time, and their evolution is strictly related to the evolution of Self. Specifically, following the three-stage model of the ontogenesis of Self (Proto-Self, Core Self, Autobiographical Self) proposed by Damasio (1999), we can identify higher levels of Presence and Social Presence associated with higher levels of intentional granularity (Riva 2008).

As shown in Fig. 5.4, the higher the complexity of the enacted and recognized intentions, the higher the level of Presence and Social Presence experienced by the Self. In proto naked intentionality the structure of the intention includes action and goal only. When the Self experiences the highest level of Presence and Social Presence he/she is able to express, enact and recognize complex intentions including Subject, Action, Goal, Object, Way of Doing and Motive. In sum, the enaction and recognition of high-level intentions − D-Intentions − requires higher levels of Presence and Social Presence.

Fig. 5.4 The evolution of self, presence and social presence (Reprinted with permission from Riva 2008)

4 Designing Optimal Presence

In our model, optimal presence in a mediated experience arises from an optimal combination of form and content, able to support the activity of the user. This picture provides us with the first two guidelines for developing optimal presence in a mediated experience:

  1. To induce optimal presence, the developer of a mediated experience has to include recognition of the specific purpose of the user. If the developer is not able to identify the specific objective of the user, he/she will fail to support the user’s action, reducing the level of presence.

  2. To induce optimal presence, the developer of a mediated experience has to identify and support the specific tools that mediate the activity of the user. Most of the activity of the user is mediated by physical and social artifacts. The developer has to identify and embed in the virtual reality system features to support the action of the user effectively.

In general, we suggest that proto presence is determined only by form, core presence by both form and content, and extended presence only by content. Media form must provide the means for a convincing perceptual illusion, but the content should be integrated with (and so attract attention to) the form for the presence illusion to happen convincingly. Further, both have to support the activity of the user in reaching his/her specific objective.

We also claim that the role of the different layers is related to the complexity of the activity done in the mediated experience: the more complex the activity, the more layers are needed to produce a high level of presence. At the lower level – operations – proto presence is enough to induce a satisfying feeling of presence. At the higher level – activity – the media experience has to support all three levels. As suggested by Juarrero (1999), high level intentions (Future Intentions/Objects) channel future deliberation by narrowing the scope of alternatives to be subsequently considered (cognitive reparsing). In practice, once the person forms an intention, not every logical or physically possible alternative remains open, and those that do are countered differently: once I decide to do A, non-A is no longer a viable alternative and should it happen I will consider non-A as a breakdown (Bratman 1992).

What we have just seen provides two other guidelines for developing optimal presence in a mediated experience (Riva et al. 2011; Waterworth et al. 2010):

  1. To induce optimal presence, the developer of a mediated experience has to decompose the activity of the user into its different components: the virtual reality system has to identify the start and the end of each level and sublevel of the activity of the subject to support them. Further, each level and sublevel has its specific motive. The developer has to identify all the driving motives to effectively support the activity of the person. If I want to develop a VR surgical simulator, I have to identify all the levels and sublevels of activity used by the surgeons in their standard practice and verify that the developed environment is able to effectively support them (Riva et al. 2007).

  2. The lower the level of activity, the easier it is to induce optimal presence: the object of an activity is wider and less targeted than the goal of an action. So, its identification and support is more difficult for the designer of a VR system. Further, the easiest level to support is the operation. In fact, its conditions are more “objective” and predictable, being related to the characteristics (constraints and affordances) of the artifact used: it is easier to automatically open a door in a virtual environment than to help the user in finding the right path for the exit. At the lower level – operations – proto presence is enough to induce a satisfying feeling of presence. At the higher level – activity – the media experience has to support all three levels.

At the higher level of activity, optimal presence arises when the contents of extended consciousness are aligned with the other layers of the self, and attention is directed to a currently present external world (Waterworth and Waterworth 2006). However, this is a difficult task to achieve for a VR developer. He/She has to provide as much immersion as possible, integrating proto (spatial) and core (sensory) presence. To integrate extended presence, the events and entities experienced in the virtual environment must have significance for the participant. The form must provide the means for a convincing bodily and perceptual illusion, but the content should be integrated with (and so attract attention to) the form for the illusion of mediated presence to happen convincingly.

Often, an interaction designer’s aim is to design for as much presence as possible. In previous work, we have identified three ways of approaching the design of maximal mediated presence (Riva 1997; Riva and Gamberini 2000; Riva et al. 2004; Waterworth and Waterworth 2012; Waterworth et al. 2010): digital participation, mediated flow, and embodied immersion. In these situations, the organism responds as if what happens in a mediated environment is real, in the fullest sense, and of immediate significance. Digital participation can arise if we design a role for the participant as a performer in an interactive drama (Nath 2001) seen from a first person perspective. If the performer becomes emotionally and intellectually engaged by the events in an appropriately immersive environment, extremely high levels of presence can be achieved (Waterworth et al. 2002). A feature of this state of participation is a corresponding loss of self-consciousness. Not that the self is not present – it is maximally so – but an internal model of the self is not the focus of extended consciousness. In this respect, digital participation resembles the flow state. According to Trevino and Webster (1992) mediated flow corresponds to the extent to which (a) the user perceives a sense of control over the interaction, (b) the user perceives that his or her attention is focused on the interaction, (c) the user’s curiosity is aroused during the interaction, and (d) the user finds the interaction intrinsically interesting. As with digital participation, events are experienced from a first person perspective.

Finally, embodied immersion is the outcome of second-order mediated actions (Riva and Mantovani 2012b): the subject uses the body to control a proximal tool that controls a different distal one (a tool present and visible in the extrapersonal space, either real or virtual) to exert an action upon an external object. An example of second-order mediated action is that of the videogame player using a joystick (proximal tool) to move an avatar (distal tool in a virtual space) to pick up a sword (external virtual object). A possible, simpler variant of second-order mediated action is the direct use of the body to control a distal tool that exerts an action upon an external object. An example of this variant is the interaction with the Microsoft Kinect system: I move my body to move an avatar (distal tool) to pick up virtual objects. This specific mediated action produces two different effects on our spatial experience (Riva and Mantovani 2012a; Riva et al. 2014; Slater et al. 2009, 2010):

  • a successfully learned second-order mediated action produces incarnation: a second peripersonal space centered on the distal tool (the subject is present in the extrapersonal space – telepresence);

  • a successfully learned second-order mediated action associated with a spatio-temporal correspondence between the multisensory feedback experienced by the user and the visual data related to the distal virtual body (avatar) produces embodiment: the user experiences a new body in the avatar (the subject is present in a different body – body ownership illusion).

5 Conclusions

There is a consensus that the experience of presence is a complex, multidimensional perception formed through an interplay of raw (multi-) sensory data and various cognitive processes (IJsselsteijn and Riva 2003). Starting from this broad statement, in this chapter we attempted to provide an elaborate – and probably controversial – account of the fundamental presence-enabling mechanisms based on the interaction between intentions and actions.

Recent research in neuroscience has tried to understand human action from two different but converging perspectives: the cognitive and the volitional. On one side, cognitive studies analyze how action is planned and controlled in response to environmental conditions. On the other side, volitional studies analyze how action is planned and controlled by the subject’s needs, motives and goals. Here we suggested that presence is the missing link between these two approaches.

Specifically, we described presence as a neuropsychological phenomenon, evolved from the interplay of our biological and cultural inheritance, whose goal is the enaction (the transformation into actions) of the volition (intentions) of the Self: subjects are “present” if they are able to enact their intentions in an external world.

The link between intention and action is also the key to recognizing and distinguishing between Self and Other. Through presence, the Self prereflexively controls his/her action through a forward-inverse model: the prediction of the action is compared with perceptual inputs to verify its enaction. Through Social Presence − the non mediated perception of an enacting Other within an external world – the agent prereflexively recognizes and evaluates the action of Others using the same forward-inverse model: the prediction of the action is compared with perceptual inputs to verify its enaction.

We have described social presence as a defining feature of self, allowing the detection of the content and motives of others’ intentions. Without the emergence of the sense of social presence it is impossible for the self to develop a theory of mind allowing the comprehension, explanation, and prediction of behavior and, in general, the management of social interactions.

Both Presence and Social Presence evolve in time, and their evolution is strictly related to the evolution of Self. Through an evolutionary process allowed by the interaction between presence and social presence, the sensory-motor information embedded in Motor Intentions is transformed into the perceptual and indexical content of proximal intentions and finally into the descriptive, conceptual content of distal intentions, as suggested by Pacherie in her Dynamic Theory of Intentions (Pacherie 2006). Following Damasio’s three-level model of Self (Proto-Self, Core Self, Autobiographical Self) we can identify higher levels of Presence and Social Presence associated with higher levels of intentional granularity.

The above vision also applies to mediated action. When we experience strong mediated presence, our experience is that the technology has become part of the self, and the mediated reality to which we are attending has become an integrated part of the other. When this happens, there is no additional conscious effort of access to information, nor effort of action to overt responses in the mediated environment. We perceive and act directly, as if unmediated: we do not need any effort to check whether we were able to transform our intentions into actions. The extent to which we experience presence through a medium thus provides a measure of the extent to which that technology has become an integrated part of the self. Maximal presence in a mediated experience arises from an optimal combination of form and content, able to support the intentions of the user.

In conclusion, we believe that our model makes sense in terms of evolutionary psychology and is beginning to be supported by evidence of the neural and other physical correlates of action, imitation and self-monitoring. It also provides testable predictions about how to improve the experience of presence in interactive media.