
One of the most important tasks that the visual system performs is to construct an accurate representation of the spatial layout of the environment. Accurate representations are necessary for successful action (like navigation) as well as for planning future actions. Traditional descriptions of how these spatial representations are formed have focused on how we construct the three-dimensionality of the environment, such as the distance to objects and their sizes. Our visual system has evolved to construct representations by capitalizing on visual cues in the environment as well as physiological cues inherent to the eye. To be sure, these cues are often sufficient to accurately represent the geometry of the environment. However, recent research suggests that information specified by the body, whether physical information for body size or emotional information about the state of the body, can also contribute to perceptions of the spatial layout of the environment (Proffitt, 2006; Proffitt & Linkenauger, 2013; Stefanucci et al., 2011).

In essence, this argument—variously termed the embodied perception approach (Barsalou, 2008; Glenberg et al., 2013; Proffitt, 2006) or cognitive penetrability (Stokes, 2013)—is that top-down processes, including assumptions and prior knowledge about the environment intrinsic to the observer, can be used to form a more coherent and accurate perceptual representation. Prior work suggests a role for top-down processes (aside from the state of the body) in perception. When objects are moving but the motion appears ambiguous, memories of those objects may influence interpretations of the direction of motion (McBeath et al., 1992). Further, recognition of an object can aid in perceiving its depth (Peterson, 1994). Recent research suggests that action and intent to act can influence how observers perceive the space in which they are acting (Witt, 2011, 2017). For example, Witt et al. (2005) found that the use of a tool to increase participants’ reachability influenced their perception of how far objects just out of reach were from them. Finally, top-down processes such as emotion and motivation have been shown to influence the perception of faces (Halberstadt et al., 2009), as well as contrast sensitivity, which is thought to operate through an interaction between the amygdala, visual cortex, and other regions that direct attention (Phelps et al., 2006). Our own work suggests that emotion also plays a role in perceiving heights (Stefanucci & Proffitt, 2009).

To be fair, embodied approaches are not the first to postulate a role for the body in perception. Gibson’s ecological approach (Gibson, 1979) claimed that perception cannot be achieved without taking into account the body of the observer. In other words, perception is a synergistic activity; perceiving what the environment affords in terms of action is only possible if observers perceive the environment in terms of the capabilities of their bodies (see also Warren, 1984). For example, a ball is perceived as graspable only if it is small enough to fit within the observer’s hand. Likewise, an aperture is only passable if it is wider than one’s body (Warren & Whang, 1987). Thus, the body is an integral piece in the solution to the perceptual problem. A more detailed discussion of work supporting the ecological approach follows in the sections below.

We begin this chapter by reviewing the literature that provides evidence for the role of the body (its physical characteristics and its emotional states) in perceiving and acting in space. We then discuss how (1) visually manipulating the size of the body in virtual reality (through the use of visual avatars or graphical representations of the body) and (2) manipulating the emotional state in virtual reality may allow us to better understand and test the body’s role in perception and action. Virtual environments (VEs) are a unique tool for addressing theoretical questions such as whether physical body size and emotional body state influence spatial perceptions and judgments about action because they allow for manipulations of visual body size and emotional state that are often impossible or cumbersome in real environments (e.g., changing the visual size of the body or taking participants to a tall height). We conclude with a discussion of how body-based perception may be useful for real-world applications, and how critiques of body-based approaches to perception and action may help refine testing of theoretical questions in the future.

Measures of Perception and Action

The goal of the visual system is to recover from the 2D proximal stimuli (the light present on the retina) the three dimensions of the distal stimuli (or the geometric properties of the natural world). Several visual cues in both the environment and the physiology of the observer allow for accurate perceptions of space and other stimuli without taking into account body-based information. For example, the thickness of the lens of the eye (as controlled by the ciliary muscle) can provide information about the depth of an object viewed within arm’s reach. Pictorial depth cues such as linear perspective can provide information about the distance to objects. And shading can provide information about the curvature of a surface in the environment. However, neither physiological nor pictorial depth cues can account for all perceptions of geometric space. Here, we review terminology and aspects of studying the perception of spatial layout relevant for understanding how body-based information may play a role, as covered in the sections that follow.

The extent to which visual and physiological cues provide information regarding the layout of space is often defined by the measure used to assess perception. Spatial information can be assessed in either an absolute or relative manner. We assess absolute spatial information by asking observers to report what they see in some type of fixed unit or standard. These units or standards may be culturally defined (e.g., feet or meters), or they may be defined relative to the observer’s body (e.g., eye height or arm length). In contrast to absolute assessment, we can assess relative information about spatial layout by asking observers to compare geometric properties within the environment. It is important to note that this relative comparison does not necessitate being able to report on absolute information about either property. An example of a measure that assesses relative depth information is visual matching, through which observers are asked to adjust one depth interval to be equivalent to another. Such tasks can also be used to assess other properties of space, such as object size (Kenyon et al., 2007).
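
To make the distinction concrete, the sketch below (in Python, with hypothetical numbers; nothing here comes from the studies cited in this chapter) shows how absolute and relative estimates are commonly scored as ratios of indicated to actual extent.

```python
# A minimal sketch of scoring absolute and relative spatial estimates.
# Values are illustrative only.

def absolute_error_ratio(reported_m: float, actual_m: float) -> float:
    """Verbal report scored as a proportion of the true distance.
    1.0 = accurate, <1.0 = underestimation, >1.0 = overestimation."""
    return reported_m / actual_m

def matching_error_ratio(adjusted_m: float, reference_m: float) -> float:
    """Visual matching scored as the adjusted interval relative to the
    reference interval; no absolute unit is required of the observer."""
    return adjusted_m / reference_m

# Example: an observer calls a 10 m extent "8 meters" (absolute measure)
# and sets a comparison interval to 9 m when matching it (relative measure).
print(absolute_error_ratio(8.0, 10.0))   # 0.8
print(matching_error_ratio(9.0, 10.0))   # 0.9
```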

We also need to consider the frame of reference in which spatial information is observed. In the case of judging distances, space can be viewed either with the observer’s point of view as a reference point from which distances are judged (termed an egocentric frame of reference) or with two external objects (neither being the observer) as reference points for judging an extent (termed an exocentric frame of reference). Prior research has focused mostly on egocentric absolute distance perception. In addition, distances judged can be categorized according to where they are in space relative to the observer, as proposed by Cutting and Vishton (1995). Specifically, they defined three regions of space: personal, action, and vista. Personal space is the area immediately around us (roughly the space within arm’s reach). Action space is the area from personal space up to about 30 m; here we can easily and immediately interact with objects. Vista space encompasses all distances beyond action space. The accuracy with which traditional visual cues can be used to judge spatial layout in each of these regions varies, with the number of cues specifying distance decreasing as distance from the observer increases. For example, eye height can be used as a cue for distance perception in action space but not personal or vista space. Binocular stereo is a cue for absolute distance in personal space but not action or vista space. Very few (if any) useful cues provide information about far vista space (or vast spaces, see Klatzky et al., 2017).
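
As a rough illustration of this categorization (our own toy example, not a procedure from Cutting and Vishton, 1995), an egocentric distance can be assigned to one of the three regions given an assumed arm's-reach boundary and the approximate 30 m action-space boundary:

```python
# A toy sketch labeling an egocentric distance with a region of space.
# The arm's-reach boundary is set per observer; 30 m is approximate.

def region_of_space(distance_m: float, arm_reach_m: float = 0.7) -> str:
    if distance_m <= arm_reach_m:
        return "personal"     # within arm's reach
    elif distance_m <= 30.0:
        return "action"       # up to roughly 30 m
    else:
        return "vista"        # everything beyond action space

print(region_of_space(0.5))   # personal
print(region_of_space(12.0))  # action
print(region_of_space(80.0))  # vista
```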

One of the most challenging aspects of studying space perception is accurately assessing what people truly “see.” As soon as people are asked to report what they see in some way, cognition can intervene and potentially bias responses (Loomis & Knapp, 2003). Although space itself can be measured neatly in increments (such as feet or meters), observers asked to report distance in those absolute units must rely on a stored metric to translate what they see into these units. For example, we can measure perception of space by asking people to verbally estimate how far away they perceive objects in the environment to be (in terms of some metric like centimeters or inches, which are absolute measures). However, an inaccurately stored representation of the unit can obviously result in biased reports. Further, different representations for units across observers can unnecessarily increase variability in responses. Cultural differences among observers, for example in which units they know and use, can also induce variability.

An alternative approach to measuring perception is to ask observers to perform an action rather than provide an estimate with an arbitrary unit. The assumption behind these types of action-based measures is that visual representations are needed in order to precisely perform the action. In other words, the resulting action can provide insight into the accuracy of the underlying visual representation on which it was based. An example of such a measure is blind walking: observers view a target and then walk without vision to where they believe the target is located. Research suggests that observers are quite accurate at performing blind walking tasks up to around 20 m (Loomis et al., 1992; Rieser et al., 1990). Another example of an action-based measure is affordance judgments, in which observers report (from a static viewpoint) whether or not they can act in certain ways on the environment or objects within it (Warren, 1984). In general, work on judging affordances finds that observers can reliably judge whether they can perform actions given an environmental feature. Overall, the appeal of these action-based measures of perception is that they are often easier to understand, less prone to cognitive or response biases, and easily implemented (at least in the case of affordance judgments) in virtual environments where large-scale actions may be more difficult to execute given limited tracking space. We will discuss this advantage of action-based measures in VEs in later sections. The next section introduces how the body may play a role in spatial perception.

Embodiment in the Real World

Physical Body

The notion that the size of the body plays a role in the perception of the environment is not unique to recent embodied cognition approaches. Gibson (1979) posited nearly a half-century ago that the perception of what he termed affordances was only achieved through perceiving the body relative to the aspect of the environment being perceived. In other words, an object only affords grasping if one’s hand is large enough to hold the object. Numerous bodily dimensions and their relationship to the perception of whether or not action is afforded by an environment have been investigated since Gibson’s original claim. Bodily dimensions include eye height (Leyrer et al., 2011; Warren & Whang, 1987; Wraga, 1999), hand size (Linkenauger et al., 2011), leg and arm length (Creem-Regehr et al., 2014; Mark & Vogele, 1987), and other bodily properties. These body dimensions have been examined for their role in judging affordances for sitting, stepping, passing through, reaching, grasping, and many more. Perception of affordances has also been investigated across the lifespan (Adolph et al., 1993; Hackney & Cinelli, 2013) and in other species (Wagman et al., 2017), with results suggesting accurate perceptions of action capability (albeit with some prior experience with the action needed).

Embodied perception approaches also posit that the body plays a role in perceiving the surrounding environment. However, the mechanisms by which the body plays a role are somewhat different from those proposed by Gibson’s affordance approach. For instance, Proffitt and Linkenauger (2013) claim that the body is a useful source of information for perceiving the world around us because it provides a constant “ruler” with which to scale the world. Which units of the body are called upon and used for scaling depends on the intended action. If an observer is perceiving whether something is reachable, then the scaling unit becomes the length of the arm; however, if the observer is perceiving graspability, then the scaling unit is the size of one’s hand. To be clear, embodied approaches do not claim that the body is always used to scale the world, but rather that it is a reliable unit that, when manipulated, has an effect on perceptions of spatial layout and action. Further, over evolutionary history, the body was a unit that was always present and could provide consistent information to scale spatial layout when other cues were lacking (Stefanucci et al., 2011). However, testing theories of embodied perception depends on manipulating the body effectively (either its physical size or its emotional state) so that the effects of those manipulations on perceptual scaling of spatial layout can be discerned.

Altering physical body dimensions can be difficult to accomplish in real-world settings, but there is some evidence from unique manipulations to suggest that visual body size is used to scale the environment. For example, Linkenauger et al. (2010) manipulated the visual size of objects using goggles designed to magnify or minify object size. They asked participants to report the apparent size of familiar-sized objects (e.g., a baseball) and unfamiliar-sized objects (e.g., a wooden cylinder). Participants’ reported object size was less affected by the minification or magnification goggles when they were allowed to place one of their hands on the table (the hand was not magnified or minified by the goggles, but could be seen by participants), suggesting that they used hand size to scale objects in the environment. In contrast, in subsequent experiments, Linkenauger et al. (2010) placed a tool of familiar size or a researcher’s hand on the table instead of the participant’s own hand. These reference objects did not change perceptions, suggesting that one’s own body may play a unique role in scaling the environment (but see Collier & Lawson, 2017, for difficulty replicating some of these effects). Follow-up experiments by another research group showed that enlarging an observer’s visual hand size while performing a reach-to-grasp task affected the maximum grip aperture of the hand, but not other kinematics of the reaching movement (Bernardi et al., 2013). However, making the visual hand size smaller did not re-scale the grip aperture.

We can also manipulate the body by adding objects to it that change action capabilities and/or alter its physical size or shape. For example, participants asked to judge whether they can pass through an aperture while holding a wide rod may underestimate the size of the aperture more than those not holding a rod (Stefanucci & Geuss, 2009). Raising observers’ eye heights by asking them to stand on blocks resulted in greater accuracy in judging the height of a barrier they must walk under compared to another group asked to wear a helmet while making judgments (Stefanucci & Geuss, 2010). Interestingly, experience with the helmet (e.g., by wearing it often as an ROTC member) eliminated the bias in judgments of whether the barrier could be walked under without ducking. Wraga (1999) manipulated participants’ eye heights (by employing a false floor not visible to participants) and asked them to judge the height of stairs and the width of apertures as well as whether they could step on the stairs and pass through the apertures. The effective eye height manipulation significantly affected participants’ perception of environmental dimensions as well as their action capabilities (they underestimated their abilities when eye height was raised), even though they were unaware of it (as assessed through self-report at the end of the experiment). Altering participants’ jumping capabilities by having them wear ankle weights affected perception of the width of jump-able gaps, but not gaps too wide to be jumped across (Lessard et al., 2009). This finding is important in that it shows that manipulations to the body only affect actions that are possible for observers rather than any action. Witt et al. (2005) investigated the influence of holding a rod on the perceived distance to targets. The targets were placed outside of participants’ reach without the rod but within reach of those holding the rod (rod holding was a between-groups manipulation). Participants judged targets to be closer when holding the rod compared to not, suggesting that physical body capabilities (even if augmented with a tool) can scale distance perception. This work was extended to biases in farther distance perception when observers’ abilities to point to a target were augmented with a laser pointer (Davoli et al., 2012).

Taken together, the literature suggests that our physical bodies and their capabilities play a role in how we interpret and interact with the world. This conclusion has important implications for individuals who experience changes to their physical bodies, such as rapid growth in adolescence or the loss of action capabilities due to disease or injury. Natural changes in body dimensions also occur during pregnancy. Measuring real actions by asking pregnant women to attempt to walk through apertures showed that they adapted to their changing body shape over time and adjusted their judgments of what they could pass through accordingly (Franchak & Adolph, 2014). With regard to clinical disorders, individuals with Anorexia Nervosa (AN) tend to overestimate the size of their body on a variety of measures (Gardner & Brown, 2014), and this misperception influences how they perceive their ability to act within an environment (Keizer et al., 2013). Body dysmorphic disorder (BDD) can also be accompanied by perceptual biases (Clerkin & Teachman, 2008), such that individuals high in BDD symptoms judge morphed versions of their faces that are rated less positively to be accurate representations of their actual face.

Similarly, pain influences body perception in patients who experience phantom limb pain, those with complex regional pain syndrome (CRPS), and those with other chronic pain (Lotze & Moseley, 2007). CRPS patients often misperceive the size of their affected limbs (Lewis et al., 2007), and about 80% of amputees experience phantom limb pain, which is the sensation of pain in a limb that is no longer part of the body and which constitutes a profound pain-related misperception of the body (Ephraim et al., 2005). Consequently, pain may serve as a bodily state that affects the perception of the environment and perceived ability to act within that environment. Chronic pain patients have been shown to judge distances as farther compared to healthy controls (Witt et al., 2009), and healthy participants with leg pain experimentally induced via a hypertonic saline injection subsequently underestimate affordances (Deschamps et al., 2014). Still, given some mixed evidence on how chronic pain influences perception (Tabor et al., 2016), more research is needed to disentangle how pain might interact with perception (or vice versa). Additionally, the current research does not identify a mechanism by which pain may affect perception; pain may affect perception only indirectly, for instance by interrupting attention or another process involved in how people respond to the perceptual tasks used in previous studies.

The Emotional and Motivated Body

Physical (and perceived) size is not the only bodily state that may affect perception and action in the real world. Work by our research group and others (see also the affect-as-information approach, Clore & Huntsinger, 2009; Storbeck & Clore, 2008; Zadra & Clore, 2011) suggests that emotional states of the body may serve as information with which to scale the environment, particularly when other visual cues for space perception are at a minimum (Stefanucci & Proffitt, 2009; Stefanucci et al., 2011; Teachman et al., 2008). Further, emotional effects on perception may be functional in that emotion can make important objects in the environment more salient, can motivate action, and may also conserve bioenergetic resources (Stefanucci et al., 2011; Zadra & Clore, 2011). Finally, the influence of emotion on perception (e.g., in the case of fear) may have had evolutionary consequences (Stefanucci et al., 2011). Consider perceiving heights (or vertical distances): many of the most useful cues for judging distance (linear perspective, horizon ratio) rely on the assumption that the observer is standing on the ground plane (Gibson, 1950), but in the case of heights the observer is off the ground. Thus, participants overestimate heights viewed from above (looking down) by nearly 60% (and by about 30% when viewed from below), and this overestimation correlates with self-reports of trait- and state-level fear (Stefanucci & Proffitt, 2009). In a more direct test of the effects of emotion on perceiving heights, viewing arousing images (which increased participants’ subjective arousal ratings) before estimating a height from above increases overestimation of the height (Stefanucci & Storbeck, 2009), but does not affect the perception of distances in a non-threatening hallway (i.e., an extent for which information about arousal is irrelevant). But if self-reported arousal is increased by asking one group of participants to judge the distance to or across a threatening situation (such as a pit of nails and broken glass), then perception of the extent (as measured with a visual matching task) is overestimated compared to a group judging the same extents in a non-threatening situation (Stefanucci et al., 2012). In all of these cases, emotion motivates the observer to act in a certain (non-dangerous) way by altering perceptions of the environment to ensure safety.

This effect of emotion on perception generalizes to other environmental situations. For instance, arousal affects judgments of the size of a beam that observers anticipate walking across: aroused participants deemed the beam to be narrower (as assessed with a visual matching task) than non-aroused participants (Geuss et al., 2010b). Participants made anxious by breathing through a narrow straw underestimated their ability to reach to, reach through, and grasp objects compared to a different group of participants who breathed through a wider straw (Graydon et al., 2012). People asked to think about a friend while standing at the base of a hill (assumed to evoke positive feelings) verbally estimated the hill to be less steep (and visually matched it to be less steep) than those asked to think about an enemy (Schnall et al., 2008). Participants who reported feeling sad after listening to melancholy music estimated hills to be steeper than participants who listened to happy music (Riener et al., 2011). Participants who experienced fear due to standing on a skateboard at the top of a hill and thinking about going down it estimated the hill to be steeper than those not afraid (Stefanucci et al., 2008). Participants higher in trait fear of heights overestimated heights more than those low in trait fear, even when controlling for cognitive biases associated with high trait fear (Teachman et al., 2008). Emotional content also affects size perception; circles filled with negative stimuli were judged to be larger than circles containing positive stimuli (van Ulzen et al., 2008). Finally, even basic visual processes such as contrast sensitivity can be enhanced by fear (Phelps et al., 2006). Manipulation checks are always important in emotion research, and all of the work mentioned here that manipulated emotion collected subjective reports of emotion (mostly at the end of the experiment) to validate that participants felt the intended emotion during the study.

In addition to emotion, motivational states have also been shown to influence the perception of space. For example, thirst affects the perception of transparency; dehydrated observers often report seeing surfaces as more transparent, an important property of water (Changizi & Hall, 2001). Additionally, participants estimate desirable objects (such as a water bottle when they are thirsty) to be closer (Balcetis & Dunning, 2010) and the distance to desired locations as shorter than undesirable locations (Alter & Balcetis, 2011). Further, participants who are fluid deprived (as assessed through self-report) estimate a glass of water to be larger when primed to think about drinking, compared to a group that is not primed to think about drinking (Veltkamp et al., 2008). Motivation to approach or avoid an object in the environment may also influence perception. Threatening objects appear closer, while disgusting or neutral stimuli appear farther away (Cole et al., 2013). However, most of the work reporting an effect of motivation on perception controlled for arousal, which suggests that motivation and arousal may affect perception through different underlying mechanisms or systems (Balcetis & Dunning, 2007). For a more complete review of approach and avoidance as aspects of motivated perception, see Balcetis (2016).

Embodiment in Immersive Virtual Environments

The use of immersive virtual reality to support training, learning, simulations, and other applications increases daily with the advent of new, cost-effective, commodity-level head-mounted displays (HMDs). Effective use of these technologies relies on an understanding of whether people experience and learn from virtual worlds in the same ways that they do in real environments. Specifically, do people see and act in virtual environments as they do in the real world? These questions are especially important to answer in applications for which accurate spatial representations are needed (surgery, architecture, etc.). Another important recent change to these technologies is the easier implementation of self-avatars, that is, graphical representations of the observer in a virtual environment. The increasing use of self-avatars allows for manipulations of visual body size that are not easily accomplished in real environments. We argue in the remainder of this chapter that virtual environments provide researchers with a unique tool for testing embodied theories, and we present recent evidence from our laboratory and others to support this claim.

We define virtual reality (VR) as the “use of computer graphics to perceptually surround an observer so that he or she has the experience [of] being in a simulated space” (Creem-Regehr et al., 2015b, p. 196). Available VR systems vary in tracking abilities (e.g., eye-tracking, hand and foot tracking, full-body tracking), tracking range (e.g., whether an observer can walk in a large space), and immersion, the extent to which the virtual environment completely removes the observer from the real environment. Some systems, such as the HTC Vive and Oculus Rift, use an HMD to completely immerse participants in a virtual world, while other systems use a collection of screens to surround users with a simulated environment (e.g., a CAVE, Cruz-Neira et al., 1993). Augmented reality (AR) is an additional technology that simply displays virtual objects in the context of the real environment (see Fig. 14.1). An example of AR is the Microsoft HoloLens, but readers may also be familiar with AR phone apps, such as Pokemon Go. With current technologies, one major difference between VR and AR is the field of view (FOV) in which virtual objects are rendered. Human vision provides a field of view that is approximately 180 degrees horizontal. Whereas in recent years VR has achieved a large FOV with technological advances (e.g., most of the new technologies have a FOV of up to 110°), AR technologies have struggled to provide wide viewing areas. The HoloLens, for example, can utilize only a roughly 40-degree FOV. Because of the differences in immersion and viewing capabilities across technologies, researchers in the field have broadly named the whole category of technologies mixed reality or XR. For the rest of the chapter, we will focus primarily on immersive virtual reality technologies, but, when appropriate, we will mention if other types of mixed realities were used to test embodiment.
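
To give a sense of what these FOV differences mean in practice, the following back-of-the-envelope calculation (our own illustration; the viewing distance is arbitrary) computes how much horizontal extent fits within a symmetric horizontal FOV at a given distance:

```python
# Illustrative only: horizontal extent subtended by a display's FOV.

import math

def visible_width_m(distance_m: float, fov_deg: float) -> float:
    """Horizontal extent covered by a symmetric horizontal FOV at distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)

# At 3 m, a ~110-degree HMD FOV spans roughly 8.6 m of the scene horizontally,
# whereas a ~40-degree AR FOV (e.g., HoloLens 1) spans only about 2.2 m.
print(round(visible_width_m(3.0, 110.0), 1))  # ~8.6
print(round(visible_width_m(3.0, 40.0), 1))   # ~2.2
```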

Fig. 14.1

Virtual pits of varying depth: (a) shallow, (b) medium, and (c) deep, as displayed in augmented reality using the Microsoft HoloLens 1. Reprinted from Wu et al. (2019)

Perceptual Fidelity

VR is a particularly useful tool for perception researchers because it often exhibits greater ecological validity than other laboratory paradigms, and it allows for the use of many action-based measures (especially with large tracking areas). As stated earlier, if the body is represented in the virtual environment, then VR allows for manipulations of body size and motion that can be used to test embodied theories. An important matter to address before discussing how manipulations of bodily size and state can be achieved in VR is whether or not observers perceive immersive virtual environments (IVEs) in the same manner as they do real environments. Our prior work describes this question as one of perceptual fidelity, the extent to which perception and behavior in an IVE are similar to those in the real world (see also Creem-Regehr et al., 2015a; Stefanucci et al., 2015, for more discussion of this topic). The perceptual fidelity of IVEs is extremely important to assess given that we often want to generalize findings from experiments using virtual environments to the real world. If observers perceive virtual environments very differently from real environments, generalization becomes much more difficult.

Perceptual fidelity can be measured by asking observers to perform absolute and relative perceptual tasks (such as verbal reports, affordance judgments, and visual matching) in the real world and then comparing real-world performance to that in immersive virtual environments. Past research has generally found compression of scale in IVEs compared to the real world, with estimates in IVEs averaging between 40 and 80% of the actual value (Creem-Regehr et al., 2015a; Renner et al., 2013). For example, Sahm et al. (2005) compared real-world and IVE performance on a blind walking (walking without vision to a previously viewed target) and a blind throwing (throwing without vision to a previously viewed target) task and found that, in the IVE, participants walked and threw to distances 70% of what they did in the real world, suggesting they perceived the distances as shorter in the IVE than in the real world. Mohler et al. (2006) compared real-world and IVE performance for both blind walking and verbal reports and obtained similar results. As both Creem-Regehr et al. (2015a) and Renner et al. (2013) reviewed, studies using HMDs until recently consistently found this underestimation in IVEs. With the advent of the newer commodity-level HMDs, this underestimation has been reduced, suggesting that wider fields of view and less tracking latency may have contributed to the originally observed underestimations, although these factors have not been tested directly (Buck et al., 2018; Creem-Regehr et al., 2015b).
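
As an illustration of how this compression is typically quantified (with hypothetical numbers chosen to mimic the roughly 70% pattern reported above, not data from the cited studies), the ratio of indicated distance to actual distance can be averaged across trials and compared between viewing conditions:

```python
# A minimal sketch of quantifying distance compression; values are made up.

import statistics

def mean_ratio(indicated_m, actual_m):
    """Mean indicated/actual ratio across trials; 1.0 means accurate."""
    return statistics.mean(i / a for i, a in zip(indicated_m, actual_m))

actual     = [3.0, 4.5, 6.0]
real_world = [2.9, 4.4, 5.8]   # hypothetical near-accurate real-world walking
virtual    = [2.1, 3.2, 4.2]   # hypothetical compressed IVE walking (~70%)

print(round(mean_ratio(real_world, actual), 2))  # ~0.97
print(round(mean_ratio(virtual, actual), 2))     # ~0.70
```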

In contrast to the findings for distance perception, affordance judgments are generally similar across IVEs and the real world. For example, Geuss et al. (2010a) asked participants in the real world or a closely matched virtual environment to view an aperture defined by the gap between two poles and then to (1) blind walk to the location of the poles, (2) provide a size-matched estimate of the gap between the poles, or (3) predict whether they could pass between the poles. For the VR condition, the poles were virtual. The poles varied in distance from participants (3, 4.5, or 6 m) and in the size of the gap between them (25–50 cm) across trials (see Fig. 14.2). Geuss et al. (2010a) used these three perceptual measures to determine whether affordance judgments were affected by the distance compression previously seen with both blind walking and verbal report tasks in IVEs. For blind walking, participants underestimated the distance to the poles, as observed in previous studies. However, affordance judgments and size-matched estimates were similar in the real and virtual worlds, indicating that affordance and size judgments may not be susceptible to the same perceptual biases found for distance perception.

Fig. 14.2

View of the poles used in Geuss et al. (2010a) and Pointon et al. (2018) as displayed in (a) the real world, (b) VR, and (c) AR

Pointon et al. (2018) tested whether Geuss et al.’s (2010a) findings replicated in augmented reality. They presented participants with two vertical poles whose distance from the participant varied (3, 4.5, or 6 m), as did the width between the poles (30–60 cm). In the first block of trials, participants viewed the poles and provided a “yes/no” judgment of whether they could pass between them. In the second block, participants viewed the poles and then blind walked the distance to them. Pointon et al. (2018) compared performance in AR to the IVE and real-world results from Geuss et al. (2010a) to determine whether action-based measures could be used to establish perceptual fidelity in AR. Comparing performance across environments showed no significant differences among passability judgments in the real world, IVE, or AR. Notably, while Geuss et al. (2010a) found significant differences between blind walking performance in the real world and in immersive virtual environments, Pointon et al. (2018) did not find a significant difference in blind walking performance between augmented environments and the real world. Taken together, these studies demonstrate the importance of testing the perceptual fidelity of virtual technology as well as the utility of action-based measures for conducting perceptual research in both virtual and augmented reality.
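
For readers unfamiliar with how such yes/no passability data are summarized, the sketch below (hypothetical data and a simple interpolation, not the analysis used in the cited studies) estimates the aperture width at which the proportion of "yes" responses crosses 50% and expresses it relative to an assumed shoulder width:

```python
# A sketch of estimating a passability threshold from yes/no affordance
# judgments. Data, shoulder width, and the interpolation are illustrative.

def passability_threshold(widths_cm, prop_yes):
    """Linear interpolation of the 50% point on the yes-response curve."""
    for (w0, p0), (w1, p1) in zip(zip(widths_cm, prop_yes),
                                  zip(widths_cm[1:], prop_yes[1:])):
        if p0 < 0.5 <= p1:
            return w0 + (0.5 - p0) * (w1 - w0) / (p1 - p0)
    raise ValueError("responses never cross 50%")

widths   = [30, 35, 40, 45, 50, 55, 60]          # aperture widths (cm)
prop_yes = [0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 1.0]   # hypothetical proportions of "yes"

threshold_cm = passability_threshold(widths, prop_yes)
shoulder_cm = 42.0                                # hypothetical shoulder width
print(round(threshold_cm, 1))                     # ~43.3 cm
print(round(threshold_cm / shoulder_cm, 2))       # threshold in body-scaled units
```

In practice a logistic (psychometric) fit is more common, but the interpolated crossover conveys the same idea: the affordance boundary can be expressed in body-scaled units.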

Within immersive virtual environments, the body is represented using a self-avatar, or a graphical representation of the user. Self-avatars can be human-like in appearance, or they can be stylized to be different from the human body. They can also represent just one body part, such as the feet or hands, instead of the entire body. Self-avatars may increase the perceptual fidelity of virtual environments. For instance, observers give more accurate egocentric judgments of distance within a virtual environment when a full-body self-avatar is present (Mohler et al., 2008, but also see McManus et al., 2011); the improvement in accuracy of distance perception is even greater if the self-avatar is animated to follow the participant’s real movements (Mohler et al., 2010). The perceived size of virtual objects is influenced by the presence and size of a virtual hand (Linkenauger et al., 2013). With regard to affordance judgments, the presence of a self-avatar improves estimates of stepping over or ducking under a pole (Lin et al., 2012) and stepping off a ledge (Lin et al., 2013, 2015). These findings provide support for the notion of the body as a “perceptual ruler” with which we perceive and scale the environment to inform action.

Manipulations of Physical Body Size in Virtual Environments

As previously stated, the body is an important source of information for perception. From research done in the real world, we know that manipulating the perceived size of one’s body can have consequences for perception and, in turn, behavior. In IVEs, people will embody a virtual avatar and localize themselves toward the presented location of that avatar (Lenggenhager et al., 2007). Prior work also suggests that observers readily accept self-avatars as their own even when the self-avatar body is grossly different in size from one’s own body (Piryankova et al., 2014b). For example, adults can adopt a child-sized body as theirs in virtual reality (Banakou et al., 2013). Whereas modifying the perceived size of a body can be difficult in real environments, self-avatars are easily manipulated in virtual environments. Subtle manipulations of body width and shape have been used to gauge how effectively women recognize their own body size, with results suggesting that women accept a margin of error of up to 6% change in body mass index (BMI) as still their own perceived size (Piryankova et al., 2014a). However, one’s own body size can influence the sign of the error such that women with a higher BMI are more likely to allow for positive margins of error than women with a lower BMI (Thaler et al., 2018).
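
To put the 6% figure in everyday terms, a quick back-of-the-envelope calculation (with an assumed, illustrative height and weight, not values from the cited studies) converts a 6% change in BMI into the corresponding change in body weight at fixed height:

```python
# Illustrative arithmetic only; the height and weight are assumed values.

height_m = 1.70
weight_kg = 65.0
bmi = weight_kg / height_m ** 2            # ~22.5

# At fixed height, a 6% change in BMI corresponds to a 6% change in weight.
weight_change_kg = 0.06 * weight_kg        # ~3.9 kg
print(round(bmi, 1), round(weight_change_kg, 1))
```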

Further, when the size of a self-avatar is altered, people may perceive the environment differently and scale their abilities to act in that environment accordingly. For example, participants who embody a child-sized avatar overestimate the size of objects in their environment on a visual matching task compared to participants who embody an adult avatar (Banakou et al., 2013). Similarly, participants who virtually embodied a giant judged objects and distances to be smaller and shorter than participants who virtually embodied a doll (Van Der Hoort et al., 2011). Participants shown big virtual feet and asked in a yes/no affordance judgment task what they could step over estimated that they could step over larger gaps than those shown small virtual feet (Jun et al., 2015). Similarly, when participants are presented with larger virtual hands (Linkenauger et al., 2013) or longer virtual arms (Day et al., 2019), they estimate that they can grasp larger objects and reach objects that are farther away, respectively. However, the effect of increased arm length on reaching estimates depends on having some experience with moving the arm (Linkenauger et al., 2015).

As previously mentioned, adult participants will adopt a child-sized virtual body (Banakou et al., 2013), and then may use it to scale the environment, resulting in an overestimation of the size of objects (Tajadura-Jiménez et al., 2017). This finding was replicated in an older adult population, as well, but the embodiment effect only occurred when auditory signals were congruent with the presented body (i.e., a child’s voice accompanying a child-sized avatar vs. one’s own adult voice accompanying the child-sized avatar; Tajadura-Jiménez et al., 2017). In addition to scaling their bodies to the dimensions of an avatar, people may also adopt the actions of an avatar as their own. When visual information about movement is mismatched with actual movement, participants are poor at discriminating their own movement from that of an avatar, suggesting that they adopt the movement of the avatar as their own (Debarba et al., 2018). Further, when participants are given feedback about virtual reaching, regardless of the anthropomorphic fidelity of the avatar presented, judgments of what is reachable in the virtual environment become more accurate (Ebrahimi et al., 2018).

Evoking Emotion with Virtual Reality

As previously stated, emotion is an important aspect of embodied perception approaches, and a key advantage of using VR as a research tool is that we can easily create environments and paradigms that evoke emotion, which allows us to test emotion’s effect on perception and affordances. One of the first studies to demonstrate this capability showed that a virtual pit could elicit a strong sense of presence (the belief that one is truly in the virtual environment as if it were real) in an immersive virtual environment, as assessed with a questionnaire (Usoh et al., 1999). Follow-up studies that directly assessed emotion through the use of physiological measures (e.g., heart rate variability) also showed that IVEs can reliably alter emotion by increasing the perception of risk (Meehan et al., 2002, 2005). More recent work has shown that IVEs can be used to evoke a range of emotions including joy, sadness, boredom, anger, and anxiety (Felnhofer et al., 2015). This work exposed participants to virtual parks that differed in the mood they were intended to induce (e.g., for anxiety, the park was presented with less illumination so objects were harder to see) and showed, through participants’ self-reports, that each of the parks reliably induced the intended emotion. More recently, brain activations and heart rate have been shown to vary in virtual environments intended to evoke emotions that differed in arousal and valence (Marin-Morales et al., 2018). Further, a model trained on variance in the brain and heart signals could predict the emotional experience of the participants.

Given that IVEs are useful tools for inducing emotion, they allow for testing effects of embodied states on the perception of space. For example, Geuss et al. (2016) created a virtual environment in which participants were asked whether they could step over gaps of different widths and to visually match the extent of each gap. Participants made these estimates on a platform raised 15 m in the air and on the ground (see Fig. 14.3). Conducting a similar experiment in the real world would be challenging for many reasons (e.g., difficulty in varying the height and increased danger for participants), which further demonstrates the usefulness of IVEs for inducing emotion. Geuss et al. (2016) found that observers reported higher ratings of subjective distress when the platform was farther off the ground, and this manipulation of fear was associated with participants overestimating the size of the gaps and underestimating their ability to step over them. Moreover, the magnitude of the respective over- and underestimations increased as height increased.

Fig. 14.3

View of a gap from the 15 m height. Participants stood at the edge of the brick surface and judged whether they could step to the other brick area. Reprinted from Geuss et al. (2016)

Recent work by our group using AR has also shown an effect of virtual depth on judgments of whether a gap can be crossed (see Fig. 14.1), with deeper gaps leading to more conservative estimates for gap crossing (Wu et al., 2019). Other fearful manipulations that could affect affordance judgments are more easily implemented with VR. For example, Regia-Corte et al. (2010) found participants were more conservative about whether they could stand on a slanted surface in an IVE when it was depicted as icy compared to wooden. With regard to personal space, virtual environments can be used to evoke fear by asking participants to place self-avatar hands with different visual appearances near dangerous objects, such as a spinning saw, or to move the hands across hazards (like barbed wire), with participants’ sense of ownership of the virtual hands assessed through self-report (Argelaguet et al., 2016).

Immersive virtual environments have been used to study risky behaviors in children as well as adults. In her early work, Plumert (1995) related children’s overestimation of their action capabilities to their accident proneness. Further work showed evidence for a relationship between misperception of abilities in children and risky behavior. Using IVEs, Plumert extended her work to more real-world tasks, such as bicycling across a busy street. For example, Plumert et al. (2004) used an immersive bicycling simulator to compare the street-crossing behavior of 10-year-olds, 12-year-olds, and college-aged adults. Although all participants selected the same gaps in traffic during which to cross the street, the 10- and 12-year-old participants took longer to begin crossing and to reach the other side. Potential explanations for the age-related differences include a mismatch between the child participants’ perceived abilities to safely cross the road within the gap in traffic and their perceptions of how quickly the cars would approach the crossing line. In follow-up studies that examined crossing behaviors after experience (especially with high-density traffic), 10-year-old children saw the most improvement in crossing decisions, but all age groups showed better decisions in terms of safely timing crossing behaviors (Plumert et al., 2011).

O’Neal et al. (2018) also examined crossing on foot (rather than a bicycle) in this virtual task and found that improvement in crossing decisions (i.e., less risky behavior) developed with age as well. Recent work in our lab has investigated the perception of stepping over a gap in children, teenagers, and adults in IVEs (Creem-Regehr et al., 2019). Creem-Regehr et al. (2019) presented participants with gaps of various widths in an IVE, at both ground level and elevated 15 m above the ground. Participants provided “yes/no” responses to whether they could step across the gaps. Consistent with previous findings in adults, all age groups underestimated their gap-crossing abilities when elevated off the ground compared to when they made their judgments on the ground. In general, though, children underestimated their perceived gap-crossing abilities more than teenagers and adults at the height, suggesting that an increased sense of risk in the children altered their decisions more than teenagers or adults. For a more extensive review of development of the perception-action system in children and adolescents, see Plumert (2018).

Conclusions, Applications, and Future Directions

In this chapter, we have reviewed literature that argues that bodily states, whether physical or emotional, influence perceiving and acting. These effects occur in the real world and in virtual and mixed realities. Moreover, mixed reality technologies are unique and useful tools for manipulating the visual body and testing its effect on perception and action. These findings lay an important foundation going forward for experiments assessing perception, action, and embodiment in mixed realities. We would be remiss, however, if we did not discuss critiques that have surfaced regarding embodiment and perception and action. As we stated at the beginning of this chapter, one of the hardest aspects of conducting research on visual perception is simply that it is challenging for participants to accurately and reliably report what they see. As soon as participants form an overt response to describe their perceptual experience, an opportunity arises for cognition to potentially interfere with the perceptual experience. In other words, can we be sure that reports of perceptual experiences are purely based on perception and not on cognition? The short answer to this question, we believe, is no.

Some researchers have disputed claims that perception is being altered by embodied information, arguing instead that only reports are changed, not perception (Durgin et al., 2009, 2011; Woods et al., 2009). Recent reviews of the literature can speak more to this debate and to potential pitfalls that researchers conducting embodied cognition and perception work might consider in terms of experimental design and strength of claims (Firestone & Scholl, 2016; Philbeck & Witt, 2015). For the purposes of this chapter, we argue that determining whether embodied effects are purely perceptual or partly cognitive in nature is impossible and unnecessary. Clearly, the evidence presented suggests that, at times, we use body-based information to alter perceptual judgments. These judgments are the basis for decisions about future action, so whether the underlying change is purely perceptual or cognitively biased may not matter for the behaviors we care most about (estimates of and actions in space). Further, the pioneering work on perceiving affordances and judgments of action capabilities has been less controversial; perceptual researchers generally agree on the use of body-based cues such as eye height, leg length, and hand size to scale the environment in terms of actions (see also Witt & Riley, 2014, for a review). Thus, while we take the criticisms seriously and readily concede that some of the observed effects may be biased responses rather than changes to underlying perceptual representations, we nevertheless believe that they are important and useful to consider for many spatial judgments and behaviors. In the following section, we suggest how understanding such effects may be even more useful for certain real-world applications.

Applications

Understanding how the body influences perception and action in both real and virtual environments has implications for applications in domains such as health, training, education, and design. In the real world, the body is always present and is generally within view. However, the inclusion of visual representations of the body in virtual reality is fairly recent. Therefore, the use of embodied or body-based information to aid in training or learning for particular applications is just beginning to develop. One of the most prolific areas thus far has been the health domain. For instance, bodily perception is especially relevant to the treatment of clinical conditions. For people in pain, seeing one’s own body (e.g., one’s hand) is analgesic (Longo et al., 2009), and this analgesia is modulated by portraying the hand as enlarged or reduced in size through the use of mirrors (Mancini et al., 2011). Although this work was conducted in the real world, health care providers are increasingly using virtual environments for effective pain management (Kenney & Milling, 2016; Malloy & Milling, 2010), and the analgesic effects of immersive virtual environments can be enhanced by including a virtual body (Martini et al., 2014). When an avatar is present in an IVE, especially when the participant self-identifies with the avatar, pain responses are reduced (Romano & Maravita, 2014). Further, physiological responses to painful stimuli seem to be modulated by the size and orientation of the presented body (Romano et al., 2016). IVEs are also being used to help improve body image distortions in patients with Anorexia Nervosa (AN): providing patients with a full-body illusion may change the way they perceive their own body, which provides therapeutic benefits (Keizer et al., 2016). For a comprehensive review of the use of virtual reality with body image and eating disorders, see Ferrer-Garcia and Gutierrez-Maldonado (2012). Outside of body-specific health domains, VR is effective in treating post-traumatic stress disorder (PTSD) and specific phobias (Maples-Keller et al., 2017).

In addition to its therapeutic uses, virtual reality can improve training in healthcare and other settings. Virtual reality is a training tool in various surgical arenas from orthopedics (Aim et al., 2016) to neurosurgery (Alaraj et al., 2011) to laparoscopic surgery (Alaker et al., 2016; Gurusamy et al., 2008; Yiannakopoulou et al., 2015). Medical simulations and training in VR have been applied in clinical training and assessment for medical (Matzke et al., 2017) and dental school students (Dutã et al., 2011). In many medical settings, VR holds the promise of improving the quality of medical education and offers students and clinical practitioners the ability to practice skills that are difficult, expensive, or consequential to practice in the real world. VR training also has utility in scenarios outside of healthcare and has been used in education in the fields of architecture and construction (Wang et al., 2018). The applications of virtual reality for training and development are far-reaching and have been used in domains such as automotive development (Lawson et al., 2016), manufacturing (Choi et al., 2015), sports (Neumann et al., 2018), education (Freina & Ott, 2015), and military training (Lele, 2013; Pallavicini et al., 2015). For a more thorough review of applications than we can offer here, see Slater and Sanchez-Vives (2016). Accurate perception of space and action capabilities in virtual environments used for training in any of these domains is clearly crucial.

Future Directions

The last 10 years have seen a marked increase in the technologies available to researchers of perception and action for testing theories of embodiment. Virtual and augmented reality have provided unique capabilities for displaying the body and testing its effects on perception of and action in the surrounding environment. Future work will be able to capitalize on even better body-scanning technologies that allow for easy and quick self-avatar creation (see Pujades et al., 2019, for an example of such capabilities). We foresee such technologies being used to test not only embodiment and its relation to perception and action but also more existential questions of how body size and shape may play a role in the sense of self. Certainly, the existing research suggests that body-based information contributes to perceptions of and actions in our environments. More work is needed in which full-body information is manipulated with self-avatars (instead of simple arm or foot manipulations) to test reliance on the visual body to scale the environment. Manipulations of proprioception may also become possible with the development of better haptic devices for virtual reality (Garcia-Valle et al., 2017). More investigation of the use of body-based information for perception and action across the lifespan will also be important. We presented some research on children and adolescents above, and new, lighter-weight HMDs could also allow for testing older adults. Overall, we believe that virtual and other mixed reality technologies will be essential for pushing theories of embodiment in perception and action forward, and we encourage readers to contribute to such efforts.