Introduction

Peripersonal space (PPS) is the multimodal sensory–motor interface that mediates the interaction between an individual and their environment, which generally corresponds to the space around that individual’s body. Numerous studies in non-human primates have shown that multisensory cues, specifically those recruiting the body through touch, are integrated by a specialized neural system representing PPS. Specific populations of multisensory neurons respond to both tactile information on the body (i.e., the arm, face or trunk) and visual or auditory stimuli occurring in PPS (i.e., close to the body). These multisensory neurons were first described in the macaque brain, in a network composed of specialized parietal and frontal areas, such as the ventral premotor cortex (vPM, Rizzolatti et al. 1981; or polysensory zone, Graziano et al. 1999), the ventral intraparietal area on the fundus of the intraparietal sulcus (VIP, Duhamel et al. 1997; Duhamel et al. 1998), and parietal area 7b, as well as subcortical regions such as the putamen (Graziano and Gross 1993; see Grivaz et al. 2017; di Pellegrino and Làdavas 2015, for reviews).

PPS representations serve to encode the location of nearby sensory stimuli to generate suitable motor acts, such as goal-directed, approaching actions toward objects () or involuntary, defensive/avoidant reactions in response to close threats (Graziano et al. 2002). In fact, neural and behavioral responses to approaching stimuli increase as a function of the vicinity of the stimulus to the body, the so-called proximity effect (Bufacchi and Iannetti 2018; Cléry et al. 2014; Van der Stoep et al. 2015). In addition to proximity to the body, several other factors affect PPS representation, including stimulus movement parameters such as direction and speed. Regarding direction, the majority of bimodal neurons in the VIP respond more than twice as much to stimuli moving in a preferred direction compared with a non-preferred direction (Colby et al. 1993), even when the responses are elicited by identical visual stimuli. Regarding speed, the firing rate of a portion of these neurons in VIP increases as a function of the velocity of the looming stimulus, suggesting that they might be computing the time to impact on the body (Fogassi et al. 1996). The influence of speed on PPS representation has also been observed behaviorally, as the velocity of looming auditory stimuli has been shown to dynamically resize PPS (Noel et al. 2018).

Another important factor influencing PPS representation is the salience of the approaching stimulus. Stimuli that are behaviorally relevant for actions aiming to create or avoid contact between the stimulus and the body modulate the proximity effect. For example, the proximity effect is enhanced by an approaching threat (e.g., a spider; de Haan et al. 2016). Within the realm of salient threatening stimuli, fearful facial expressions are a particular kind of threatening stimulus that does not constitute a direct danger (as did the approaching spider used by de Haan et al. 2016, or the angry faces used by Cartaud et al. 2018, and Ruggiero et al. 2017), but, rather, communicates a potential environmental risk whose source and location are unknown (Fanselow and Pennington 2018). Thus, if one fundamental element that triggers the chain of transformations required for defensive purposes is the capability to read threat signals in the environment, we should find that a fearful facial expression, but not a joyful one, modulates the proximity effect.

The capability to read threat signals in the environment and trigger appropriate behavioral responses is supported by neural circuitries involving sub-cortical and cortical structures, in strict connection with the autonomic nervous system. In particular, the amygdala plays a crucial part in emotion-related processes (Öhman 2005) and is involved in modulating autonomic nervous system responses to threat (Gläscher et al. 2003; Phelps et al. 2001; Laine et al. 2009), such as the skin conductance response (SCR) (Wang et al. 2018). In this regard, fearful faces seem to elicit a robust skin conductance response and amygdala activation (Anderson et al. 2003; Britton et al. 2008; Hariri and Tessitore 2002; Cushing et al. 2018).

Here, we investigated the role of approaching emotional facial expressions (fearful and joyful) in modulating autonomic arousal as a function of the distance of the faces from the observer. To this aim, healthy subjects underwent a PPS task (responding to tactile stimuli delivered to the cheeks) while their SCR was recorded, and they watched task-irrelevant fearful or joyful faces approaching them from very far to near space in an immersive virtual environment. Neutral faces were also administered to control for the effects of stimulus movement parameters, such as speed and stimulus size, which are known to influence proximity effects.

Note that the impact of emotional faces on proximity effects was previously addressed by Cartaud et al. (2018), who demonstrated that an angry avatar elicited a stronger physiological activation than joyful or neutral avatars when it was presented within reaching distance (at 65 cm), but not outside of reaching distance (at 250 cm). In their study, PPS was conceptualized as an in-or-out space, assumed to yield a discrete response. However, the PPS representation is based on a sequence of graded rather than discrete receptive fields (Bufacchi and Iannetti 2018). Thus, we wondered whether the modulation of arousal by spatial proximity may be gradual rather than discrete. For this reason, in this study, the face approached participants from three different spatial distances, namely Ultra-far, Far and Near, and we expected a gradual modulation of SCR as a function of these distances. Moreover, Cartaud et al. (2018) explicitly asked participants to consider the spatial position of the emotional avatar (in the reachability judgment and interpersonal comfort distance tasks), possibly tapping into more cognitive processes. Here, participants were not required to make any such estimations, enabling us to investigate the effect of space in an implicit way. Furthermore, in the present study, we investigated the effect of fearful faces on PPS, which, as discussed before, have different characteristics than angry faces.

To quantify the modulation of autonomic arousal by fearful faces as a function of their distance from the observer, while controlling for confounding stimulus movement parameters such as speed and stimulus size, we subtracted the mean SCR elicited by neutral faces from that elicited by fearful and joyful faces and then contrasted fearful SCR indices with joyful SCR indices. Thus, any difference between joyful and fearful faces would reflect the relative enhancements in arousal elicited by those emotions, compared to the arousal elicited by the neutral face, as a function of distance from the participant. More precisely, we expected that approaching fearful faces, by signaling an upcoming environmental threat, would elicit a gradual increase in SCR as the face came closer to the participant. In contrast, we did not expect approaching joyful faces to increase SCR magnitude.

Materials and methods

Participants

Twenty-seven healthy participants with no history of neurological or psychiatric disorders were recruited (17 females; mean age ± SD = 25 ± 2.5 years). This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Bioethics Committee of the University of Bologna (Date 8-8-2019 /No. 178302). All participants gave informed written consent to participate after being informed about the experiment. The sample size was determined via a power analysis conducted in G*Power 3.1 software and based on the mean effect size from prior studies on PPS and SCR responses (Cartaud et al. 2018; Rossetti et al. 2015), with an alpha of 0.05 and a desired power of 0.9.

Stimuli and materials

The experiment was implemented in ExpyVR software (a framework for designing and running experiments in virtual reality, available online at https://lnco.epfl.ch/) and run on a Windows PC (XPS 8930, Dell, Round Rock, Texas, USA). The tactile stimuli consisted of vibrations delivered bilaterally to the participants’ cheeks by a pair of shaftless vibration motors (Precision MicroDrives, model 312-101, 3 V, 60 mA, 150 Hz, 5 g). Each motor had a surface area of 113 mm² and reached maximal rotation speed in 50 ms. The devices were activated for 100 ms during tactile stimulation.

The visual stimuli were avatar faces showing a fearful, joyful or neutral expression and were presented on a head-mounted display (HMD, Oculus Rift SDK, Oculus VR, 100° field of view, 60 Hz). Stereoscopic vision was obtained by projecting the stimulus at a slightly different angle to the left and right eye (for more details see https://developer.oculus.com/design/bp-vision/). The angular size, which is the size of the image that an object produces on the retina of the observer, was not corrected; thus, far faces were perceived as smaller than closer faces.

The avatar emotional facial expressions were manipulated ad hoc to render the desired features with Poser software (vers. 10; Smith Micro Software, Aliso Viejo, California, USA). Stimuli implemented in the study were chosen through a validation procedure (see “Visual stimulus validation”).

At T0 (see Fig. 1), the beginning of each trial, a black fixation dot appeared centrally in the participant’s visual field, on a gray background, for 500 ms, at an apparent distance of 400 cm from the participant. At T1, an avatar face with a neutral, fearful or joyful expression appeared centrally in the visual field, in one of three different positions: Near space (~ 70 cm away), Far space (~ 210 cm away) or Ultra-far space (~ 350 cm away) from the participant (see Fig. 2). Faces moved toward the participant on the sagittal plane for a total of 3000 ms. The end point of the looming face was always fixed near the participant (~ 10 cm away), where the face remained still for 1000 ms before stimulus offset. Therefore, stimuli in each condition covered different lengths of space in the same amount of time, resulting in different traveling speeds: 20 cm/s, 66.7 cm/s and 113.3 cm/s, for the Near, Far and Ultra-far conditions, respectively. At T2, 1500 ms after the presentation of the face, the tactile stimulus was delivered. Thus, touch coincided with perception of the face at different distances from the participant (Serino et al. 2018; 40 cm in the Near condition, 110 cm in the Far condition and 180 cm in the Ultra-far condition). Lastly, at T3, at the face offset, the fixation dot reappeared, at the previous location, for 500 ms. Note that, in 15% of trials, the color of the fixation dot changed from black to red at T3. Participants were asked to detect the color change and signal it to the experimenter. The change in fixation dot color always happened at the end of the trial, when the face disappeared. The inter-trial interval (ITI) was a gray empty environment, with a variable duration ranging from 11 to 14 s (± 1 s of jitter).
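The travel speeds and the face distances at the moment of touch follow from simple arithmetic on the start point, end point and timing. The sketch below reproduces them under the assumption of constant-velocity motion, which the description implies but does not state explicitly; the function and variable names are illustrative and are not taken from the ExpyVR implementation.

```python
# Illustrative check of the stimulus kinematics.
# Assumption: the face moves at constant velocity from start to end point.
START_CM = {"Near": 70.0, "Far": 210.0, "Ultra-far": 350.0}
END_CM = 10.0    # fixed end point near the participant
TRAVEL_S = 3.0   # duration of the approach (3000 ms)
TOUCH_S = 1.5    # touch delivered 1500 ms after face onset


def speed_cm_per_s(start_cm: float) -> float:
    """Travel speed for a face starting at start_cm."""
    return (start_cm - END_CM) / TRAVEL_S


def distance_at_touch_cm(start_cm: float) -> float:
    """Apparent distance of the face when the touch is delivered."""
    return start_cm - speed_cm_per_s(start_cm) * TOUCH_S


# Maps each condition to (speed in cm/s, distance at touch in cm).
kinematics = {
    name: (round(speed_cm_per_s(s), 1), round(distance_at_touch_cm(s), 1))
    for name, s in START_CM.items()
}
```

Under this assumption the computed values match those reported: 20, 66.7 and 113.3 cm/s, with the face at 40, 110 and 180 cm when the touch occurs.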

Fig. 1

Experimental timeline. At T0, the fixation dot (black) appeared for 500 ms. At T1, the face moved for 3000 ms toward a location near the participant, where it remained still for 1000 ms (T3). At T2, tactile stimulation was delivered. At T3, the face disappeared and the fixation dot (black/red) re-appeared for 500 ms. The ITI was set at 11–14 s

Fig. 2

Spatial conditions. In each spatial condition, the end point was fixed at a location near the participant (10 cm), while the starting point differed, resulting in a distance from the participant of approximately 350 cm in the Ultra-far condition, 210 cm in the Far condition and 70 cm in the Near condition. At T2, when tactile stimulation was delivered, the face appeared to be 180 cm away in the Ultra-far condition, 110 cm away in the Far condition and 40 cm away in the Near condition. The face was always displayed for 4000 ms (from T1 to T3)

This design allowed us to exclude a potential confounding effect of temporal expectation on tactile facilitation, since the tactile stimulation was always delivered with the same delay after the appearance of the face in each spatial condition. In fact, when a moving object approaches the body, it not only triggers the multisensory PPS neurons that influence tactile processing; the impending contact with the approaching object also creates an expectation of an upcoming tactile event that influences the response time to the tactile stimuli. Moreover, this expectation increases as time elapses and the object approaches the body (Kandula et al. 2017).

Visual stimulus validation

To select the faces to be included in the experiment, 60 naive participants (30 females; mean age ± SD = 29 ± 10 years) were instructed to rate 15 two-dimensional pictures constituting 5 different versions of each facial expression, namely joyful, fearful or neutral. Participants had to indicate which emotion was represented in the picture and, subsequently, to rate how strongly that emotion was expressed on a 10-point Likert scale (0 = low intensity; 9 = high intensity). They also had to rate the arousal level generated by each stimulus on a 10-point Likert scale (0 = not at all arousing; 9 = extremely arousing).

This procedure allowed us to select two joyful, two fearful and two neutral facial expressions, according to the highest percentage of participants who correctly identified the emotion in the picture, then the highest perceived intensity level and the highest perceived arousal. The mean hit rate of the selected stimuli was 95% for the joyful faces and 80% for the fearful and neutral faces. To check whether the mean ratings of intensity and arousal were significantly different between the emotions, repeated measures analyses of variance (ANOVAs) were conducted with mean intensity and mean arousal scores. The analysis of intensity level showed that ratings were different across emotions [F(2,118) = 151.45; p < 0.01; ηp2 = 0.72]. Post hoc Bonferroni tests showed that both joyful and fearful expressions were judged as more intense than the neutral expressions (Neutral faces: M = 2.39, SEM = 2.05; Joyful faces: M = 5.62, SEM = 1.70; Fearful faces: M = 7.12, SEM = 1.38; all p < 0.01); moreover, fearful expressions were judged as more intense than the joyful expressions (p < 0.01). The analysis of arousal level also showed that ratings were different across emotions [F(2,118) = 98.35; p < 0.01; ηp2 = 0.63]. Post hoc Bonferroni tests showed that both joyful and fearful expressions were judged as more arousing than the neutral expressions (Neutral faces: M = 1.53, SEM = 1.54; Joyful faces: M = 3.89, SEM = 2.17; Fearful faces: M = 5.08, SEM = 2.32; all p < 0.01); moreover, fearful expressions were judged as more arousing than the joyful expressions (p < 0.01).

Task and procedure

There was a total of 27 trials, evenly distributed among the 9 experimental conditions defined by facial expression (Neutral/Fearful/Joyful) and spatial position (Ultra-far/Far/Near; i.e., 3 trials per condition). The number of repetitions per condition was kept low, due to the fast decay of the SCR to a stimulus presented repeatedly (i.e., the habituation phenomenon; Bradley et al. 1993). Trial order was randomized. After signing the consent form, participants sat on a comfortable chair in a sound-attenuated room. Vibrators were then attached bilaterally on the cheeks with medical tape, and a virtual reality headset was mounted onto the head of the participant. Before the task began, the lens focus of the Oculus VR was manually adjusted by each participant until clear vision was reported and the SCR activity recording was verified. During the task, participants made speeded simple responses to the tactile stimulation by pressing a button placed on the table in front of them with their right hand.
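The trial structure (3 expressions × 3 positions × 3 repetitions, in random order) can be sketched as follows. This is only an illustration: the text does not describe how randomization was implemented, so an unconstrained shuffle is assumed, and `build_trial_list` is a hypothetical helper rather than part of the experimental software.

```python
import itertools
import random

EXPRESSIONS = ("Neutral", "Fearful", "Joyful")
POSITIONS = ("Ultra-far", "Far", "Near")
REPETITIONS = 3  # trials per condition


def build_trial_list(seed=None):
    """Return a shuffled list of (expression, position) trials.

    Assumption: simple shuffle with no constraint on consecutive repeats.
    """
    conditions = list(itertools.product(EXPRESSIONS, POSITIONS))
    trials = conditions * REPETITIONS
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trial_list(seed=0)
```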

At the end of the experimental phase, participants were invited to fill out a form in which they were asked to recognize the emotions represented in VR and to rate their intensity and arousal levels with two separate 10-point Likert scales. For intensity, the anchors were 0 (mild-neutral) to 9 (very intense), and, for arousal, they were 0 (not exciting at all-relaxing) to 9 (highly arousing-exciting). Moreover, participants were invited to rate the pleasantness of their general experience in the VR environment with a 10-point Likert scale that ranged from 0 (not pleasant at all) to 9 (very pleasant).

SCR recording and data processing

SCR was recorded with a Biopac MP-150 (BIOPAC Systems, Inc., Goleta, California, USA) at a 200-Hz sampling rate, and collected with AcqKnowledge 3.9 software (BIOPAC Systems) for offline analysis. SCR was acquired with two Ag/AgCl electrodes (TSD203; BIOPAC Systems) filled with isotonic hypo-saturated conductant gel and attached to the distal phalanges of the second and third fingers of the participant’s non-dominant hand. A Biopac EDA100C (BIOPAC Systems) was used to measure SCR (gain switch set to 5 μS/V, low pass to 35 Hz, high pass to DC).

SCR data were analyzed offline using MATLAB (Version R2018b; The MathWorks, Inc., Natick, Massachusetts, USA), and all statistical analyses were performed with STATISTICA (StatSoft, v. 13.0, Round Rock, Texas, USA). Each trial (see Fig. 3 as an example of single SCR traces) was extracted from the entire SCR signal and, to reduce inter-individual variability, a baseline correction was applied using the mean value of the signal 1000 ms before each stimulus presentation as a baseline (Alpers et al. 2011; Banks et al. 2012; Shiban et al. 2015). Then, for each baseline-corrected trial, the peak-to-peak value was calculated as the amplitude during the 500–4500 ms time window after emotional face onset. The minimum response criterion was 0.02 μS, and smaller responses were encoded as zero. Raw SCR scores were square root-transformed to normalize the data distribution (Boucsein et al. 2012; Schiller et al. 2008).
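The per-trial processing steps described above can be sketched in a few lines. The original analysis was run in MATLAB; the Python version below is an illustrative reconstruction only, and it assumes that the peak-to-peak value is the maximum minus the minimum of the baseline-corrected signal within the analysis window. The function name and the synthetic signal are hypothetical.

```python
import math

FS_HZ = 200             # sampling rate reported in the text
BASELINE_S = 1.0        # baseline window before face onset
WINDOW_S = (0.5, 4.5)   # analysis window after face onset
MIN_RESPONSE_US = 0.02  # minimum response criterion (microsiemens)


def scr_magnitude(signal, onset_idx):
    """Square-root-transformed peak-to-peak SCR for one trial.

    signal: sequence of skin conductance samples (microsiemens)
    onset_idx: sample index of face onset
    """
    # Baseline: mean of the 1000 ms preceding face onset.
    n_base = int(BASELINE_S * FS_HZ)
    baseline = signal[onset_idx - n_base:onset_idx]
    base_mean = sum(baseline) / len(baseline)

    # Baseline-corrected samples within 500-4500 ms after onset.
    start = onset_idx + int(WINDOW_S[0] * FS_HZ)
    stop = onset_idx + int(WINDOW_S[1] * FS_HZ)
    window = [x - base_mean for x in signal[start:stop]]

    # Assumed definition: peak-to-peak = max - min within the window.
    peak_to_peak = max(window) - min(window)
    if peak_to_peak < MIN_RESPONSE_US:
        peak_to_peak = 0.0  # sub-threshold responses are scored as zero
    return math.sqrt(peak_to_peak)


# Synthetic trial: flat 2.0 uS trace with a 0.25 uS step inside the window.
onset = 200
flat = [2.0] * 1100
bump = list(flat)
for i in range(onset + 400, onset + 600):
    bump[i] += 0.25
example = scr_magnitude(bump, onset)
```

With this max-minus-min definition, subtracting a constant baseline does not change the peak-to-peak value; the step is kept here only to mirror the order of operations described in the text.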

Fig. 3 

Plots showing an example of single-trial SCR from a single participant. Each panel reports the plot of three trials, one per emotion condition, in the Near space (upper panel a), in the Far space (middle panel b) and in the Ultra-far space condition (lower panel c). Lines intersecting the x-axis delimit the time window chosen for the analysis (500–4500 ms after stimulus onset)

Results

Concerning the psychophysiological data, the assumption of a normal distribution of data was verified, and mixed-design ANOVAs were used to investigate modulations of arousal (SCR) during the experimental task. Post hoc analyses were conducted with Bonferroni corrections, and the significance threshold was set at p < 0.05. The effect size was calculated as partial eta-squared (ηp2). Three participants, considered SCR non-responders, were excluded from the analysis due to the minimal level of recorded responses (Boucsein et al. 2012).
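For readers who wish to check reported effect sizes, partial eta-squared can be recovered from an F statistic and its degrees of freedom through the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two ANOVAs reported for the stimulus validation; it is a consistency check only, and the function name is illustrative rather than a description of how the statistics software computed the values.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta-squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported for the stimulus validation ratings, both with df = (2, 118).
intensity_eta = round(partial_eta_squared(151.45, 2, 118), 2)
arousal_eta = round(partial_eta_squared(98.35, 2, 118), 2)
```

Both values agree with the effect sizes reported for the validation ratings (0.72 and 0.63).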

To quantify the mere effect of the emotion (fear, joy, neutral) at each distance, we created an index (∆SCR) by subtracting the mean value of the phasic response to neutral faces from the phasic responses to the fearful and joyful expressions, for each distance (Ultra-far, Far, Near). Thus, ∆SCR allowed us to control for possible effects of both the stimulus speed and size. Indeed, it is important to highlight that the looming faces started at different distances from the participant, but the end point was always the same. This means that the stimuli covered different distances in the same amount of time, resulting in different travel speeds, as well as faces presented at different distances appearing in different sizes.
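The ∆SCR index amounts to a per-distance difference of means. A minimal sketch follows, assuming the subtraction is carried out within each participant and each distance on the mean of the available trials; the data layout, the function name and the numbers are made up for illustration and are not data from the study.

```python
from statistics import mean


def delta_scr(scr, emotion, distance):
    """Mean emotional response minus mean neutral response at one distance.

    scr: dict mapping (emotion, distance) -> list of per-trial SCR values
    """
    return mean(scr[(emotion, distance)]) - mean(scr[("Neutral", distance)])


# Hypothetical values for one participant in the Near condition.
scr = {
    ("Neutral", "Near"): [0.25, 0.5, 0.75],
    ("Fearful", "Near"): [0.5, 0.75, 1.0],
    ("Joyful", "Near"): [0.25, 0.5, 0.75],
}
fear_near = delta_scr(scr, "Fearful", "Near")
joy_near = delta_scr(scr, "Joyful", "Near")
```

Because neutral, fearful and joyful faces at a given distance share the same speed and size, these stimulus parameters are common to both terms of the difference.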

A repeated measures ANOVA was performed to investigate the effect of the Emotion (two levels: ∆SCR Fear, ∆SCR Joy), the effect of the Distance (three levels: Ultra-far, Far, Near) and their interaction. There was neither a main effect of the Emotion (F(1,26) = 1.25; p = 0.27; ηp2 = 0.05), nor of the Distance (F(2,52) = 2.63; p = 0.08; ηp2 = 0.09). Crucially, an Emotion*Distance interaction was found (F(2,52) = 6.76; p < 0.01; ηp2 = 0.21). Bonferroni-corrected post hoc comparisons revealed that, for the joyful faces condition, there was no difference between the Ultra-far, Far and Near conditions (∆SCR Joy Ultra-far: M = 0.00; SEM = 0.03; ∆SCR Joy Far: M = 0.03; SEM = 0.02; ∆SCR Joy Near: M = 0.01; SEM = 0.02; all p = 1). In the fearful faces condition, instead, values in the Ultra-far condition were significantly lower than values in the Far and Near conditions (∆SCR Fear Ultra-far: M = −0.04; SEM = 0.03; ∆SCR Fear Far: M = 0.04; SEM = 0.03; ∆SCR Fear Near: M = 0.09; SEM = 0.03; all p < 0.02). ∆SCR Fear in the Far condition did not differ from ∆SCR Fear in the Near condition (p = 0.49). Importantly, ∆SCR Fear was higher than ∆SCR Joy in the Near condition (p = 0.01; see Fig. 4).

Fig. 4 

Bar graph showing the experimental results. In particular, the graph shows the interaction between Emotion and Distance. In the joyful faces condition, ∆SCR did not differ between spatial conditions, whereas ∆SCR for the fearful faces was significantly modulated by spatial distance. Asterisks indicate significant comparisons. Error bars represent S.E.M. Overlaid dots show the individual subjects’ data for each condition

Finally, we also analyzed the latencies of the peaks, computed as the period between the stimulus onset (T1; the appearance of the face) and the SCR maximal peak elicited by the visuo-tactile compound. The largest deflections of the SCR signal, except for one subject in one condition, always followed the time of touch delivery (T2; 1500 ms), at latencies of around 4130 ms on average (SEM = 60). As a sanity check, the analyses of the SCR peaks were rerun with that subject excluded, and similar results were obtained. Moreover, we checked whether latencies of the peaks were modulated by our experimental conditions (Emotion and Distance). Results from the repeated measures ANOVA confirmed that latencies were not modulated by the main effect of Emotion (F(2,52) = 0.67; p = 0.51; ηp2 = 0.03), nor by the main effect of Distance (F(2,52) = 0.80; p = 0.45; ηp2 = 0.03), nor by their interaction (F(4,104) = 1.03; p = 0.39; ηp2 = 0.04).

Concerning the behavioral data, all participants detected 100% of the attentional probes and were also accurate at detecting the tactile stimulus, as the rate of omissions was low (< 1%). Due to the limited number of trials per condition (n = 3), response times to tactile stimuli were not analyzed.

Concerning the final rating results, all subjects correctly reported the identity of the emotional faces (mean hit rate 100%). Intensity and arousal levels, rated at the end of the experimental session, were analyzed separately. A repeated measures ANOVA was used to evaluate differences in the intensity ratings of the stimuli. Results showed a main effect of Emotion (F(2,52) = 17.95; p < 0.001; Fear: M = 7.40; SEM = 0.27; Joy: M = 4.85; SEM = 0.44; Neutral: M = 4.26; SEM = 0.51). Bonferroni-corrected post hoc comparisons revealed that fearful faces were rated as more intense than joyful and neutral faces (all p < 0.01). Another repeated measures ANOVA was used to evaluate differences in the arousal ratings of the stimuli. Results showed a main effect of Emotion (F(2,52) = 6.91; p = 0.002; Fear: M = 5.44; SEM = 0.27; Joy: M = 5.11; SEM = 0.35; Neutral: M = 4.44; SEM = 0.37). Bonferroni-corrected post hoc comparisons revealed that fearful faces were rated as more arousing than neutral faces (p < 0.01) but not significantly different from joyful faces (p = 0.06). Finally, participants rated their general experience in VR as mildly pleasant (M = 6.66; SEM = 0.42).

Discussion

Multisensory neurons mapping PPS are sensitive to the spatio-temporal dynamics of objects in the environment, and it is known that stimuli related to the body (in this case, a tactile vibration) and external events that occur near the body (in this case, an approaching avatar face) are highly likely to be jointly processed (Serino 2019). The information from this joint processing is directly transferred to the motor system to prompt appropriate responses, which are positively correlated with the proximity of the visual stimulus to the touched body part. In addition to proximity to the body, several other factors affect PPS representation, including stimulus movement parameters such as direction and speed, and, more relevant to the aim of the present study, the salience of the stimulus.

In the present study, we investigated the role of the salience of approaching emotional facial expressions in modulating the autonomic nervous system as a function of their distance from the observer. Thus, the aim of the present study was to verify whether SCR—an index of transient responses of the autonomic nervous system in response to a stimulus—is differentially modulated by emotional facial expressions (fear and joy) according to how close the looming face is to the participant. We predicted a modulatory effect only for stimuli with high salience and importance to the individual, like a fearful face, which signals the presence of an unknown threat in the environment. This effect was expected to gradually increase as the visual stimulus approached the participant, i.e., when the source of threat may be inescapable, and the need for defense is most pressing. In light of the defensive purpose of peripersonal space, joyful faces—which have low salience and little importance for an individual’s defense and avoidance behavior—should not modulate the proximity effects.

To this aim, we created a novel version of a well-validated behavioral task used to assess PPS (Pellencin et al. 2018; Serino et al. 2015). In this task, participants were asked to respond, as quickly as they could, to tactile stimuli administered on their cheeks while an emotional or neutral face appeared to approach them from three different distances (Ultra-far, Far and Near). To eliminate the time expectancy effect, which is known to influence the proximity effect (Kandula et al. 2017), tactile stimulation was always delivered 1500 ms after the beginning of the trial, so that touch coincided with perception of the faces at different distances from the participant. To quantify the pure modulatory effects of the emotions on the proximity effect, we subtracted the mean value of the phasic SCR response to the neutral faces from the phasic responses to the fearful and joyful faces, at each distance condition. This correction returned an index of the relative arousal response enhancement due to the presentation of emotional faces, compared to the presentation of the neutral faces, and allowed us to control for confounding stimulus parameters, such as speed (fast vs slow movement) and stimulus size (big vs small faces). Previous literature has shown that travel speed affects PPS (Fogassi et al. 1996; Noel et al. 2018); in particular, as the velocity of the incoming visual stimulus increases, the size of the receptive fields of multisensory neurons also increases, as if to initiate the computation for PPS representation earlier and integrate the speed of the incoming stimuli (Fogassi et al. 1996). Consequently, using such an index, we did not expect an absolute main effect of the distance; instead, we predicted that, only in the fearful condition, responses would be relatively modulated as the fearful face was perceived as closer to the participant.

The results confirmed our predictions; approaching fearful faces triggered particularly intense emotional responses which depended on the distance between the stimulus and the observer. Approaching fearful faces, but not joyful faces, elicited a gradual increase in SCR magnitude as the face became closer to the observer. Greater physiological responses to fearful faces were obtained in the Near condition (~ 40 cm away) compared to the Far (~ 110 cm away) and Ultra-far conditions (~ 180 cm away). Distance did not modulate the physiological responses to joyful faces.

The difference in the physiological response to fearful faces, on the one hand, and joyful faces, on the other, is not surprising if we consider that the stimuli are not equally salient and have different impacts on motor corticospinal excitability (Schutter et al. 2008). Given the sensory–motor function of PPS to protect the body from potentially dangerous stimuli, it is not surprising that PPS is influenced by the salience of stimuli and by their differential impacts on the motor system. These two factors will be discussed in turn, in the following paragraphs.

Regarding stimulus salience, joyful and neutral faces have very little relevance to threat detection, compared to fearful faces, and they are probably unable to activate the emotional neural circuits, involving the amygdala, which play an important role in evaluating stimulus salience and generating physiological responses, such as SCR. In addition, previous studies have demonstrated that the amygdala shows greater activation when a stimulus is presented in ambiguous and uncertain environmental circumstances, in the presence of ambiguous threat (Adams and Kleck 2003) or during an unpredictable series of auditory tones (Herry et al. 2007). A fearful face, unlike other negative emotions such as anger, signals an environmental threat whose source and location are unknown (Fanselow and Pennington 2018) and, as such, it can be conceived of as an ambiguous stimulus (Hortensius et al. 2016). Consequently, after fearful face presentation, enhanced amygdala-mediated vigilance and arousal are necessary for scanning the environment and dealing with the uncertainty of the upcoming danger. Thus, the gradual increase in SCR magnitude found in the present study, as the fearful face approached the participant, could be explained by the greater amount of attentional resources required to search for the source and location of the threat that generated that fearful expression.

Regarding the differential impacts of fearful and joyful expressions on the motor system, previous research has demonstrated that joyful and neutral scenarios, unlike threatening scenarios, do not selectively induce an early increase in motor corticospinal excitability, suggesting a lack of action preparedness when the participant is confronted with these emotions. In contrast, an early modulation of the motor cortex has been found when participants face threatening scenarios (Borgomaneri et al. 2014). The same results have been obtained with emotional faces: a selective impact on the motor system was found for fearful faces, but not for neutral or joyful faces (Schutter et al. 2008). These results show that the emotional system and the motor system are closely related, and fearful faces, but not neutral or joyful faces, act as cues that rapidly prepare the organism for action critical to survival (Anderson and Phelps 2001). This observation is particularly relevant for PPS, which has been conceived of as a sensory–motor interface for body protection.

The results of the present study are in line with neurophysiological findings (see Bufacchi and Iannetti 2018; Colby et al. 1993; Graziano et al. 1997, for a review) showing that peripersonal space seems to reflect a relevant area in which the salience of the stimulus interacts with the distance between the stimulus and the observer. The perceived salience of an emotional expression gradually increases as the face approaches the observer, as documented by the gradual increase in physiological activation from Ultra-far to Far and then to Near space; it is worth remembering that the more the face expresses fear, the higher the SCR in the observer (Alpers et al. 2011; Fusar-Poli et al. 2009). Given the sensory–motor functions of PPS, a fearful face would enhance the defensive function of PPS specifically when it is most needed, i.e., when the source of threat is nearby, and its location has not yet been identified. We cannot exclude that the valence of the emotional expression, a construct that refers to its pleasantness or unpleasantness (Kensinger and Schacter 2006), may also have played a role in determining our results. In fact, fearful faces, which carry important information about the presence of threats in the environment, are not only more salient stimuli than joyful faces, but also have more negative valence. This aspect needs to be clarified by tailored future studies.

We do not know whether the results of the present study can be extended to other definitions of PPS, i.e., action-based peripersonal space (APS), defined as the space within which we can act (~ 70 cm), and interpersonal space (IPS), defined as the space within which any intrusion by others may cause discomfort. These PPS definitions seem to be based on different mechanisms, since it has been shown that they are differentially sensitive to social modulation (Patané et al. 2016; see also Coello and Iachini 2015; Iachini et al. 2014). On the other hand, those studies on the APS and IPS relied on explicit processing, which may tap into more cognitive processes. In the present study, we used a multisensory integration paradigm where the visual stimulus producing the effect was irrelevant to the tactile detection task. In such a task, processing may be based on bottom-up factors and might tap into the defensive motor system promptly by asking for a binary response (action or no action; de Gelder et al. 2012). In addition, neurophysiological studies in monkeys have shown a functional dissociation between multisensory PPS neurons in the premotor cortex (VIP and F4) and the reaching neurons in the parietal lobe (MIP/Parietal reaching region and F2; see, e.g., Grefkes and Fink 2005; Matelli and Luppino 2001; Rizzolatti et al. 1997; Rizzolatti et al. 2002). A similar dissociation is evident in humans (see, e.g., Gallivan and Culham 2015; Grivaz et al. 2017). Considering the functional dissociations between the different types of PPS, we might expect different results when other paradigms, relying on different neural circuits, are used. Further investigation is needed to clarify this point.

Thus, the results of the present study confirm the defensive functional definition of peripersonal space; they show that fearful facial expressions are physiologically salient cues whose activation of the autonomic system depends upon the region of space where they are perceived. In other words, the salience of the face changes with its proximity to the body; an approaching fearful face, by signaling an upcoming environmental threat, elicits a gradual increase in SCR as the face comes closer to the participant, where the source of threat may be inescapable and the need for defense is most pressing.