Introduction

Emotions comprise interrelated subjective, behavioral, and physiological components. They provide people with vital information about their internal state and goals (Clore et al. 2001; Storbeck and Clore 2008) and communicate this information to others (Keltner and Gross 1999; Hess et al. 2016). Emotion perception is the process by which people perceive the emotional states of others (Barrett and Kensinger 2010). Because emotion perception is an integral precursor to more complex social-emotional processes, such as empathy, helping behaviors, and emotion regulation, it is important to understand how people arrive at these perceptions.

One way in which emotional states are communicated is through perceiving the downstream physiological changes that occur as a result of people responding to environmental demands. The central goal of the current study was to examine whether people’s autonomic nervous system (ANS) responses to a stressor are associated with how others perceive their emotions. Physiological changes may be communicated via facial expressions, vocal characteristics, and body language. To build upon extant literature that has focused on facial expressions within the emotion recognition literature (e.g., Cordaro et al. 2018; Elfenbein and Ambady 2002; Russell et al. 2003), the second goal of our study was to examine less studied nonverbal signals, including vocal characteristics and body language, as potential mediators of the association between targets’ stress physiology and observers’ perceptions of their emotions.

When a target experiences emotion, expressive signals are emitted that observers can pick up on (Bänziger et al. 2009; Keltner et al. 2016), including changes in behavior, body postures (Hess 2017), facial expression (Ekman et al. 2002), and voice (Juslin and Scherer 2005). Presumably, observers making judgments about others’ emotions should gather and use affective information from the multiple channels that comprise emotional responses. Such a perspective is supported by recent theorizing in emotion recognition that proposes that the signal value of several channels may better inform the emotion perception process (Keltner and Cordaro 2017). In the current study, we examine how ANS changes relate to observers’ perceptions and other established affective signals (e.g., voice and body language).

Importantly, changes in nonverbal signals can be traced to changes in the ANS, yet, surprisingly, few investigations of emotion perception have included targets’ ANS responses as a valuable signal. Given that physiology is central to social stress (Campbell and Ehlert 2012), these paradigms provide an opportunity to further understanding of the emotion perception process. Although emotion perception is ubiquitous in everyday (i.e., non-stressful) contexts (e.g., Hess et al. 2016), the intrinsic connection between social stress contexts and physiology makes these contexts an ideal starting place for understanding the signal value of ANS changes. Thus, we examined the extent to which three ANS indices, respiratory sinus arrhythmia (RSA), cardiac output (CO), and ventricular contractility (VC), in targets completing a socially evaluative stress task (Trier Social Stress Test [TSST]; Kirschbaum et al. 1993), correlate with observers’ ratings of the targets’ affective state. The chosen indices represent moderately correlated ANS activity during stress and have been previously linked to distinct affective states (Seery 2013; Blascovich and Mendes 2010; Muhtadie et al. 2015; Bliss-Moreau et al. 2013; Weisbuch et al. 2009).

RSA reflects variability in an individual’s heart rate over time as a function of inhalation and expiration. It is conceptualized as a relatively ‘pure’ index of parasympathetic nervous system (PNS) activity and is primarily influenced by the vagus nerve (Porges et al. 1994). Importantly, we focus on physiological reactivity (i.e., changes in physiology during a stress task compared to a resting baseline), as opposed to tonic measures of physiology (i.e., resting or “baseline” measures of physiology), which have different psychosocial correlates. During laboratory tasks or stressful situations, RSA typically decreases (e.g., Porges et al. 1996). Unlike the relatively unambiguous benefits of higher vagal tone (Thayer et al. 2012), the vagal reactivity literature suggests possible benefits and drawbacks. For example, greater decreases in RSA (i.e., greater vagal withdrawal) during an attention task are associated with better social and emotional sensitivity (Muhtadie et al. 2015), more accurate person perception (Human and Mendes 2018), and greater mental effort and attentional control (Porges et al. 1996). In contrast, decreases in RSA have also been linked to experiences of anger and distress (Demaree and Everhart 2004). Thus, during a stressful task, targets’ RSA reactivity (i.e., decreases in RSA from baseline) could be perceived as (1) greater engagement and effort, leading to perceptions of greater PA, or conversely, (2) indexing more distress and anger, leading to perceptions of greater NA.

CO measures how efficiently the heart is working in response to the demands of one’s environment and is influenced by both the PNS and sympathetic nervous system (SNS). CO has been implicated in beneficial health outcomes such as decelerated brain aging (Jefferson 2010), as well as affective states such as challenge and threat (Blascovich and Mendes 2010; Seery 2013). Challenge is an affective state marked by the cognitive appraisal that one’s resources are sufficient for the demands of the task/environment and is characterized by a pattern of physiological reactivity that includes increased SNS activity (measured with pre-ejection period), increased cardiac output, and decreased vascular resistance (i.e., vasodilation; Blascovich and Mendes 2010). Conversely, threat is an affective state marked by the cognitive appraisal that one’s resources are insufficient for the demands of the task/environment and, like challenge, is characterized by increases in SNS activity, but less cardiac output and increased vascular resistance (i.e., vasoconstriction). In our study, targets were provided with nonverbal feedback during a social stressor designed to elicit challenge and threat states (Kassam et al. 2009; Koslov et al. 2011; Kubzansky et al. 2012). In line with physiological and affective indicators of challenge and threat, we predicted that CO increases would be associated with perceptions of higher PA, but unrelated to perceptions of NA.

We additionally examined ventricular contractility (VC), which is solely innervated by the SNS and reflects the time difference between when the left ventricle contracts and the aortic valve opens (i.e., pre-ejection period [PEP]; PEP was multiplied by −1 and re-labeled VC so that increases in SNS activity are represented as increases in VC). VC increases have most consistently been associated with approach-oriented affective states like anger and action-readiness (Herrald and Tomaka 2002; Kassam et al. 2009; Mendes et al. 2008). VC has also been found to increase in response to anger elicitation, for example, among people who report less (versus more) habitual use of cognitive reappraisal (Mauss et al. 2007). VC tends to increase with both challenge and threat states (Blascovich and Mendes 2010), so we tentatively predicted that VC might be related to NA to the extent that NA was reflective of anger or approach-oriented responses.

It is important to elucidate how targets’ physiological reactivity may help to shape observers’ affective judgments and how this source of information (i.e., ANS changes) is intricately linked to observable behavior (i.e., how it might be indirectly observed). Thus, we explored several non-conscious but observable behavioral mediators that correspond to or result directly from ANS changes. Specifically, observers rated targets’ somatic activity (i.e., bodily movement associated with physiological responding), gesture intensity (i.e., how much targets nodded and used hand gestures while speaking), and speech clarity (i.e., how clearly the target spoke). We also assessed the targets’ vocal pitch (i.e., fundamental frequency), which is shaped by physiological processes (i.e., respiration, vibration of the vocal cords, movements of the mouth and lips; Kappas et al. 1991) and is directly perceivable for observers (i.e., through audition). Due to the shared influence of the vagus nerve on RSA and the vocal apparatus (Porges 2001), emotional prosody may give clues about targets’ physiological functioning, in turn helping observers make emotion judgments. Inclusion of these potential behavioral mediators allowed us to examine several bodily cues that are rooted in physiological responses but directly observable, to better understand how physiological responding may leak out as information for observers.

In sum, we predicted that PA would be positively associated with CO reactivity. To the extent that task engagement and effort were perceived as PA, we also expected that PA perceptions would be negatively associated with RSA reactivity. We predicted perceptions of NA would also, but to a lesser magnitude than PA, be negatively associated with RSA reactivity and unrelated to CO reactivity. Finally, although there were data in the literature that could reasonably lead to predictions for NA or PA to be associated with VC reactivity, these analyses were exploratory. We used mediation analyses to explore whether observers’ perceptions of the targets’ observable behaviors (i.e., somatic activity, gestural intensity, speech clarity) and the fundamental frequency of the targets’ voice mediate the associations between observers’ emotion perceptions and targets’ physiology.

Method

Participants and Procedures

The sample of participants consisted of 94 (68.4% female) undergraduate students attending a private Midwestern university recruited for participation in a larger study (see Eckland et al. 2018). The racial/ethnic make-up of the sample was as follows: 63.2% European American, 7.4% African American, 32.6% Asian-American, 5.3% Latino American, and 4.3% indicated “other.” Their ages ranged from 17 to 23 (M = 19.34 years, SD = 1.11).

Each participant completed an individual laboratory session that included a video rating task and self-report measures; those relevant to the current study are described below (Footnote 1). Informed consent was obtained from all individual participants in this study. The procedure was approved by the university’s internal review board, and participants were compensated with course credit.

For the video rating task, participants (hereafter referred to as observers) watched a series of video clips of eight people (i.e., targets) completing a modified TSST (Kirschbaum et al. 1993). Targets were eight white, non-Hispanic women (M = 24.75, SD = 2.55 years) who participated in another study involving the TSST and consented to having their video recordings used for research purposes. Clips were chosen from three phases of the TSST: a 2 min speech preparation (i.e., preparation phase), a 5 min speech (i.e., speech phase), and a 5 min question-and-answer period (i.e., Q&A phase). Targets were randomly assigned to receive either positive (e.g., smiling, nodding) or negative (e.g., eye rolling, looking uninterested) feedback from two evaluators (one male and one female; see Akinola and Mendes 2008) during the speech phase to increase the variance in targets’ affect, cognition, and physiology. Four targets from each condition were selected for the video task, for a total sample of 32 clips. Observers were blind to the target’s feedback condition (i.e., videos were cropped so observers could only see the target) and were not made aware that a manipulation was taking place.

For each target, observers watched (chronologically) minute two of the preparation phase, minute one of the speech phase, and minutes one and five of the Q&A phase. Minute two of the speech preparation was chosen because it was right before targets self-reported their emotions for the first time and the first minute often included research assistants in the video setting up the room for the stress task; minute one of speech and Q&A were chosen because the first minute of a stress task is typically when targets are most physiologically reactive; minute five of Q&A was chosen because that time was right before targets self-reported their post-task emotions and represented the longest exposure to the stressor. The eight targets were presented in a random order. Because the preparation phases involved no audible speaking, the data from this phase were excluded from these analyses. After viewing each 1 min video, observers made a series of ratings about the target’s affect and behavior. Each item was displayed individually on a computer screen, with observers pressing the corresponding number on the keyboard to submit their rating for each item. Questions within each domain (e.g., affect versus behavior) were presented in random order. Videos were presented and ratings were made using E-Prime 2.0 (Psychology Software Tools, Pittsburgh, PA). Specific details about the ratings are described in the measures section.

Measures

Perceptions of Targets’ Affect

Observers used a 7-point scale (1 = not at all, 7 = a great deal) to rate the extent to which targets were feeling five PA items (i.e., strong, confident, enthusiastic, interested, comfortable) and five NA items (i.e., distressed, anxious, jittery, agitated, tense; Footnote 2). These items represent both low and high arousal quadrants of the affective circumplex (e.g., Barrett and Russell 1999; Larsen and Diener 1992). Means for PA and NA were computed for each administration. Cronbach’s alphas for each administration (i.e., speech, Q&A first minute, Q&A last minute) were as follows: PA: α1 = .80, α2 = .59, α3 = .87, respectively; NA: α1 = .72, α2 = .60, α3 = .87, respectively. Reliability analyses indicated that “enthusiastic,” “interested,” “distressed,” and “jittery” reduced internal consistency during the first minute of Q&A. Thus, to improve the internal consistency of the affect scales, we removed these items and computed PA and NA based on the three remaining items for each scale. Resultant Cronbach’s alphas were similar or higher for each administration (i.e., speech, Q&A first minute, Q&A last minute; PA: α1 = .78, α2 = .79, α3 = .86, respectively; NA: α1 = .72, α2 = .82, α3 = .78, respectively).
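The internal consistency estimates reported above follow the standard Cronbach’s alpha formula. The following is a minimal sketch of that computation, assuming item scores are stored as one list per item, aligned by rater; the function name and the example ratings are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists.

    `items` holds one inner list per item; position i in every inner
    list must belong to the same rater.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of ratings")
    # Total score per rater across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical 7-point ratings from five raters on the three retained PA items.
strong = [5, 3, 6, 2, 4]
confident = [6, 3, 5, 2, 4]
comfortable = [5, 2, 6, 3, 4]
alpha_pa = cronbach_alpha([strong, confident, comfortable])
```

Dropping an item and recomputing alpha on the remaining items is one way to see whether that item lowers internal consistency, which is the logic behind the item removal described above.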

Observable Indices of Targets’ Behavior

We assessed four observable behavioral indices that communicate affective information. Three of these indices were observers’ ratings of targets’ behaviors using a 7-point scale (1 = not at all, 7 = a great deal) after each video clip: (1) perceived somatic activity was the reverse score of “the target lacked expressive movement”; (2) perceived gesture intensity was an average of two items, how much the target “nodded” and “used their hands to enhance their speech”; and (3) perceived speech clarity was a rating of how “clear” the target spoke. For perceived gesture intensity, we calculated Spearman-Brown reliability coefficients (Eisinga et al. 2013) for the three time points when observers made these ratings, ρ1 = .57, ρ2 = .59, ρ3 = .43. The fourth index, (4) vocal prosody, was measured using fundamental frequency (f0) scores. For each 1 min TSST clip, audio was uploaded into Praat (Boersma and Weenink 2017), version 6.0.21, a speech processing software. Audio recordings were coded so speech was only processed when the target was speaking (mean = 45 s, range 35–60 s). Consistent with other social relationship research on fundamental frequency (e.g., Weusthoff et al. 2013), we set the bandpass filter to 75–350 Hz, which captures the normal range of human speech. Mean f0 was calculated for every quarter second and averaged across the length of time when the target was speaking for each of the three sets of TSST clips.
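Two of the scoring steps above are simple enough to illustrate: reverse scoring a 7-point item and computing the Spearman-Brown coefficient for a two-item scale from the inter-item correlation. The sketch below assumes the conventional formulas (reversed score = low + high − rating; ρ = 2r / (1 + r)); function names and example ratings are hypothetical.

```python
from math import sqrt


def reverse_score(rating, low=1, high=7):
    """Reverse a rating on a low..high scale (7 becomes 1, 1 becomes 7)."""
    return low + high - rating


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r):
    """Spearman-Brown reliability of a two-item scale with inter-item correlation r."""
    return 2 * r / (1 + r)


# Hypothetical ratings of the two gesture items from five observers.
nodded = [2, 4, 3, 5, 1]
used_hands = [3, 5, 2, 6, 2]
rho_gesture = spearman_brown(pearson_r(nodded, used_hands))

# Hypothetical ratings of "the target lacked expressive movement".
lacked_movement = [6, 2, 4]
somatic_activity = [reverse_score(v) for v in lacked_movement]
```

Under this formula, a two-item coefficient near .57 corresponds to an inter-item correlation of roughly .40.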

Physiological Data Acquisition and Cleaning

Targets’ electrocardiography (Biopac ECG module) and impedance cardiography (ICG, HIC-2000 Impedance Cardiograph, Bio Impedance Technology) were acquired continuously at 1000 Hz during a 5-min resting baseline period and during the TSST. Signals were integrated into a Biopac MP150 and electrocardiograph and impedance cardiograph signals were scored offline using Mindware software (HRV 3.1, IMP 3.1, Gahanna, OH http://www.mindwaretech.com/). All data were scored in 1 min bins to optimize validity of the estimates for PEP and CO, and due to algorithm constraints for RSA, which requires a minimum of 60 s of data.

RSA estimates were scored in accordance with the recommendations of the Society for Psychophysiological Research (Berntson et al. 1997). Software was specified to score data using a 4 Hz time series to interpolate the interbeat interval (IBI) (Berntson et al. 1993), and the time series was detrended by a second-order polynomial to minimize nonstationary trends. A Hamming window was used to taper the residual series, followed by a Fast Fourier Transformation to derive the spectral distribution. RSA was specified as the integral power within the typical adult respiration frequency band (.12 to .4 Hz). CO (liters per minute) was calculated with the Kubicek formula and provides an estimate of the amount of blood processed by the heart in 1 min. VC was scored as the negative of the pre-ejection period (i.e., PEP multiplied by −1), which is a time-based measure determined from the time the left ventricle contracts (i.e., onset of ventricular depolarization, Q point on the ECG waveform) to the aortic valve opening (B point on the ICG waveform). All data scored in Mindware were visually inspected for artifacts, and incorrectly identified QRS complexes and B, Z, and X points were adjusted as needed. For each physiological index (i.e., RSA, CO, and VC), reactivity scores were computed by subtracting baseline scores from the (1) first minute of the speech, (2) first minute of Q&A, and (3) last minute of Q&A. It should be noted that RSA decreases during the TSST; thus, lower numbers for the index of RSA reactivity actually represent greater reactivity (i.e., larger decreases from baseline).
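The reactivity scoring described above reduces to two operations: re-expressing PEP as VC by multiplying by −1, and subtracting the baseline value from each task-minute value. A minimal sketch follows; the dictionary layout, function names, and all numeric values are hypothetical and are not data from the study.

```python
def vc_from_pep(pep_ms):
    """Re-express pre-ejection period (ms) as VC by multiplying by -1."""
    return -pep_ms


def reactivity(task_value, baseline_value):
    """Reactivity score: task-minute value minus resting-baseline value."""
    return task_value - baseline_value


# Hypothetical values for a single target.
baseline = {"rsa": 6.5, "co": 5.0, "pep_ms": 110.0}
segments = {
    "speech_min1": {"rsa": 5.8, "co": 6.1, "pep_ms": 95.0},
    "qa_min1": {"rsa": 6.0, "co": 5.9, "pep_ms": 98.0},
    "qa_min5": {"rsa": 6.2, "co": 5.4, "pep_ms": 104.0},
}

scores = {}
for name, seg in segments.items():
    scores[name] = {
        "rsa": reactivity(seg["rsa"], baseline["rsa"]),
        "co": reactivity(seg["co"], baseline["co"]),
        # A shorter PEP during the task yields a positive VC reactivity score.
        "vc": reactivity(vc_from_pep(seg["pep_ms"]), vc_from_pep(baseline["pep_ms"])),
    }
```

With this sign convention, a negative RSA score indicates vagal withdrawal, whereas positive CO and VC scores indicate increases from baseline.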

Data Analysis Plan

To address whether targets’ physiological reactivity predicts observers’ perceptions of their emotions, we used cross-classified variance components models. These multilevel models are ideal for situations where data are nested, but not perfectly hierarchical (Luo and Kwok 2009). Observers watched and rated the same 24 clips of targets, which could be classified either by which of the eight targets was in the clip or by the minute segment of the TSST they correspond to (i.e., speech minute one, Q&A minute one, Q&A minute five). Thus, at the level of the stimulus, there is a non-hierarchical nesting, which if not accounted for can bias the standard errors of the estimates of the fixed effects for the models (Meyers and Beretvas 2006; Luo and Kwok 2009). These models are implemented by including random effects for the observer, the target in the video, and the video type (i.e., which minute of the TSST is being shown). We used the observers’ perceptions of PA and NA as the criterion variables since these were at the lowest nested level and target physiology as the predictors (Footnote 3). We tested four models, two predicting observers’ ratings of targets’ NA and two predicting observers’ ratings of targets’ PA. For both NA and PA, we tested a model with RSA reactivity as a predictor and a model with CO and VC reactivity as simultaneous predictors. Given RSA is associated with PNS functioning and has implications for sensitivity to social cues and regulation of cognition, affect, and emotion (Mendes 2016), we examined it in its own model. Because CO and VC are moderately correlated and relevant to challenge and threat, they were examined simultaneously. We used restricted maximum likelihood estimation and report fixed effect parameter estimates with model-based standard errors, semi-partial R2 effect size estimates (Edwards et al. 2008), and random effect estimates in Table 3. Data were analyzed using SAS version 9.4.

Regarding statistical power, simulation studies suggest 80% power can be achieved with multilevel models that have similar numbers of level-1 observations as our models and as few as 30 level-2 observations (our models had 94; Bell et al. 2008, 2014).
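For readers less familiar with cross-classified models, the structure described above can be sketched as follows for the model predicting perceived PA from RSA reactivity. The notation is ours and is an assumption: o indexes observers, t targets, and s TSST segments, and only random intercepts are shown because the text does not specify any random slopes.

```latex
\begin{aligned}
\mathrm{PA}_{ots} &= \gamma_0 + \gamma_1\,\mathrm{RSA}_{ts} + u_o + v_t + w_s + e_{ots},\\
u_o &\sim N(0,\tau_u^2),\quad
v_t \sim N(0,\tau_v^2),\quad
w_s \sim N(0,\tau_w^2),\quad
e_{ots} \sim N(0,\sigma^2).
\end{aligned}
```

In this sketch, γ1 is the fixed effect of interest, and the three random intercepts absorb variance attributable to the observer, the target, and the segment, respectively. The model with CO and VC reactivity would replace the single RSA term with two simultaneous predictors.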

Finally, we examined four theoretically relevant mediators of the association between targets’ physiology and observers’ ratings of targets’ affect: perceived somatic activity, perceived gesture intensity, perceived speech clarity, and f0, in order to elucidate whether these observable channels explained the associations between perceptions of affect and target physiology. We tested a series of mediation models, this time using a physiological index as the criterion variable, four mediator variables entered simultaneously, and perceptions of PA or NA as the predictor variable. To do so, we used the PROCESS macro (Hayes 2013) for SPSS version 24, which calculates direct and indirect effects with confidence intervals based on 10,000 bootstrap samples for parallel multiple mediation models.
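The logic of a bootstrapped indirect effect can be illustrated with a simplified sketch. The code below is not the PROCESS macro and does not reproduce the parallel four-mediator models; it assumes a single mediator, ordinary least squares paths, and a percentile bootstrap. All function names are hypothetical and the data are simulated.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def indirect_effect(x, m, y):
    """Product a*b: a = slope of m on x; b = slope of y on m, controlling for x."""
    mx, mm, my = _mean(x), _mean(m), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    a_path = sxm / sxx
    # Coefficient of m from the two-predictor normal equations.
    b_path = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a_path * b_path


def bootstrap_indirect(x, m, y, n_boot=10_000, seed=0, level=0.95):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        try:
            estimates.append(indirect_effect(
                [x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with no variance; skip it
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * len(estimates))]
    upper = estimates[int((1 + level) / 2 * len(estimates)) - 1]
    return lower, upper


# Simulated standardized data: perceived affect -> f0 -> physiological reactivity.
_rng = random.Random(7)
perceived_pa = [_rng.gauss(0, 1) for _ in range(80)]
f0 = [0.8 * p + _rng.gauss(0, 0.4) for p in perceived_pa]
reactivity_sim = [0.7 * f + _rng.gauss(0, 0.4) for f in f0]
ci_lower, ci_upper = bootstrap_indirect(perceived_pa, f0, reactivity_sim,
                                        n_boot=2000, seed=42)
```

An indirect effect is conventionally treated as significant when the bootstrap confidence interval excludes zero. The parallel multiple mediation models used here extend this idea by estimating one such product term per mediator within the same model.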

Results

Descriptive statistics are presented for all study variables in Table 1 and correlations between study variables in Table 2. To inform covariate selection, we tested whether, on average, observers’ perceptions of targets’ PA and NA varied by observers’ gender or as a function of observers’ age. Observers’ perceptions of targets’ PA, t(91) = 1.01, p = .32, NA, t(91) = 1.50, p = .14, targets’ somatic activity, t(85) = .04, p = .97, gesture intensity, t(85) = 1.21, p = .23, and speech clarity, t(85) = .78, p = .44, did not vary by gender. Age was not significantly associated with perceptions of targets’ PA (r = −.11, p = .30), NA (r = −.11, p = .30), somatic activity (r = .05, p = .68), gesture intensity (r = .14, p = .22), or speech clarity (r = .04, p = .70). Because perceptions of PA, NA, and target behaviors were not systematically related to age or gender, neither was included in the models as a covariate.

Table 1 Descriptive statistics of study variables
Table 2 Bivariate correlations between study variables

Physiology Predicting Perceptions of Affect

Positive Affect

In line with our hypotheses, targets’ RSA reactivity was significantly associated with observers’ perceptions of PA (see Model 1 presented in Table 3), such that greater decreases in RSA were associated with higher levels of perceived PA. Targets’ CO reactivity, as expected, was significantly associated with observers’ perceptions of PA (see Model 2 presented in Table 3). Here, increases in CO were associated with higher levels of perceived PA. Finally, targets’ VC reactivity was positively related to observers’ perceptions of PA (see Model 2 presented in Table 3).

Table 3 Fixed and random effects from multilevel models predicting perceptions of affect from physiology

Negative Affect

Targets’ RSA reactivity was significantly associated with observers’ perceptions of NA (see Model 3 presented in Table 3), but the association was opposite to the predicted direction. RSA reactivity was positively associated with perceived NA, such that greater decreases in RSA were associated with lower levels of perceived NA. Both targets’ CO reactivity and targets’ VC reactivity were significantly negatively related to levels of observers’ perceived NA (see Model 4 presented in Table 3).

Mediators of Physiology and Emotion Perception

Next, we ran six multiple mediation models. For each model, significant direct effects (see Figs. 1, 2, 3, 4, 5, 6 for the specific path weights from the regression models) remained despite the inclusion of mediators, with the exception of models where VC reactivity was predicted from PA or NA (see Figs. 5, 6). Here, the direct effects of perceptions of PA and NA predicting VC reactivity were no longer significant upon inclusion of mediators. Vocal prosody (i.e., f0) emerged as a consistent mediator for both PA and NA across all physiological indices (see Table 4 for a summary of indirect effects). No other behavioral indices were significant mediators of the association between observers’ perceptions of affect and targets’ physiology.

Fig. 1 Perception of positive affect predicting RSA reactivity mediated through vocal prosody (i.e., f0), intensity of gestures, speech quality, and somatic activity. Statistics presented are b(SE) from regression models. *p < .05 **p < .01

Fig. 2 Perception of negative affect predicting RSA reactivity mediated through vocal prosody (i.e., f0), intensity of gestures, speech quality, and somatic activity. Statistics presented are b(SE) from regression models. *p < .05 **p < .01

Fig. 3 Perception of positive affect predicting CO reactivity mediated through vocal prosody (i.e., f0), intensity of gestures, speech quality, and somatic activity. Statistics presented are b(SE) from regression models. *p < .05 **p < .01

Fig. 4 Perception of negative affect predicting CO reactivity mediated through vocal prosody (i.e., f0), intensity of gestures, speech quality, and somatic activity. Statistics presented are b(SE) from regression models. *p < .05 **p < .01

Fig. 5 Perception of positive affect predicting VC reactivity mediated through vocal prosody (i.e., f0), intensity of gestures, speech quality, and somatic activity. Statistics presented are b(SE) from regression models. *p < .05 **p < .01

Fig. 6 Perception of negative affect predicting VC reactivity mediated through vocal prosody (i.e., f0), intensity of gestures, speech quality, and somatic activity. Statistics presented are b(SE) from regression models. *p < .05 **p < .01

Table 4 Indirect effects from bootstrap mediation analyses

Discussion

Emotion perception is a fundamental building block of many socioemotional processes (Smith 2006; Zaki et al. 2008) and is an integral process for navigating the social world and understanding social cues. The present study examined the relation between targets’ physiological reactivity and people’s perceptions of targets’ emotions. We also examined several nonverbal cues that could help explain the associations between observers’ emotion perceptions and targets’ physiological reactivity. The findings from the present study demonstrate that targets’ physiological responses are associated with observers’ perceptions of how the targets feel. Moreover, we found that vocal prosody may be an especially useful cue through which affective states are communicated to observers.

We found that targets’ ANS reactivity was associated with observers’ perceptions of targets’ PA and NA. More specifically, as predicted, greater vagal withdrawal (i.e., RSA reactivity) was associated with greater perceptions of PA by observers. Unexpectedly, we found RSA reactivity was positively associated with observers’ perceptions of NA. Many psychological states are associated with a decrease in RSA, including negative affect, anger, and hostility, but also more benign and positive affective states, like greater motivation and cognitive effort. During active tasks like the TSST, RSA tends to decrease (vagal brake withdraws), so our finding suggests that observers perceived targets with greater RSA decreases as experiencing less NA and more PA, possibly due to perceptions of targets’ increased effort to perform well during the stress task. These findings align with research on vagal flexibility (e.g., Hagan et al. 2017; Muhtadie et al. 2015) that shows greater vagal responses to stress may signal greater social sensitivity and attunement to the environment. In stressful situations, flexible vagal responses have been shown to be related to more sensitive social responses (Muhtadie et al. 2015; Obradović et al. 2010). Our findings extend this literature by suggesting flexible vagal reactivity is associated with others’ perceptions of greater PA and less NA.

In addition to examining the association between observers’ perceptions of affect and targets’ RSA reactivity, we examined several possible mediators (i.e., vocal prosody, somatic activity, gesture intensity, and speech clarity). With regard to RSA reactivity, only vocal prosody mediated the association with observers’ perceptions of NA and PA. Since RSA captures the direct influence of the vagus nerve (Porges 2001), it makes sense that vocal information (also influenced by the vagus nerve) seemed to communicate that information to observers. Generally, the “perceived” behavioral indices may not have been fully adequate to detect nuances in valence or expression, whereas vocal prosody is a more sensitive, objective behavioral index. A review of emotional expressive behavior (Keltner et al. 2016) suggests there are at least 40 behaviors that can signal emotions to others, so looking at many modalities is an important future direction in emotion perception research.

In line with our hypothesis, CO reactivity was associated with observers’ perceptions of targets’ PA. In a stressful task, having greater CO reactivity (i.e., greater increases from baseline level) could indicate that the heart is working efficiently in response to the demands of the situation. This finding fits with the challenge theory’s tenet (Blascovich and Mendes 2010; Seery 2013) that a challenge state is associated with increased CO, which may be associated with feelings of confidence when faced with stress. To the observers, greater CO reactivity may have been perceived as greater confidence, ease, or competence in the face of stress. CO reactivity was also significantly negatively related to observers’ perceptions of targets’ NA. In contrast to challenge states, a person in a threat state may have lower CO and experience increased NA. Similar to RSA, we found that vocal prosody alone mediated the association between observers’ perceptions of affect (i.e., both PA and NA) and targets’ CO reactivity. Since CO is partially influenced by the PNS, there may be shared vagal influence between the heart and vocal apparatus, which accounts for this indirect pathway.

There were findings in the literature that suggested VC could be related to increased PA or NA, so we treated those analyses as exploratory. We found a positive association between VC and perceptions of PA and a negative association between VC and perceptions of NA. VC is related to approach-oriented emotions (Herrald and Tomaka 2002; Mendes et al. 2008), which PA items may capture via words like “strong” and “confident,” whereas the NA items “anxious,” “agitated,” and “tense” are tied more closely to avoidance motivation. It may be that VC increases translated into perceptions of approach, suggesting VC may not be expressed differentially as affect but rather as affect-inducing motivation. Again, only vocal prosody mediated the associations between VC reactivity and perceptions of affect.

Emotion perception may consciously or automatically give rise to other interpersonal emotion processes. For example, emotional or stress contagion (Waters et al. 2014), where one’s emotions or stress are experienced and shared among people in one’s environment, may be the result of unconscious or automatic emotion perceptions that are influenced across the different emotional channels. Often it may be socially or culturally inappropriate to ask others directly how they feel (e.g., some forms of direct communication may be frowned upon in collectivist cultures; Markus and Kitayama 1991), so people may rely on nonverbal or vocal cues to infer others’ emotional states. Just as individuals differ in nonverbal sensitivity, some people may be especially sensitive to cues from targets’ physiological changes. In fact, using the present sample, Eckland et al. (2018) found observers’ understanding of their own emotions (emotional clarity) was associated with more accurate judgments of targets’ negative emotions (indexed by comparing self and observer reports). The current study furthers our understanding of what information people may be using while making judgments of others’ emotions.

Processes involving emotion perception are also important as they relate to the maintenance of healthy relationships. Research in relationships has found vocal prosody to be an important aspect of communication between couples (Baucom et al. 2011). Our study suggests that vocal prosody may communicate physiological aspects of a person’s affective experiences. This information could signal to partners within couples to engage in emotion regulatory behaviors that could promote healthy bonding or attachment.

Limitations and Future Directions

This study has several important limitations that will be informative in designing future research in this domain. In this study, data from three exposures to eight targets were used. We chose to include multiple exposures from fewer targets to get a range of physiological responses across the TSST. However, other designs (e.g., thin-slicing approaches) that use fewer exposures from a greater number of targets will be important to explore in the future, especially to increase the representativeness and generalizability of the targets. Additionally, the NA items reflected higher arousal emotions than did the PA items. Consequently, future research in this domain should parse emotions based on more dimensions than valence (e.g., arousal, motivation). More naturalistic designs, using methods such as ambulatory physiology and/or experience sampling, would also further the external validity of the current findings. Furthermore, targets in this study were in a stressful context. It will be important, however, for future research to extend these findings into everyday contexts where emotion perception regularly occurs.

Using videos offered control over the stimuli, but the lack of face-to-face interactions may have reduced the ability of observers to detect subtler, but informative, affective cues like sweating, lip tremors, blushing, and pupil dilation. In person, these signals may convey a lot of affective/physiological information but may be harder to perceive over video. Due to the subtlety of these behaviors and the quality of the films, it would be very difficult to rely on these measures. These affective cues may be especially important for understanding the role of physiological changes in everyday emotion perception. Another paradigm, such as a social interaction task, might be better suited for measuring more subtle behavioral indicators. Despite this limitation, there are many everyday situations when people need to decipher the emotions of others via video, including interacting with loved ones, colleagues, and even mental health providers. The frequency of video conference calls and instructional webinars in the workplace is increasing as well, as evidenced by a 115% increase in telecommuting in the past decade (Global Workplace Analytics and Flexjobs 2017). Telehealth has been used to make mental health services more available in rural areas (Jameson and Blank 2007) and to U.S. veterans (e.g., Gros et al. 2013; Morland et al. 2011). In each of these cases, perception of the other person’s affective information occurs over video, which our findings suggest conveys information across affective channels.

Facial expressions tend to be the most commonly examined behavioral communicator of emotional states (Barrett and Kensinger 2010; Ekman et al. 1980, 2002). However, given the large emphasis on facial expressions in the existing literature, we chose to focus on other channels that may be more directly linked to physiology. Our results emphasize the importance of voice in communicating the physiological aspects of affective experience. The frequency window we selected to analyze the vocal data maps onto the typical range of human communication (Weusthoff et al. 2013); however, it should be noted that higher frequencies also carry emotional information. Expanding the range of frequencies may yield important insights in future studies of emotion perception. Studies have compared and connected vocal and facial expression of emotion (e.g., Bänziger et al. 2009), but far fewer have done so in the context of peripheral physiology. Future work could better elucidate how much of the affective information collected auditorily from the voice or visually from the face maps onto underlying ANS changes.

Conclusion

In the current study, we demonstrate an association between observers’ perceptions of targets’ emotions and targets’ underlying affective physiology. Emotion perception is an important process that provides a foundation for other social and emotional processes. Our study underscores the idea that emotional information is gathered from many sources, so emotion perception research should look across emotional channels and sources of emotional information to better understand the process.