Introduction

Basic Emotion Theory (BET) is guided by convergent analogies found in the writings of scientists working in different traditions: Emotions are a “grammar of social living” that situate the self within a social and moral order; they structure interactions, like scripts in pieces of fiction, in relationships that matter (Eibl-Eibesfeldt 1989; Oatley 2004). In more specific terms, within BET emotions are thought of as distinct and brief states involving physiological, subjective, and expressive components that enable humans to respond in ways that are typically adaptive in relation to evolutionarily significant problems, from negotiating status hierarchies to avoiding peril to taking care of vulnerable offspring (Ekman 1992; Ekman and Cordaro 2011; Keltner and Lerner 2010; Shariff and Tracy 2011; van Kleef 2016).

These core assumptions of BET have been foundational to new empirical advances, ranging from the study of a broad number of previously unexplored specific positive emotions (e.g., Campos et al. 2013; Shiota et al. 2017) to progress in understanding basic mechanisms of emotion-related appraisal, language, development, and central and peripheral nervous system physiology (e.g., Lench et al. 2011; Nummenmaa and Saarimäki 2017).

BET has also been central to the study of emotional expression. It was a focus, of course, of Darwin, who inspired the rich literature that we summarize here. In the simplest of terms, Basic Emotion Theory posits that nonverbal expressions of emotion share five properties. (1) They are brief, coherent patterns of behavior that tend to covary with distinct subjective experiences; (2) they signal the current emotional state, intentions, and/or assessment of the eliciting situation of the individual; (3) they manifest some degree of cross-cultural similarity in both production and recognition; (4) they find evolutionary precursors in the behaviors of other mammals in contexts similar to the contexts humans encounter (e.g., when signaling adversarial or cooperative intentions); and (5) they tend to covary with emotion-related physiological responses (Ekman and Davidson 1994; Hess and Fischer 2013; Keltner and Haidt 2001; Keltner and Lerner 2010; Matsumoto et al. 2008; Shariff and Tracy 2011).

A first wave of BET-inspired studies on emotional expression finds its provenance in the studies of Ekman and Friesen in New Guinea (Ekman et al. 1969), with which many are now familiar, but whose details are worth recalling. Using still photographs of prototypical emotional facial expressions, Ekman and Friesen were able to document some degree of universality in the production and recognition of a limited set of “basic” emotions, including anger, fear, happiness, sadness, disgust, and surprise (for review, see Matsumoto et al. 2008). This study inspired hundreds like it, and led to the replicated finding that observers could identify these six emotions with some degree of consistency in static photos of facial muscle configurations (Elfenbein and Ambady 2002).

Clearly there is much more to emotional expression, both in the behavior people emit and in how they judge it, than matching static images of facial muscle movement configurations to words or situations (as in the original Ekman and Friesen work). People express emotions in more ways than in facial muscle movements, and rely on more than just single words or scenarios to make sense of emotional expression. The Ekman and Friesen work inspired several robust critiques. Questions have been raised about biases in the forced choice paradigms, the robustness of the results from forced choice studies, and the reliance upon exaggerated, stereotypical expressions (Russell 1994; Matsumoto and Hwang 2017; Nelson and Russell 2013). Reviews have revealed that the relationship between self-reports of subjective experience and facial muscle movements is more modest than perhaps assumed in BET (Duran et al. 2017). More recent data raise questions about the degree to which people from remote cultures actually recognize emotion in static photos of the six emotions (Crivelli et al. 2016). Particularly generative is Fridlund’s critique of BET, summarized in his Behavioral Ecological theory (BECV, see Fridlund 1991, this volume). Fridlund’s theorizing, steeped in evolutionary accounts of nonhuman display, argues that human facial displays did not evolve to signal interior feeling, as presupposed in BET, but social intentions or motives instead (see Parkinson 2005, for a detailed review of BET and BECV claims). This theorizing has inspired studies of how people infer intentions, feelings, and appraisals from expressive behavior, which we consider later.

The Ekman and Friesen empirical work—the focus on how people label static images of facial muscle configurations—has inspired another class of developments in the field that are still guided by the core assumptions of BET but that move beyond the study of the recognition of static images of facial expressions of six emotions. It is these developments that we focus on here. We attend, in particular, to three areas of empirical advance. A first concerns the nature of emotional expression, which has been shown to include much more than six distinct facial expressions, and, in fact, upwards of 20 multimodal expressions. A second set of advances is found in the study of emotion perception, which concerns the processes by which social perceivers derive meaning from emotional expressions of different kinds. Guided by the aforementioned critiques of the Ekman and Friesen studies, the field has moved beyond relying exclusively on forced choice labeling of expressions, and progress is being made in understanding how social perceivers infer intentions, motives, action tendencies, and relational properties of signaler and perceiver in brief expressions of emotion. Finally, arising out of the functional foundation of BET has emerged a new line of inquiry—how expressions coordinate interactions between individuals (e.g., Keltner and Kring 1998; Niedenthal et al. 2010; van Kleef et al. 2016). This line of work most explicitly returns to a core notion of BET—that emotions are the grammar of social living—to detail how brief emotional expressions, in single modalities and in multimodal forms, coordinate interactions within meaningful relationships, such as those between parent and child, romantic partners and friends, or individuals within status hierarchies (e.g., for review, see van Kleef 2016).

Advances in Understanding the Nature of Emotional Expression

Emotional Expressions are Multimodal, Dynamic Patterns of Behavior

Central to Basic Emotion Theory is the assumption that emotions enable the individual to respond adaptively to evolutionarily significant threats and opportunities in the environment, such as the cry of offspring, a threat from an adversary, or a potentially available sexual partner (Ekman 1992; Keltner and Haidt 2003). Emotions enable such responses primarily through shifts in peripheral physiology (Levenson et al. 1990), patterns of cognition (Oveis et al. 2010), movements of the body (e.g., the proverbial fight or flight response), and expressive behaviors that coordinate social interactions through the information they convey and the responses they evoke in others (e.g., Keltner and Kring 1998; van Kleef 2009).

Within this framework, emotions are fundamentally about instigating action and changing the probabilities of future actions (Frijda 1986). Emotions enable people to react to significant stimuli (in the environment or within themselves), with complex patterns of behavior involving multiple modalities—facial muscle movements, vocal cues, bodily movements, gesture, posture, and so on. For example, studies of the emotion sympathy find that this brief state involves bodily movements forward, soothing tactile behavior, oblique eyebrows, a fixed pattern of gaze, vocalizations, and skin-to-skin contact when sympathy leads to embrace (Goetz et al. 2010).

Early studies of emotional expression largely focused on whether perceivers could infer emotions from static portrayals of prototypical configurations of facial muscles thought to convey anger, disgust, fear, sadness, surprise, and happiness (Ekman and Davidson 1994; Russell 1994). The last 20 years of scientific study has moved significantly beyond static facial portrayals of these six emotions, revealing that emotional expressions are multimodal, dynamic patterns of behavior, involving facial action, vocalization, bodily movement, gaze, gesture, head movements, touch, autonomic response, and even scent (Keltner et al. 2016).

Notably, the notion that emotional expressions are multimodal patterns of behavior was evident already in Charles Darwin’s original, rich descriptions of the expressions of over 40 emotional states (Keltner 2009), as illustrated in Table 1 (focusing specifically on positive emotions). As is evident in the table, Darwin focused on extended, multimodal, and dynamic patterns of behavior that involve not only facial muscle movements but also changes in gaze, body movements, respiration, gestures, hand movements, the voice, tactile contact, and autonomic responses (e.g., tears).

Table 1 Darwin’s descriptions of the expressive behavior of positive emotions

Early studies, as we have noted, focused almost exclusively on facial muscle movements. The more recent consideration of other modalities of communication has greatly expanded the field’s understanding of emotional expression. Studies of emotional expressions associated with experiences of embarrassment, shame, pride, and love have discerned distinct expressions of these emotions by incorporating measurements of gaze activity (e.g., the gaze aversion of shame and embarrassment), body movements (e.g., the chest expansion of pride and the open posture of love), hand activity (e.g., the face touch of embarrassment and open handed gesture of love), and movements of the head, such as the head tilt back during expressions of pride (Keltner 1995; Tracy and Robins 2004, 2007). These findings have prompted studies to systematically characterize how emotions are communicated in movements of the body (Dael et al. 2012; Gross et al. 2010) and gaze (Sander et al. 2007). These developments in the study of emotional expression are clearly in keeping with Darwin’s more comprehensive analysis, and his suggestion that there should be signal value in how emotions are conveyed from a vast array of communicative behaviors, from simple movements of the hands to shifts in body posture to head movements.

To take one example of a major stream of research in this vein, the human voice has consistently been documented to be a rich modality of emotional expression, as anticipated in the seminal theorizing of Scherer (1986). To study whether people can communicate emotions with the voice, researchers have relied on two methods. In one, people, often trained actors, attempt to express different emotions in prosody, the tone and rhythm of our speech, while reading nonsense syllables or neutral passages of text (Banse and Scherer 1996; Juslin and Laukka 2003). These samples of emotion-related prosody are then presented to listeners, who select from a series of options the term that best matches the emotion conveyed in the speech output. For example, Petri Laukka, Hillary Elfenbein and their colleagues had actors from five countries—India, USA, Singapore, Australia, and Kenya—attempt to convey 11 different emotions—anger, contempt, fear, happiness, interest, neutral, sexual lust, pride, relief, sadness, and shame—while uttering sentences of neutral content (e.g., “Let me tell you something”). They then presented these clips of emotional prosody to people in different cultures, and found that listeners could recognize most of the intended states when asked to label the sounds’ emotional content (e.g., Laukka et al. 2016). These findings build upon a review of 60 earlier studies of this kind, which found that listeners can judge five different emotions in the prosody that accompanies speech—anger, fear, happiness, sadness, and tenderness—with accuracy rates that approach 70% (Juslin and Laukka 2003). Judgments are most accurate when listeners hear members of their own culture (Pell et al. 2009).

In a second line of study of vocal expression, participants communicate emotions through vocal bursts, which are brief, non-word utterances that arise between speech incidents. Laughs, shrieks, growls, sighs, oohs, and ahhs are examples of vocal bursts. In studies of vocal bursts, people are typically given a situation that produces an emotion (e.g., for awe, “you are seeing a large waterfall for the first time”) and asked to communicate that emotion with a brief vocal burst but no words (Laukka et al. 2013; Sauter and Scott 2007; Sauter et al. 2010a, b; Simon-Thomas et al. 2009). These sounds are then played to listeners, who attempt to label the sound with one of many emotion terms, or to match the sound to the appropriate emotion eliciting situation. As with emotional prosody, people are quite adept at communicating emotions with vocal bursts. For example, Cordaro and colleagues presented vocal bursts of 16 emotions to people in 10 different cultures in Western Europe (Germany, Poland), East Asia (China, Japan, South Korea) and South East Asia (India, Pakistan) (Cordaro et al. 2016). In this study participants were asked to match emotionally rich but simple situations (e.g., someone has insulted you; you hit your leg on a rock) to one of four vocal bursts. Overall, participants were correct in matching stories to vocal bursts of 16 emotions 79% of the time. People in these 10 countries were able to identify vocal bursts of seven positive emotions (amusement, awe, contentment, desire, interest, relief, and triumph) and seven negative emotions (anger, contempt, disgust, embarrassment, fear, pain, and sadness). Subsequent studies have documented that even in remote cultural groups in Bhutan and Namibia, people are able to reliably discern a number of emotions from vocal bursts (Cordaro et al. 2016; Sauter et al. 2010a; Sauter et al. 2015).
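Because each situation in such a matching task is paired with four candidate vocal bursts, guessing alone would yield 25% correct, and this is the baseline against which a figure such as 79% is read. The following is a minimal sketch of an exact one-sided binomial test against that baseline. The number of judgments, the number of correct matches, and the function name `binomial_p_value` are illustrative assumptions rather than values or methods reported in the studies cited above, and the sketch treats all judgments as independent, which repeated judgments from the same participant would violate.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, chance: float) -> float:
    """One-sided exact binomial p-value: P(X >= successes) under pure guessing."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical numbers for illustration only (not taken from any cited study).
chance = 1 / 4   # four vocal bursts to choose from
trials = 100     # assumed number of judgments
successes = 79   # assumed number of correct matches

p_value = binomial_p_value(successes, trials, chance)
print(f"observed accuracy = {successes / trials:.2f}, chance = {chance:.2f}")
print(f"one-sided p-value = {p_value:.3g}")
```

With fewer response options the guessing baseline rises (50% for two options), so the same raw accuracy is weaker evidence of recognition.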

Yet another modality that has been of increasingly systematic focus in the study of emotional expression is touch. In one line of research inspired by BET, Hertenstein et al. (2006, 2009) brought an encoder (the person charged with expressing emotion via touch) and decoder (the person being touched) to the lab. The encoder and decoder sat at a table, separated by an opaque black curtain which prevented communication other than touch. The encoder was given a list of emotions and asked to make contact with the decoder on the arm to communicate each emotion, using any form of touch. The decoder could not see any part of the touch because his or her arm was positioned on the encoder’s side of the curtain. After each touch, the decoder selected from 13 response options the term that best described what the encoder was communicating. Participants were found to reliably communicate anger, disgust, and fear from a brief one- or two-second touch of another’s forearm, as well as love, gratitude, and sympathy (see also Piff et al. 2012, for replication). Emotions like embarrassment, awe, and sadness were not reliably communicated via touch. In other research, it was found that people are more reliable in communicating emotion through touch when allowed to touch other regions of the body than the arm (Hertenstein et al. 2009). Finally, there are cross-cultural similarities in which emotions can be conveyed through tactile contact (see Hertenstein et al. 2009).

There are also emerging literatures on potential autonomic signals of emotion, including the blush (van Dijk et al. 2009), the chills (Maruskin et al. 2012), and tears (Balsters et al. 2013; Vingerhoets and Bylsma 2016). Thinking of emotional expressions as dynamic multimodal patterns of behavior points to intriguing new questions (e.g., Aviezer et al. 2012). What is the relative contribution of different modalities to the perception and signal value of emotional expressions (e.g., Flack 2006; Scherer and Ellgring 2007)? Why is it that certain emotions are more reliably signaled in multiple modalities, whereas other emotions are only recognized from one modality? For example, sympathy is reliably signaled in touch and the voice, but less so in the face (Goetz et al. 2010). It is nearly impossible to communicate embarrassment through touch, but it is reliably communicated in patterns of gaze, head, and facial behavior (App et al. 2011).

There are Expressions of More Emotions than the “Basic” Six

Critical to Basic Emotion Theory is the question of which emotions have distinctive signals. Evidence germane to this question informs taxonomies of emotion (e.g., Keltner and Lerner 2010). As evident in the previous section, evidence has emerged revealing that emotions beyond the “basic six” have distinct multimodal and dynamic expressions, including emotions such as embarrassment, pride, shame, and love. In recent years, dozens of studies have contributed to this line of work differentiating expressions of a wider range of emotions (e.g., Keltner et al. 2016; Laukka et al. 2013; Sauter and Scott 2007; Tracy and Robins 2004). Three methods have been at the heart of this new development. A first is emotion encoding studies, where behavioral analyses ascertain whether closely related emotions, such as sympathy and distress, love and desire, or embarrassment, shame, and amusement, are expressed in different patterns of behavior (e.g., for review, see Matsumoto et al. 2008).

A second approach is found in emotion production studies. In these studies, participants are given a prompt, most typically the definition of an emotion or an emotion-specific story, and asked to communicate each emotion nonverbally. For example, in one recent study, participants in five different cultures—China, India, Japan, Korea, and the USA—heard twenty-two emotion-specific situations in their native language and were asked to express the emotion in whatever fashion they desired, which could include facial, vocal, or bodily expressions; the only requirement was that the expressions be nonverbal (Cordaro et al. 2018). Over 5500 facial expressions, bodily movements, gaze movements, hand gestures, and patterns of breathing were coded using an expanded Facial Action Coding System (Ekman and Friesen 1978), and a large subset of these was analyzed for patterns across and within cultures. For the 22 emotions that were studied, certain configurations of expressive behaviors were observed with above chance frequency across all five cultural groups, which one might think of as the prototypical elements of the multimodal expression. Across cultures the expression of awe, for example, tended to involve the widening of the eyes and a smile as well as a head movement up. Across cultures, head nods expressed interest. Confusion was generally expressed with behaviors including furrowed brows, narrowed eyes, and a head tilt. Overall, 22 emotions were found to have distinct, multimodal expressions.

A third approach to documenting distinct expressions is with emotion recognition paradigms, in which participants attempt to map an emotion concept—in their own words, in stories, or emotion terms—to different emotion-expressions. Based on the advances in understanding facial expression, Cordaro took photographs of prototypical facial-bodily expressions of 18 different emotions and then gathered data from 10 different cultures, ranging from Pakistan to New Zealand (Keltner and Cordaro 2016). In this study, as in the Ekman and Friesen work, participants were presented with emotion specific scenarios for each of 18 different emotions (e.g., for pain: “this person just stubbed their toe on a rock”). For each scenario they were required to choose from one of four static photos of facial/bodily expressions the photo that best captured the scenario. Table 2 presents examples and descriptions of the photos in this study. As can be seen from the recognition rates presented in Fig. 1, the landscape of emotional expression in the face and body is increasingly rich.

Table 2 Facial expression examples, FACS action units, and physical descriptions for each expression

Fig. 1 Recognition rates across five cultures in identifying 18 emotions from facial/bodily expressions portrayed in static photos (from Keltner and Cordaro 2016)

In Table 3, we summarize this new literature on multimodal expressions beyond the basic six, indicating whether studies reveal that the facial, bodily, vocal, tactile, and music-related expressions of each emotion can be differentiated from expressions of other emotions. In the respective columns, “yes” indicates that the evidence suggests that the emotion is communicated in a modality at above chance levels; “no” indicates that the emotion cannot be reliably communicated in the modality. These data make the case for distinct expressions of 24 emotional states when different modalities are considered. We note, however, that these findings leave open the possibility that there will be emotions with distinct multimodal expressions that are not readily recognized, and that few if any studies have looked at how reliably these emotions are identified when all modalities are considered.

Table 3 Evidence related to the expression of emotion in different modalities

Within Category Prototypes, Variations, and Cultural Dialects

An early assertion of BET is that emotions are expressed not only in prototypical expressions involving the behaviors common to that category, but also via within-category variations of expressions (Ekman 1992). For example, Ekman observed that alongside the prototype of an anger expression (furrowed brow, raised upper eyelids, lips tightened and pressed together) there are upwards of 60 variants of anger-related expressions (Ekman 1993). More generally, within an emotion category variations might include additional behaviors, such as an eyebrow flash in the embarrassment expression, or fewer of the prototypical elements of an expression, such as an expression of anger that involves only the lip press and tightening, but no movement in the eyebrow region.

Empirical studies have been fruitfully guided by this analysis of within category variation in emotional expression. For example, early studies of the expressive behavior of embarrassment documented a multimodal prototypical expression that included gaze down, head movement down, and an awkward smile. Further, naïve observers were better able to recognize expressions of embarrassment as they increasingly resembled the prototypical expression (Keltner 1995). A similar approach has been taken to the analysis of pride, uncovering a prototypical expression and variations (see Tracy and Robins 2007). Likewise, studies find that within the category of laughter, there are multiple variations (Szameitat et al. 2009). Studies of emotion-related tactile contact similarly find variation in the patterns of tactile behavior (location, pressure, configuration of hand) within the expression of one emotion, such as gratitude or sympathy, and, as in studies of facial and bodily movement, observer accuracy varies depending on which particular expression is observed (Hertenstein et al. 2006).

There is clear precedent in BET that there is not necessarily a one-to-one correspondence between the occurrence of an emotion and a prototypical expression (see Ekman 1992). Rather, emotions are expressed in prototypical multimodal patterns of behavior, with striking variations. To make sense of emotion-related prototypes and within category variations, Hillary Elfenbein, Ursula Hess, and their colleagues have offered their dialect theory of emotional expression (Elfenbein 2013; Elfenbein et al. 2007). This theorizing posits that emotional expression is likely to function much like a language, such as English, in the sense that languages have elements (select phonemes, words, forms of syntax) shared by all speakers of the language, as well as dialects, or variations of the language in sound and word use that are specific to a geographical region. For example, although standard English is common to English speakers in England, speakers in different regions, such as London, Newcastle, or the Midlands, are known to have their own dialects, with unique words, phrases, accents, and forms of prosody.

Several recent studies speak to the prevalence of dialects in emotional expressions (e.g., Cordaro et al. 2018; Elfenbein et al. 2007; Laukka et al. 2016). In these studies, people from different cultures were given a definition of different emotions or a situation likely to produce the emotion, and then asked to express the emotion with any behavior that feels natural. These patterns of expression were then carefully analyzed for their specific facial, bodily, or vocal behaviors, identifying what is universal and how prevalent culturally specific dialects are. A first generalization of the results is just how pervasive emotion dialects are. For example, in one study that looked at expressions of 22 emotions, every emotion was found to have a dialect specific to the culture, and about 25% of an individual’s expressive behavior across emotions was based on dialect, while around 50% of an individual’s expressive behavior adhered to the universal prototype (Cordaro et al. 2018). Second, dialects appear to be more likely to emerge for emotions that are more directly involved in social interactions, such as anger, happiness, or shame—compared to emotions that are less directly or frequently involved in social interactions, such as disgust or fear (Elfenbein et al. 2007).

Process, Structure, and Contextual Shaping of Emotion Perception

The first wave of science on emotional expression, largely focused on the face, involved emotion recognition studies in which participants most typically matched an emotion term, or an emotion-specific story, from a list of options to a specific expression. A meta-analysis of 182 independent samples examining judgments of emotion from facial and other nonverbal cues yielded an average accuracy rate of 58.0% (a large effect size), after correcting for chance guessing (Elfenbein and Ambady 2002). With respect to vocal expressions of emotion, in a review of over 100 studies largely using single word emotion recognition paradigms, Juslin and Laukka (2003) concluded that listeners can judge at least five different emotions in the voice (anger, fear, happiness, sadness, and tenderness) with accuracy rates that approach 70% (see Hawk et al. 2009; Sauter et al. 2013).
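Raw accuracy rates from forced choice tasks are hard to compare when studies offer different numbers of response options, which is why figures corrected for chance guessing are reported. A minimal sketch of one common correction follows; it rescales accuracy so that 0 corresponds to guessing and 1 to perfect performance. The formula and the function name `chance_corrected_accuracy` are assumptions made for illustration, since the text does not state which correction Elfenbein and Ambady (2002) applied.

```python
def chance_corrected_accuracy(observed: float, n_options: int) -> float:
    """Rescale raw accuracy so that guessing maps to 0 and perfect accuracy to 1.

    Assumes every response option is equally likely under guessing.
    """
    if n_options < 2:
        raise ValueError("a forced choice task needs at least two options")
    chance = 1.0 / n_options
    return (observed - chance) / (1.0 - chance)


# Hypothetical comparison: the same raw accuracy under different task formats.
for n_options in (2, 4, 6):
    corrected = chance_corrected_accuracy(0.70, n_options)
    print(f"{n_options} options: raw = 0.70, corrected = {corrected:.2f}")
```

Under this correction, a raw accuracy of 70% counts for less in a two-option task than in a six-option task, because half of the two-option responses would be correct by guessing alone.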

These findings have been critiqued, and debate continues about the degree of agreement in the recognition of emotion from expressive behavior (Nelson and Russell 2013; Russell 1994). Fridlund has also raised theoretical questions about what exactly is signaled by expressive behavior (Fridlund, this volume; Parkinson 2005). Feelings? Intentions? Likely actions? Properties—e.g., of dominance or affiliation—of the relationship between the communicator and perceiver? This critique and theorizing has inspired considerable advances in the conceptualization of the process, structure, and contextual shaping of emotion perception.

The Process of Emotion Perception

Clearly, when an individual encounters emotional expression in others, he or she is likely to engage in complex inferential processes that involve more than the ascription of single word labels; inferences are made about the target’s desires and intentions, trait-like tendencies, strategic motivations, and surrounding context (Sander et al. 2007).

One approach to this issue is that of Scherer and colleagues, who propose that perceivers first infer specific appraisals in the expresser that prompted the expressive behavior in the first place (Scherer and Grandjean 2008). That is, if a person sees another person express anger in the face, or interest in the voice, or sympathy in a pattern of postural movement and tactile contact, the social perceiver first infers a pattern of appraisals that would lead the individual to express that particular emotion. From these inferred appraisals, the social perceiver, this line of theory maintains, then infers the experience of specific emotions. To illustrate, seeing someone express surprise in the face and voice might lead the observer to infer that the individual has been exposed to novel, unexpected information, which in turn would lead the observer to infer that the person is surprised. According to this account, the first inferences perceivers draw upon when seeing others’ expressive behavior is a pattern of appraisals, rather than distinct emotions.

In a similar spirit, Scarantino has synthesized studies of emotion perception in a theory of affective pragmatics (Scarantino 2017; Fischer and Sauter 2017). He makes the case that emotional expressions—in the present case facial/bodily expressions—communicate four kinds of information: (1) the individual’s current feeling (the expressive function of expression); (2) what is happening in the present context (the declarative function of expression); (3) desired courses of action from other people who perceive the expression (the imperative function of expression); and (4) intention and plans about what the person might do (the commissive function of expression).

As one empirical illustration of this thinking about the inferences that expressive behavior prompts, Shuman and colleagues presented observers with dynamic, videotaped portrayals of five different emotions: happiness, sadness, fear, anger, and disgust (Shuman et al. 2015). The expressions were dynamic, more realistic, and less exaggerated than those in the Ekman and Friesen photos, more like the expressions people see in everyday social interactions. In different response formats, participants matched each expression to either: feelings (“fear”), appraisals (“that is dangerous”), social relational meanings (“you scare me”), or action tendencies (“I might run”). Results showed that participants labeled the dynamic but subtle expressions with the expected response 62% of the time, with greater accuracy revealed when labeling expressions with feeling states, and reduced accuracy found in labeling action tendencies (see Horstmann 2003). More recent work in the Trobriand Islands found that action tendencies were more prominent in the interpretation of facial expressions than emotion words, suggesting possible cultural variations in the labeling of emotional expression (Crivelli et al. 2016).

This emerging literature on the process of emotion recognition from expressive behavior would be well served by taking on intriguing questions. Given what has been learned about the automaticity of inferring trust, warmth, and dominance from human faces (e.g., Oosterhof and Todorov 2008), more fine-grained methods oriented toward unpacking the process of emotion recognition could yield insights into the primacy of what information is conveyed (feelings, appraisals, intentions) and the unfolding process of emotion perception.

The Structure of Emotion Perception

Emotion perception involves more than labeling multimodal expressions with words that capture distinct emotions. The way that distinct emotional expressions relate to each other—that is, the structure of recognition—is also critically important. For instance, suppose a given study found that expressions of “love,” “joy,” and “embarrassment” were accurately identified in two cultures. If, in the same study, members of one culture thought the expression of “love” was similar in meaning to that of “joy” but not that of “embarrassment,” whereas in another culture individuals thought the expression of “love” was more similar in meaning to “embarrassment”, this would reveal a potentially important cultural difference in the emotional meaning attributed to the three expressions. The structure of the relatedness of meaning of the three expressions may, in fact, be as interesting as the fact that they can be differentiated in forced choice type labeling paradigms.
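The thought experiment above can be made concrete by comparing pairwise similarity judgments across cultures. In the sketch below, the similarity values are invented for illustration, and `pearson` is a hypothetical helper written for this example. A low or negative correlation between the two similarity profiles would indicate different structures of meaning even if each expression were labeled accurately in both cultures.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


expressions = ["love", "joy", "embarrassment"]

# Invented similarity ratings (0 = unrelated, 1 = same meaning), one per pair.
culture_a = {("love", "joy"): 0.8, ("love", "embarrassment"): 0.2,
             ("joy", "embarrassment"): 0.1}
culture_b = {("love", "joy"): 0.3, ("love", "embarrassment"): 0.7,
             ("joy", "embarrassment"): 0.2}

# Read out both cultures' ratings in the same pair order before correlating.
pairs = list(combinations(expressions, 2))
profile_a = [culture_a[pair] for pair in pairs]
profile_b = [culture_b[pair] for pair in pairs]

structure_agreement = pearson(profile_a, profile_b)
print(f"cross-cultural agreement in similarity structure: r = {structure_agreement:.2f}")
```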

Consider an intriguing study on the structure of emotion perception by Jack et al. (2012). These researchers relied on computer morphing technologies to generate over 2500 facial expressions based on the combined movements of anywhere from 1 to 6 Facial Action Units. They then presented these 2500 facial expressions as animations in sequences of four separate photos unfolding over 1.25 s, and had participants rate the emotional meaning of these expressions. With traditional factor analytic approaches, they documented that between 25 and 35 distinct states could be discerned in facial muscle movements, but that this realm of expression could be reduced to a simpler structure of four distinct patterns of Action Units well preserved in meaning across cultures. In both cultures, they distinguished (1) positive emotions, (2) sadness/fear-related emotions, (3) surprise-related emotions, and (4) disgust/anger-related emotions (for similar results on the variety of expressions see Cordaro et al. 2018; Du et al. 2014; for a critique of the study, see Sauter and Eisner 2013).

Another recent study took a different approach to examining cultural similarities and differences in the structure of emotion perception. Bai et al. (2018) began with more caricatured representations of 51 emotion concepts, created by asking a professional illustrator to draw each concept using emoji-like images. In one experiment, participants sorted the drawings, which were printed onto cards, into multiple stacks of drawings with similar meaning. These data were processed with an agglomerative hierarchical clustering algorithm, resulting in tree-like representations of the structure of emotion perception, which correlated over .90 across the two cultures, and which we portray in Fig. 2.

Fig. 2

A hierarchical taxonomy of perception of 51 emoji-like drawings based on data from a card sorting study, averaging across US and Chinese participants. Note: Leaves (outer edge of the hierarchy) correspond to individual drawings. These drawings are paired based on correlations in the proportion of matches with each other drawing. These pairings are then linked iteratively based on the average correlations between the judgment profiles of the two sets of drawings they contain, so as to maximize the correlations in judgments of the drawings that are grouped together. The correlation corresponding to each linkage is represented by its distance from the center, with the grey outer edge corresponding to a correlation of 1 and the inner circle corresponding to a correlation of 0. The branches of the tree are colored according to 5 top-level clusters linked at a correlation > 0. All linkages shown correspond to correlations that are significantly higher than would be expected if the linked branches were correlated only by chance, based on the number of leaves contained in each branch (p < .05). (Color figure online)
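To make the linkage procedure described in the caption concrete, the following is a minimal sketch of average-linkage agglomerative clustering applied to card-sort data. It is not the authors' code or data: the six item names, the four participants' sorts, and all function names are hypothetical. The sketch also makes two simplifying assumptions, namely that each drawing's judgment profile includes its match with itself (fixed at 1), and that a profile with no variance is treated as uncorrelated with every other profile. The significance testing mentioned in the caption is not reproduced.

```python
from itertools import combinations
from math import sqrt


def match_proportions(sorts, items):
    """Proportion of participants who placed each pair of items in the same stack."""
    prop = {a: {b: 0.0 for b in items} for a in items}
    for stacks in sorts:
        for stack in stacks:
            for a in stack:
                for b in stack:
                    prop[a][b] += 1.0 / len(sorts)
    return prop


def pearson(x, y):
    """Pearson correlation; returns 0.0 for flat profiles (an assumption of this sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def profile_correlations(prop, items):
    """Correlate every pair of items' match profiles (self-match included)."""
    profiles = {a: [prop[a][b] for b in items] for a in items}
    return {a: {b: pearson(profiles[a], profiles[b]) for b in items} for a in items}


def average_linkage(sim, items):
    """Repeatedly merge the two clusters with the highest average pairwise correlation."""
    clusters = [(item,) for item in items]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            avg = sum(sim[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or avg > best[0]:
                best = (avg, c1, c2)
        avg, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, avg))
    return merges


# Hypothetical toy data: each participant's sort is a list of stacks.
items = ["joy", "amusement", "sadness", "grief", "fear", "anger"]
sorts = [
    [["joy", "amusement"], ["sadness", "grief"], ["fear", "anger"]],
    [["joy"], ["amusement"], ["sadness", "grief", "fear"], ["anger"]],
    [["joy", "amusement"], ["sadness", "grief"], ["fear"], ["anger"]],
    [["joy", "amusement"], ["sadness", "grief"], ["fear", "anger"]],
]

prop = match_proportions(sorts, items)
sim = profile_correlations(prop, items)
merges = average_linkage(sim, items)
for left, right, r in merges:
    print(f"{left} + {right} linked at average r = {r:.2f}")
```

Each printed line corresponds to one linkage in the tree, and the average correlation at which two branches are joined plays the role of the radial distance in the figure. In practice, a library routine such as SciPy's hierarchical clustering would likely be preferable to this hand-rolled loop.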

As with the research by Rachel Jack and her colleagues, these approaches have the promise of capturing how expressions relate to one another, and broader categories of affective states that include distinct emotions.

The Contextual Shaping of Emotion Perception

A final growth area in the study of emotion perception is a focus on how emotion perception is shaped by features of the social context (Barrett et al. 2011; Hess and Hareli 2017; Scherer 1986). An important source of contextual shaping of emotion perception is, of course, culture. Cultures vary greatly in their prioritization and understanding of emotion concepts, knowledge, and representations, so culture will necessarily influence the perception of emotion in expression. For example, cultures vary in their attention to the surrounding context of an expression. In one paradigmatic experiment, Masuda and colleagues showed Japanese and American participants cartoon figures with various facial expressions (Masuda et al. 2004). The central, target face was always surrounded by smaller, less salient faces that displayed expressions dissimilar to those of the target. Japanese participants’ judgments about the central target’s facial expression were more influenced by the surrounding faces than were judgments made by Americans, who tended to restrict their focus to the expression shown by the central target. Differences between groups of perceivers have also been found for populations that differ on dimensions other than culture, such as social class. In this vein, Kraus and colleagues have found that lower class individuals, who are more oriented to the social context than upper class individuals, also incorporate contextual information into their judgments of expressive behavior to a greater extent (Kraus et al. 2010).

A second source of variation is situational—who is the person expressing emotion, and what context are they in? How might the gender, power, ethnicity or social class of the individual expressing emotion shape what emotion observers perceive? For example, people are more likely to detect anger in men’s expressions of emotion, but sadness in expressions of women (Hess and Hareli 2017). US participants are more likely to perceive anger in the emotional expressions of African Americans (Hugenberg and Bodenhausen 2003). Also relevant are what other behaviors expressers might be engaging in. For example, Aviezer et al. (2008) presented a photograph of a prototypical facial expression of disgust in one of four stimulus contexts, with the person expressing disgust engaged in different actions. Participants labeled the expression as disgust 91% of the time when the individual was holding a soiled article of clothing, 59% of the time when the person displayed fearful hand and arm movements, 33% of the time when the same person was clasping his or her hands sadly to the chest, and 11% of the time when the person was poised with fists clenched to punch. These results suggest that people do not see and perceive faces in a vacuum; rather, they are one very important predictor of emotion perceptions, which are used in combination with other contextual information to form judgments. Recent findings also highlight that a person’s previous emotional expressions can influence how a current emotional expression is perceived (Fang et al. 2018). Clearly, the many dimensions of context—the nature of the expresser, the surrounding people, the formality or informality of the setting—all influence emotion perception.

A third kind of context is perceptual context (Barrett et al. 2011). Perceptual context refers to the mental states within the perceiver’s mind that shape his or her inferences upon observing expressive behavior. A person’s current feelings, goals, intentions, values, and physical state give rise to context-specific interpretations of social expressive behavior. For example, recent studies find that the likelihood that participants will label a disgust expression as “disgust” rises when an anger expression precedes the presentation of the disgust expression, but drops when no anger expression precedes the target disgust expression (Pochedly et al. 2012).

Emotional Expression and the Coordination of Social Interaction

Based on his years of intensive observation of pre-industrial peoples, Eibl-Eibesfeldt posited that emotional expressions are the “grammar of social interaction” (1989). Facial expressions, vocalizations, patterns of bodily movement, gaze, gestures, and touch bind people into dyadic and group-based interactions—the soothing of a distressed child, flirtation between potential suitors, sexual interaction, the play of young siblings, the aggressive encounters of rivals, or status conflicts in groups.

A corollary to this analysis is that emotional expressions trigger systematic inferences and behavioral responses in others. This thinking requires that we shift a level of analysis, and look from individuals’ expressions of emotion to the dyadic and group level (Keltner and Haidt 1999), as has been done in the study of emotional mimicry (Hess and Fischer 2013). In other words, perceivers do not merely recognize emotions from nonverbal displays—they also respond to them with their own emotion-guided behaviors, ranging from mimicry, coordination, and tenderness to antagonism and avoidance.

Consider the recent theorizing of Paula Niedenthal and her colleagues concerning how different smiles and laughs evoke different inferences and responses in others (Niedenthal et al. 2010). Within 500 ms, this theorizing posits, people respond to smiles with mimetic behavior and physiological reactions. For example, a warm smile of enjoyment triggers neural processes that lead the perceiver to seek more information about the smiler through eye contact, which in turn evokes feelings of pleasure, mimetic behavior, and the experience of positive emotion and approach behavior. A proud, dominant smile, by contrast, triggers the same automatic search for information about the smiler, along with neural activation that leads to a sense of threat and avoidant behavior.

So how do emotional expressions coordinate social interactions? Three ideas have emerged (Keltner and Kring 1998; van Kleef 2009). A first is that emotional expressions rapidly provide important information relevant to perceivers, useful in guiding subsequent behavior. For example, emotional expressions can signal trait-like tendencies of individuals. Individuals looking angry are perceived as dominant (Knutson 1996) and those showing embarrassment are seen as being of upstanding character (Feinberg et al. 2012). Pride displays promote automatic, cross-cultural judgments of high status in the displayer—judgments that are strong enough to counter contextual information indicating that the displayer in fact merits low status (Shariff and Tracy 2009; Shariff et al. 2012; Tracy et al. 2013).

Emotional expressions also signal the trustworthiness of the sender (Fang et al. in press). In one study, Krumhuber and colleagues found that people trust interaction partners more, and will give more resources to those partners, if the partners display authentic smiles (which have longer onset and offset times) compared to fake smiles, which have shorter onset and offset (Krumhuber et al. 2013). Social perceivers also infer trustworthy intentions from people who spontaneously display intense embarrassment, and are more likely to cooperate with individuals who express embarrassment than other emotions (Feinberg et al. 2012). Pride displays direct social learning by providing information to others; individuals motivated to attain the correct answer to a difficult trivia question were found to selectively copy the answer provided by others showing pride, more so than others showing happiness or a neutral display, suggesting that pride displays communicate expertise or knowledge (Martens and Tracy 2013).

Emotional expressions also convey essential information about the environment (e.g., Klinnert et al. 1986). For example, parents use touch and voice to signal to their young children whether other people and objects in the environment are safe or dangerous (Hertenstein 2002), using vocal cues that are consistent across cultures (Bryant and Barrett 2007).

Emotional displays coordinate social interactions in a second way, by evoking specific responses in social perceivers. Early studies in this tradition found that some emotional expressions trigger complementary emotions in social perceivers: facial displays of anger enhance fear conditioning in observers, even when the anger displays are not consciously perceived (Ohman and Dimberg 1978); expressions of distress can evoke sympathy in observers (e.g., Eisenberg et al. 1989); displays of dominance trigger more submissive expressive behavior (Tiedens and Fragale 2003). More recently, van Dijk et al. (2009) have documented that the blush is an involuntary, costly way in which people signal their awareness of, and regret for, a mistake they have made: social observers responded with more positive emotion to individuals who blushed after making mistakes than to individuals who showed other display behavior.

Finally, emotional expressions structure social interactions by serving as incentives for others’ actions, by rewarding specific patterns of behavior in perceivers. Early studies on this notion focused on how parents use warm smiles and touches to increase the likelihood of certain behaviors in their children (e.g., Tronick 1989), and on the incentive value of laughter and how it triggers cooperative interactions between friends (Owren and Bachorowski 2001).

This analysis of the rewarding properties of emotional expression likewise sheds light on some of the direct effects of emotional touch upon recipients of touch (for review, see Keltner 2009). Gentle, pleasing touch triggers activation in the orbitofrontal cortex, a brain region involved in the representation of secondary rewards. Given the rewarding quality of being touched, it has been claimed that touch motivates sharing behavior in others (De Waal 1996). This may help explain why warm touch increases compliance with requests (Willis and Hamm 1980) and cooperation toward strangers in economic games.

Clearly, the study of how expressions coordinate social interactions is in its infancy. Studies of the informative, evocative, and incentive functions of expressions have largely focused on the face; it will be important to extend this line of reasoning to studies of the voice, touch, gaze, and other modalities. With a few exceptions, this work has focused on a fairly limited set of emotional displays: smiles, anger displays, disgust expressions, and fear expressions. It will be important to examine how less studied expressions of emotion, for example of interest (in the voice), gratitude (in touch), sympathy (in the voice or touch), or awe (in the voice), coordinate social interactions.

Studies of the social functions of emotional expressions have set the stage for new theorizing. One recent line of argument has outlined how emotional expressions evolved to serve these informative, evocative, and incentive signaling functions, perhaps in the “second stage” of their evolution (see Shariff and Tracy 2011). This account dates back to Darwin (1872), and argues that internal physiological regulation was likely the original adaptive function of emotion expressions, which later evolved to serve communicative functions (e.g., Eibl-Eibesfeldt 1989; Ekman 1992; Shariff and Tracy 2011).

To take the classic example of fear, the facial muscle movements that constitute a fear expression likely originally emerged as part of a functional response to threatening stimuli; widened eyes increase the scope of one’s visual field and the speed of eye movements, allowing expressers to better identify (potentially threatening) objects in their periphery (Susskind et al. 2008). In contrast, the ‘scrunched’ nose and mouth of the disgust expression result in constriction of these orifices, thereby reducing air intake (Chapman et al. 2009). Given that disgust functions to alert expressers to the potentially noxious nature of the eliciting stimulus, and thereby to disincline them from ingesting it (Rozin et al. 2004), the reduced inhalation of airborne chemicals can well be considered part of the same adaptive response. In more recent work, these authors have shown that the opposing eye movements involved in fear and disgust expressions (i.e., widening versus narrowing) function to increase visual sensitivity (localizing an object) and acuity (determining what the object is), respectively, further supporting the argument that these two expressions initially evolved to serve opposing yet equally important functions for the expresser (Lee et al. 2013).

However, many of these original physiological benefits experienced by expressers eventually became transformed into communicative signals, which benefit both expressers and observers by virtue of allowing for more efficient communication and coordinated interactions. Over time, the facial and bodily behavioral components of certain emotions came to signal those emotional states to observers, through processes of ritualization, wherein mammalian nonverbal displays become exaggerated, more visible, distinctive and/or prototypic, and ultimately, more recognizable (Eibl-Eibesfeldt 1989; Shariff and Tracy 2011).

Looking Forward to Future Advances in the Study of Emotional Expression

In this review, we have summarized recent advances in the study of emotional expression inspired by Basic Emotion Theory. This literature reveals that there are upwards of 20 emotions with distinct, multimodal expressions. Intriguing discoveries highlight how this increasingly rich array of states with multimodal expressions might have a deeper structure that speaks to the potential evolutionary origins of emotional expression. And work is revealing how emotional expressions coordinate social interactions; they are indeed a grammar of social living.

These advances in the study of emotional expression are already proving to be generative in advancing other core hypotheses of Basic Emotion Theory. As one example, within this theoretical framework it is assumed that emotions involve emotion-specific physiology, which enables specific behaviors in response to eliciting stimuli: flight, skin-to-skin contact, the widening of the eyes to take in more information, clasping and striking. The literature we have reviewed here has begun to illuminate how distinct emotions covary with distinct physiological responses. For example, brief nonverbal displays of love (Duchenne smile, head tilt, open handed gestures) correlate with oxytocin release, whereas cues of sexual desire (lip licks, lip puckers) do not (Gonzaga et al. 2006). Sympathy-related oblique eyebrow movements relate to increased activation in the vagus nerve, a branch of the parasympathetic autonomic nervous system that supports care-giving in mammals (Eisenberg et al. 1989; Stellar et al. 2015). Recent work finds that fear-related vocalizations, but not those of other emotions, covary with cortisol release (Anderson et al. 2018). This work suggests that more precise measurement of emotional expression may yield new insights into emotion-related physiology.

Critical to Basic Emotion Theory is the notion that human emotional expression arose during the process of mammalian evolution, and, by implication, that there should be compelling homologies between human and non-human behavior. Careful cross-species comparisons between human and nonhuman expressive behavior have revealed functional origins of laughter, smiling, embarrassment, affiliative cues involved in love, sexual signaling, threat displays, and dominance (for review see Keltner et al. 2016). Careful analyses of nonhuman vocal displays find distinct displays for sex, food, affiliation, care-giving, and threat (e.g., Briefer 2012; Morton 1977; Snowdon 2003).

In moving beyond the basic six, new studies of emotional expression guided by Basic Emotion Theory are generating important advances in understanding what emotions are, and how they shape human social life.