Introduction

Deaf persons using sign language have been studied extensively to explore whether language is an innate ability, whether there are universals that can be found in all languages, and whether the network underlying sign language is similar to that of spoken languages (for an introduction see Emmorey, 2001). Because signers use three-dimensional space to convey semantic content, grammar and phonology, differences between signers and non-signers in spatial processing have been described in both linguistic and non-linguistic contexts (Emmorey & Tversky, 2002). As an example of a non-linguistic task, signers outperform speakers of spoken languages in mental spatial transformations (Emmorey, Kosslyn, & Bellugi, 1993). The presence of language effects in a non-linguistic domain has even been taken as support for linguistic relativity (Whorf, 2012), in the sense that interpretations of signed or spoken sentences depend on language-related factors and constraints (Dobel, Enriquez-Geppert, Hummert, Zwitserlood, & Bolte, 2011).

Given the well-documented relation between spatial processing and language (e.g. Levinson, 2003), it may be surprising that signers also display heightened expertise in a domain that appears unrelated to language, namely non-linguistic aspects of face processing. A series of studies demonstrated that signers perform better than speakers of spoken languages in face recognition tasks (Bellugi et al., 1990; Bettger, 1992; Bettger, Emmorey, McCullough, & Bellugi, 1997). This is even the case in deaf signing children aged 6–9 years (Bellugi et al., 1990; Bettger et al., 1997) when performing a matching task with unfamiliar faces (Benton, 1983). In contrast, deaf children not using sign language do not show an advantage in face recognition (Parasnis, Samar, Bettger, & Sathe, 1996); thus, it appears that it is not deafness per se that produces the effect, but the expertise with and use of sign language, at least in children. The use of sign language per se, independent of hearing status, might change subordinate levels of face processing, as evidenced by a speed–accuracy trade-off for face recognition in deaf and hearing signers (Stoll et al., 2017). Exploring face recognition in more detail, McCullough and Emmorey (1997) suggested that the underlying mechanism is enhanced processing of local facial features that are used in sign language, i.e. the eyes and, most importantly, the mouth region, which is used for speechreading. By contrast, more global, "Gestalt"-like processing of faces is not altered in signers (McCullough & Emmorey, 1997), nor is the amount of holistic/configural face processing (de Heering, Aljuhanay, Rossion, & Pascalis, 2012).

A common misconception about using signs for communication is that signers rely exclusively on manual signs. The face is a particularly communicative part of the human body, conveying information not only about a person's emotion or identity, but also dynamic cues to speech content that aid speech comprehension (Campbell, Brooks, Haan, & Roberts, 1996). While most aspects of face perception are characterized as holistic rather than feature-based (Maurer, Le Grand, & Mondloch, 2002), speechreading may be a possible exception: the mouth region is crucial, and there is little or no direct contribution from the upper half of the face (Marassa & Lansing, 1995), although more subtle effects of holistic processing on speechreading may exist (Schweinberger & Soukup, 1998). In most sign languages, the eye region also serves linguistic functions, e.g. signaling turn-taking in American Sign Language (ASL). Interestingly, beginning signers fixate more on the mouth region of their interlocutors, possibly to pick up information from speechreading, while native signers fixate on or near the eyes (Emmorey, Thompson, & Colvin, 2008). Even though everybody reads lips (Rosenblum, 2008), persons with severe hearing loss outperform hearing persons in speechreading even if they do not use official sign languages (Bernstein, Tucker, & Demorest, 2000). The success of speechreading depends on various cognitive factors (Andersson, Lyxell, Ronnberg, & Spens, 2001) such as working memory functions. Deaf persons using signs as their major means of communication, i.e. as their native language or via sign-supported speech (SSS), look more at the mouth region than hearing participants, whereas hearing persons inspect upper and lower areas of faces equally. It thus appears that persons using signs for communication rely mostly on peripheral vision to perceive signs (Mastrantuono, Saldana, & Rodriguez-Ortiz, 2017). A similar conclusion was reached by He, Xu and Tanaka (2016), who reported that deaf signers show a smaller inversion effect in the mouth region than hearing non-signers; the authors concluded that deaf signing participants had enhanced peripheral field attention. Using event-related potentials, Mitchell, Letourneau and Maslin (2013) also demonstrated that deaf signers display increased attention to the lower part of faces, even in the absence of gaze shifts. The authors assume that a lifelong tendency to fixate the lower part of faces increases the saliency of this region even before overt attention shifts start. When fixation patterns are measured, deaf users of ASL fixate the bottom half of faces more than the upper half (Letourneau & Mitchell, 2011).

Altered processing of faces not only affects person recognition, but also extends to emotional faces displaying, e.g., anger and disgust (McCullough & Emmorey, 2009), even though earlier studies suggested that there is no advantage, or even a disadvantage, for signers in the comprehension of nonverbal expression of emotions (Schiff, 1973; Weisel, 1985). McCullough and Emmorey (2009) compared categorical perception for affective facial expressions and linguistic facial expressions. Continua of morphed face images, going in eleven steps from anger to disgust (affective facial expression) and from Wh-questions to yes–no questions (linguistic facial expression), served as stimuli. The results demonstrated that categorical perception of faces is not limited to affective stimuli, but also extends to linguistic facial stimuli in hearing non-signers. Importantly, while signers displayed categorical perception in both tasks, categorical perception of affective facial stimuli was only seen when preceded by linguistic stimuli. Thus, exposure to linguistic stimuli affects categorical perception of affective facial expressions.

Taken together, the use of signs for communication has an impact on several aspects of face perception. It remains unclear whether signers classify emotional expressions differently than hearing non-signers.

In the current study, we ask whether the perception of emotional facial expressions is altered in deaf persons using signs for communication, i.e. users of DGS (German Sign Language) and/or of sign-supported speech (SSS), henceforth called deaf signers. We used morphed versions of happy and angry faces ranging from a neutral expression (0% morphing) to a maximal expression (100% morphing), thereby varying the intensity of expression. Participants were asked to classify the faces as happy or angry. Importantly, we used original stimuli, but also created composite faces in which either the top or bottom half of the face was neutral (for an example see Fig. 1) while the other half expressed the respective emotional content to varying degrees. Expressions of happiness and anger were chosen because, firstly, both are well recognized compared to other emotional expressions (Ekman & Friesen, 1971, 1986; Gosselin & Kirouac, 1995; Izard, 1994; Tracy & Robins, 2008; Vassallo, Cooper, & Douglas, 2009). Secondly, for the perception of happiness, the mouth region constitutes a crucial part of the face (e.g. Beaudry, Roy-Charland, Perron, Cormier, & Tapp, 2014), whereas the eyes/eyebrows are more important for the recognition of anger (Calvo & Nummenmaa, 2008), even though these latter effects are inconsistent (for eye-tracking results and an overview see Beaudry et al., 2014).

Fig. 1

Combining faces displaying happy (upper panel) or angry (lower panel) expressions with neutral expressions. The examples shown here display emotional expressions at maximum intensity (100%)

Based on the presented literature, we predict that deaf persons using signs outperform hearing persons in the recognition of happy expressions. The assumed underlying mechanism is enhanced processing of the mouth region due to a lifelong focus on this region. Focusing on the mouth region should come with costs for processing information in the eye region, which should result in impaired processing of angry faces in deaf signers compared to hearing non-signers (but see Emmorey et al., 2008 for evidence of stronger processing of the eye region in native signers compared to beginners). We assume that the predicted effects already become visible when the emotions are not fully expressed (less than 100% morphing).

Methods

Participants

Twenty individuals with hearing loss contributed data to this study (mean age 51.8 years, SD 14.5; 11 women). They all used signing as a major means of communication. Thirteen were congenitally deaf and seven had an onset of deafness within the first 3 years of life. All had started using signs for communication by the age of three. None of them indicated problems in communicating with signs, and the experimenter (the second author, B.N.-K., who was raised bilingually with spoken and signed German) did not perceive any problems in communication. All twenty individuals indicated that they comprehend sign-supported speech at a native-speaker level. None of them had learned Deutsche Gebärdensprache (German Sign Language, DGS) as a first language, because DGS was recognized as an official language in Germany only in 2002. Fourteen individuals were also speakers of DGS, of whom ten mainly used sign-supported speech. The remaining six used speech-accompanying gestures only. The experimenter used sign-supported speech and switched to DGS when necessary, e.g. under time pressure (DGS uses fewer words than speech-accompanying gestures). One further participant was excluded from data analysis because the experiment was terminated prematurely. Participants were recruited via self-help groups in the area of Leipzig, social networks and personal acquaintances.

The group of signers was compared to a group of 20 hearing non-signers (mean age 48.9 years, SD 15.4; 8 women). None of them had any experience with sign language or SSS. All participants from both groups had at least 10 years of schooling.

Participants were tested individually, either in their homes or in laboratories of our departments.

Materials

Stimuli were faces of four men and four women from the FEEST database (Young, Perrett, Calder, Sprengelmeyer, & Ekman, 2002). Each individual displayed the emotional expressions "happy" and "angry", or a neutral expression. Based on these original faces, we created four morphing levels (ML: 25%, 50%, 75%, 100%) for "happy" and "angry". The neutral faces and face morphs served to generate additional composite-face stimuli, in which the upper (or lower) halves of the neutral faces were combined with the lower (or upper) halves of the emotional faces. This was done in Photoshop CC 2017 by cutting the pictures horizontally at nose level and recombining the halves. This resulted in three stimulus type conditions, i.e. original (O), eye region emotional–mouth region neutral (EEmo) and eye region neutral–mouth region emotional (MEmo). Thus, there were 192 stimuli (8 individuals × 2 expressions × 3 conditions × 4 MLs), see Fig. 2 for examples.
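As an illustration of the design (not part of the original stimulus-generation pipeline, which used Photoshop), the full stimulus grid can be sketched as follows; the identity labels are hypothetical placeholders for the eight FEEST faces:

```python
from itertools import product

# Hypothetical identity labels for the four male and four female FEEST faces.
identities = [f"face_{i:02d}" for i in range(1, 9)]
emotions = ["happy", "angry"]
stimulus_types = ["O", "MEmo", "EEmo"]      # original, mouth emotional, eyes emotional
morph_levels = [25, 50, 75, 100]            # percent intensity of the emotional expression

stimuli = [
    {"identity": ident, "emotion": emo, "stim_type": st, "morph": ml}
    for ident, emo, st, ml in product(identities, emotions, stimulus_types, morph_levels)
]
assert len(stimuli) == 8 * 2 * 3 * 4 == 192   # matches the 192 stimuli reported above
```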

Fig. 2

Top panel: example of an original stimulus set at the four morphing levels (stimulus type O). Middle panel: example of a composite face with a neutral eye region and an emotional mouth region (stimulus type MEmo). Lower panel: example of a composite face with an emotional eye region and a neutral mouth region (stimulus type EEmo)

Procedure

The experiment was programmed in E-Prime 2.0 (Psychology Software Tools) and run on a Lenovo ThinkPad Z61m laptop. Each trial began with a fixation cross displayed for 1000 ms at the position of the nose tip of the faces, in the center of the screen. The picture was then presented for 200 ms and replaced by a question mark at the same position as the fixation cross. Each trial was terminated by a button press indicating whether the face was categorized as "happy" or "angry". The button press initiated a blank screen for 1000 ms. Participants entered their responses using the keys "d" and "l" on a German keyboard layout; the assignment of facial expression to key was balanced across participants. The 192 faces were each presented three times, distributed across six blocks to allow five short breaks. Before the actual experiment, participants performed 16 training trials with morphed pictures from the FEEST database that were not used in the experiment; all training pictures appeared in their original (non-composite) version, so participants did not encounter the composite-face conditions until the experiment started. Before the experiment, each participant filled out a consent form informing them about anonymity, the voluntary nature of participation and how their data would be handled. The experimenter was a bilingual speaker of German and German Sign Language. Hearing participants were interviewed about their experience with sign languages and deaf individuals. Deaf persons completed a questionnaire about the nature of their deafness and their use of sign language.
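A minimal sketch of the resulting trial structure, assuming full randomization of the 576 trials and equally sized blocks (details the text does not spell out); the timings follow the values reported above, and the stimulus list is the hypothetical grid from the Materials sketch:

```python
import random
from itertools import product

# Rebuild the 192-item stimulus grid (labels hypothetical, see Materials sketch).
stimuli = [
    {"identity": ident, "emotion": emo, "stim_type": st, "morph": ml}
    for ident, emo, st, ml in product(range(1, 9), ["happy", "angry"],
                                      ["O", "MEmo", "EEmo"], [25, 50, 75, 100])
]

FIXATION_MS = 1000   # fixation cross at nose-tip position
STIMULUS_MS = 200    # face presentation
BLANK_MS = 1000      # blank screen after the button press
REPETITIONS = 3      # each picture shown three times -> 576 trials
N_BLOCKS = 6         # six blocks, allowing five short breaks

trial_list = stimuli * REPETITIONS
random.shuffle(trial_list)                    # assumed full randomization
block_size = len(trial_list) // N_BLOCKS      # 96 trials per block under this assumption
blocks = [trial_list[b * block_size:(b + 1) * block_size] for b in range(N_BLOCKS)]
assert len(trial_list) == 576 and all(len(block) == 96 for block in blocks)
```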

The whole session lasted between 20 and 45 min for non-signers and between 30 and 60 min for signers.

The experiment conforms to the ethical principles of the Declaration of Helsinki. The study was approved by the ethics committee of the Medical Faculty of the University of Jena (5498-04/18).

Data analysis

We analyzed accuracy, i.e. whether happy expressions were classified as "happy" and angry expressions as "angry". To rule out speed–accuracy trade-offs, we also analyzed latencies of correct responses. Responses longer than 6000 ms after stimulus onset were excluded from individual averages (0.22% of trials). There were no responses faster than 200 ms.
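For illustration, the exclusion and aggregation steps could be implemented as follows; the file and column names are hypothetical, since the original analysis was run in SPSS on data not reproduced here:

```python
import pandas as pd

# Hypothetical long-format trial data: one row per trial with columns
# 'subject', 'group', 'emotion', 'intensity', 'stim_type', 'correct', 'rt_ms'.
trials = pd.read_csv("trials_long.csv")

# Keep responses between 200 ms and 6000 ms after stimulus onset
# (about 0.22% of trials exceeded 6000 ms; none were faster than 200 ms).
valid = trials[(trials["rt_ms"] >= 200) & (trials["rt_ms"] <= 6000)]

cells = ["subject", "group", "emotion", "intensity", "stim_type"]
accuracy = valid.groupby(cells, as_index=False)["correct"].mean()     # per-condition accuracy
latency = (valid[valid["correct"] == 1]                               # correct responses only
           .groupby(cells, as_index=False)["rt_ms"].mean())
```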

Statistical analyses were performed with IBM SPSS Statistics 24. For accuracies, we first performed a repeated-measures ANOVA with the within-subject factors EMOTION (happy, angry), INTENSITY (ML 25%, 50%, 75%, 100%) and STIMULUS TYPE (O, MEmo, EEmo), and the between-subject factor GROUP (signers, non-signers). To pursue significant interactions, follow-up ANOVAs including fewer factors or fewer levels within a factor, or two-sided t-tests, were performed where appropriate. In case of violations of the sphericity assumption, Huynh–Feldt corrections were applied (Huynh & Feldt, 1976). For latencies, the same analyses were performed on individual averages of correct responses, separated by condition. Given our hypotheses, we report only the main effects of the four factors and interactions involving GROUP and EMOTION.
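As a rough open-source counterpart to the SPSS analysis (a simplified sketch, not the authors' procedure): pingouin's mixed_anova handles one within-subject and one between-subject factor, so the example below collapses over INTENSITY and STIMULUS TYPE and tests only EMOTION × GROUP. The input table and its column names are hypothetical per-subject condition means, such as those produced in the previous sketch:

```python
import pandas as pd
import pingouin as pg

# Hypothetical per-subject condition means (one row per subject x emotion x
# intensity x stimulus type) with an accuracy column named 'correct'.
means = pd.read_csv("subject_condition_means.csv")

# Collapse over intensity and stimulus type -- a simplification; the full model
# in the paper crosses EMOTION x INTENSITY x STIMULUS TYPE within subjects.
emo_means = means.groupby(["subject", "group", "emotion"], as_index=False)["correct"].mean()

# Mixed ANOVA: EMOTION (within) x GROUP (between); partial eta squared is the
# default effect size, matching the measure reported in the Results.
aov = pg.mixed_anova(data=emo_means, dv="correct", within="emotion",
                     subject="subject", between="group")
print(aov.round(3))
```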

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Results

Accuracies

Faces with angry expressions were classified more accurately (angry: M = 0.81, SEM = 0.017) than faces with happy expressions (happy: M = 0.73, SEM = 0.014; main effect of EMOTION: F(1, 38) = 8.990, p = 0.005, ηp² = 0.191). Increasing the intensity of emotional expression led to higher recognition rates (main effect of INTENSITY: F(3, 114) = 286.257, p < 0.001, ηp² = 0.883). This effect was best explained as a linear trend (F = 479.079, p < 0.001; see Footnote 1), indicating a linear increase of accuracy with increasing intensity. There was also a main effect of STIMULUS TYPE (F(2, 76) = 256.769, p < 0.001, ηp² = 0.871), with higher accuracy for the original version (O: M = 0.87, SEM = 0.01) than for the composite faces (MEmo: M = 0.80, SEM = 0.01; EEmo: M = 0.64, SEM = 0.01). There was no main effect of GROUP (F(1, 38) < 1). Most importantly, we found the predicted two-way interaction EMOTION × GROUP (F(1, 38) = 11.158, p = 0.002, ηp² = 0.227) and two three-way interactions, i.e. GROUP × EMOTION × STIMULUS TYPE (F(2, 76) = 5.950, p = 0.008, ηp² = 0.135) and GROUP × EMOTION × INTENSITY (F(3, 114) = 6.887, p = 0.004, ηp² = 0.153).
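The linear trend can be illustrated with a simple per-subject contrast (a sketch only; the reported F comes from the SPSS polynomial contrast within the repeated-measures ANOVA, and the data file here is hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical array: one row per participant, columns = mean accuracy at the
# four intensity levels (25%, 50%, 75%, 100%), collapsed over the other factors.
acc_by_intensity = np.loadtxt("acc_by_intensity.csv", delimiter=",")

weights = np.array([-3, -1, 1, 3])             # orthogonal linear-trend weights
trend_scores = acc_by_intensity @ weights      # one linear-trend score per participant

# One-sample t-test of the average trend score against zero; this approximates
# the SPSS polynomial-contrast F when a between-subject factor is present.
t_val, p_val = stats.ttest_1samp(trend_scores, 0.0)
print(f"linear trend: t = {t_val:.3f}, p = {p_val:.4f}")
```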

To explore the origin of these interactions, we performed two additional ANOVAs, one for each emotional expression.

Angry faces

Non-signers recognized angry faces better than signers at all intensity levels (ML 25–75%: all p < 0.05) except ML 100% (p = 0.19). This was reflected in a two-way interaction of GROUP × INTENSITY (F(3, 114) = 5.290, p = 0.010, ηp² = 0.122), see Fig. 3.

Fig. 3

Top panel: accuracies separated by emotional expression, stimulus type and group. Error bars are standard errors of the mean (SEM). Lower panel: reaction times in ms for correct responses, separated by emotional expression, stimulus type and group. Error bars are SEM

Happy faces

We found two two-way interactions, i.e. GROUP × INTENSITY (F(3, 114) = 4.401, p = 0.014, ηp² = 0.104) and GROUP × STIMULUS TYPE (F(2, 76) = 6.194, p = 0.014, ηp² = 0.140), see Fig. 3 (top panel).

The interaction of GROUP with INTENSITY arose because signers classified happy faces more accurately than hearing non-signers at all intensity levels except ML 100% (ML 25–75%: all p < 0.05; ML 100%: p > 0.05).

Regarding the interaction of GROUP with STIMULUS TYPE, signers performed better than non-signers in the condition with happy eyes and a neutral mouth (t(38) = 2.994, p = 0.005). Note that the signers were close to chance level, whereas the non-signers tended to classify this stimulus type as angry. The two other comparisons did not reach significance (p > 0.07).

Latencies

We observed a main effect of INTENSITY (F(3, 114) = 27.806, p < 0.001, ηp² = 0.423), reflecting shorter latencies with increasing intensity (linear contrast: F = 37.466, p < 0.001; see Footnote 2). The main effect of STIMULUS TYPE (F(2, 76) = 47.081, p < 0.001, ηp² = 0.553) indicated the shortest latencies for the original version (M = 713 ms, SEM = 24 ms), followed by the MEmo composite faces (M = 763 ms, SEM = 29 ms) and the EEmo composite faces (M = 827 ms, SEM = 34 ms). Pairwise comparisons confirmed significant differences between all conditions (−8.427 ≤ t(39) ≤ −3.451, all p < 0.004, Bonferroni-corrected t-tests). Note that higher accuracies generally went along with shorter latencies and vice versa, a pattern that does not suggest a speed–accuracy trade-off. The interaction of EMOTION × GROUP (F(1, 38) = 4.348, p = 0.044, ηp² = 0.103) was further qualified by an interaction of EMOTION × GROUP × INTENSITY (F(3, 114) = 2.702, p = 0.049, ηp² = 0.066). To follow up on this three-way interaction, we performed two additional ANOVAs, one for each emotional expression, see Fig. 3 (lower panel).
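The Bonferroni-corrected pairwise comparisons of the three stimulus types could be reproduced along the following lines (a sketch with hypothetical input; the published values come from the SPSS analysis):

```python
from itertools import combinations

import pandas as pd
from scipy import stats

# Hypothetical table: one mean correct-response latency (ms) per subject and
# stimulus type, with columns 'subject', 'stim_type', 'rt_ms'.
rt_means = pd.read_csv("rt_means_by_stim_type.csv")
wide = rt_means.pivot(index="subject", columns="stim_type", values="rt_ms")

pairs = list(combinations(["O", "MEmo", "EEmo"], 2))
for a, b in pairs:
    t_val, p_val = stats.ttest_rel(wide[a], wide[b])   # paired t-test across the 40 participants
    p_corr = min(p_val * len(pairs), 1.0)              # Bonferroni correction for three tests
    print(f"{a} vs {b}: t(39) = {t_val:.3f}, corrected p = {p_corr:.4f}")
```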

Angry faces

An interaction of GROUP × INTENSITY (F(3, 114) = 3.513, p = 0.018, ηp² = 0.085) reflected an effect of INTENSITY for deaf signers (F(3, 57) = 9.460, p < 0.001, ηp² = 0.332), i.e. decreasing latencies with higher intensities. This effect was not found for the hearing non-signers (F(3, 57) = 2.476, p = 0.071, ηp² = 0.115).

Happy faces

There were no effects involving the factor GROUP (Footnote 3).

Discussion

Based on the emphasis on the mouth region for speechreading, we hypothesized that deaf signers should be more accurate than non-signers in recognizing happy faces, i.e. an emotional expression for which the mouth region is crucial. For angry faces, for which the eye region is more important, we expected a group difference, although earlier studies argued for both directions. We used composite faces in which the upper or lower part was kept neutral while the other part varied in intensity of emotional expression. The results support our hypothesis for happy faces, with higher performance in deaf signers; for angry faces, signers performed worse than non-signers. These effects appeared especially when the emotional expression was only subtle.

Our results add to the long-standing observation that the constant use of signs for communication alters functions not directly related to language; face perception and the processing of emotional faces are examples. In contrast to hearing persons, deaf signers exhibit a right visual field advantage during emotion judgments of faces, which was also observed for, e.g., famous faces (Letourneau & Mitchell, 2013). The lifelong focus on hands as a means of communication also elicited a left-lateralized N170 for handshapes due to their linguistic meaning (Mitchell, 2017). As an example of changes in face perception, signers outperform non-signers in face recognition. McCullough and Emmorey (1997) suggested that the underlying mechanism is enhanced processing of featural information caused by focusing on these features during communication. Our data support this hypothesis, suggesting an emphasis on the mouth region in signers, with less focus on the eye region. This significant interaction of group and stimulus type is particularly prominent for facial expressions of low intensity, i.e. when the emotional information is only subtle (see Fig. 3). At first sight, the condition with happy eyes and a neutral mouth (EEmo) appears remarkable. While non-signers classify faces in this condition as angry, deaf signers are at chance level. The reason for this response pattern in non-signers is likely grounded in the impression that smiling eyes combined with a neutral mouth convey suspicion or disbelief rather than genuine happiness (see Fig. 2, stimulus type EEmo). Hearing participants may thus have chosen "angry" more often than "happy". However, this does not explain why deaf signers perceived these faces as "neutral" on average, and therefore as relatively happier than non-signers did. In principle, this could be the result of deaf signers selectively relying on the eye region more often than non-signers, such that deaf signers had the impression of a happy face more often than non-signers. Alternatively, deaf signers may have relied solely on the information in the neutral mouth region, in the sense that their processing mode was more feature-based; consequently, deaf signers would have chosen the response "neutral", had such an option been available. While it is difficult to decide between these alternatives in the absence of eye-tracking data in our study (but see Letourneau & Mitchell, 2011), we favor the second "neutral mouth" view for three reasons: First, holistic face processing was suggested to be similar in deaf signers and hearing non-signers (de Heering et al., 2012; McCullough & Emmorey, 1997), such that both groups should be equally able to process faces as a whole, i.e. including (ambiguous) information from the upper and lower halves. Nevertheless, deaf signers did not behave like non-signers and did not preferentially choose the "angry" response. Second, although signers may preferentially process or attend to information in the visual periphery (Chen, He, Chen, Jin, & Mo, 2010; Chen, Zhang, & Zhou, 2006; Dye, Baril, & Bavelier, 2007; Hauthal, Neumann, & Schweinberger, 2012; Lore & Song, 1991; Neville & Lawson, 1987a, b; Proksch & Bavelier, 2002; Sladen, Tharpe, Ashmead, Grantham, & Chun, 2005), this does not account for the overall "neutral" responses in signers, because holistic processing would evoke the impression of disbelief and suspicion as in the non-signers. Third, while signers and non-signers are equally able to spot alterations in the eye region of unfamiliar faces, signers outperform non-signers when the mouth region is altered (McCullough & Emmorey, 1997).

This suggests that face perception in deaf signers is most likely characterized by preferential processing of the mouth region and/or enhanced sensitivity to mouth changes due to substantial perceptual expertise with lip-reading. The notion of preferential processing implies that emotion perception in signers is largely driven by a selective (feature-based) processing of the mouth region, i.e. an enhanced sensitivity to this region. In models of face processing, a face is represented within a multi-dimensional space at the center of which a prototype or average exemplar face is located (Valentine, 1991). According to such models, the prototype or exemplar face in signers would have a more detailed mouth representation due to ample perceptual experience and relevance. It is an intriguing hypothesis that expertise in a specific language changes processing mechanisms that, at first sight, do not appear to be directly related to language.

How quickly does the use of signs for communication begin to influence the processing of emotional faces? While there are currently no studies that track this development over time, there is some evidence that persons who formally learn ASL (with between 10 months and 5 years of experience) outperform non-signers in recognizing expressions from video clips (Goldstein & Feldman, 1996). This was most pronounced for disgust and anger, but not for happiness, sadness and fear–surprise. The authors assume that there was a ceiling effect for happiness in their study. Regarding the negative emotions, Goldstein and Feldman (1996) do not offer an explanation, but assume that the "nature or content of ASL somehow differentially affects decoding of specific emotions. Perhaps the signs involving the communication of particular emotions vary in certain ways that make signers more attuned to some emotional displays than others" (p. 119). We agree with this assumption and suggest that future studies compare static and dynamic stimuli displaying various emotional expressions.

Future research should also address what drives the effects reported by us and others. As outlined in the introduction, speechreading is quite common and not restricted to deaf signers. Since superior speechreading has also been found in deaf non-signers (Bernstein et al., 2000), the present data alone do not allow us to determine whether the effects depend on the use of signs for communication or on deafness per se. Of relevance, a recent study (Sidera, Amado, & Martinez, 2017) explored facial emotion recognition (matching emotion labels to drawings of emotional faces) in deaf children using hearing aids. Their capacity to recognize emotions was delayed for some emotions ("fear", "surprise", "disgust"), but not for "happiness", "anger" and "sadness". The ability to recognize emotions was also related to linguistic skills, but not to the degree of deafness (Dyck & Denver, 2003; Ludlow, Heaton, Rosset, Hills, & Deruelle, 2010), which could argue for sign usage as the likely factor underlying the present effects. Nevertheless, the degree to which changes in emotion recognition are influenced by sign usage or by deafness per se clearly warrants further investigation, particularly in adults. In a similar vein, recording eye gaze patterns might provide fruitful insights by contrasting deaf signers using a sign language, deaf signers using SSS and hearing signers. Although we have no indication in our deaf group that individuals using solely SSS differ from those using both SSS and German Sign Language, it is possible that the gaze pattern while inspecting faces differs within individuals depending on the mode of communication, i.e. DGS/ASL versus SSS.

Similar to the issue of how linguistic background or mode of communication influences perceptual and gaze patterns, there is also the issue of culture. While, as reported, a whole body of evidence shows that deaf signers attend to and fixate the mouth region or bottom part of the face more than the upper part, one study showed the opposite effect: Watanabe, Matsuda, Nishioka, & Namatame (2011) reported more and longer fixations on the eyes than on the nose in Japanese deaf signers compared to hearing controls. The authors attribute this discrepancy with the literature to cultural factors, because in many Asian cultures it is regarded as rude to look directly into the eyes. As such, while our study confirms earlier studies on deaf signers using sign languages such as ASL, it will be fascinating to follow this up in cultures with different gaze conventions for social communication.

Taken together, the perception of faces and, as shown here, of faces with emotional expressions is altered in deaf signers. The study by Goldstein and Feldman (1996) suggests that this happens not only in childhood, but also later in life and within a relatively short time. Remarkably, learning to communicate via signs affects not only perceptual processes, but also those responsible for the expression of emotions: persons who learn ASL are more adept at posing facial emotional expressions than non-signers (Goldstein, Sexton, & Feldman, 2000). It will be a fascinating endeavor for future research to find out how quickly the perceptual system adapts to the requirements of a language and how the interplay of perception and expression develops.