The ability to accurately identify others’ emotional facial expressions is important in social interactions (e.g., Ekman and Friesen 1976a; Joormann and Gotlib 2006). Biases in emotion recognition (ER) have been studied across a range of psychiatric conditions. Deficits in ER are linked to interpersonal difficulties such as isolation and social rejection (Carton et al. 1999), which are common problems in emotional disorders (e.g., depression, anxiety). Changes in the ability to recognize emotional faces may help predict or monitor response to treatment in depression (Bourke et al. 2010; Shiroma et al. 2014; Venn et al. 2006). Thus, exploring ER in clinical samples is critically important.

In addition to depression, age plays a significant role in ER abilities. A review showed that older adults are less accurate in recognizing anger, sadness, and to some extent, fear (Isaacowitz et al. 2007). A meta-analysis on the effects of age on ER showed that older adults are worse in recognizing facial expressions of anger, sadness and fear, and to a lesser extent, surprise and happiness (Ruffman et al. 2008). In a cross-sectional study exploring age-related differences in ER, results revealed that decline in recognition of sadness and anger starts around age 30 (Mill et al. 2009). In two studies of healthy older adults, results showed an age-related decline in ER independent of changes in perceptual abilities, fluid IQ, processing speed, basic face processing, and reasoning about non-face stimuli (Sullivan and Ruffman 2004). Recent research suggests that age-related differences in ER are context and emotion specific—there is substantial within-person variability in ER well into old age (Richter et al. 2011). It is noteworthy that although ER abilities decrease with age, depression severity also decreases with age (e.g., Mezuk and Kendler 2012). Although an increase in depression-related psychopathology may be associated with ER difficulties in younger adults, it is likely not the cause of ER decline in older adults.

Findings about the effect of depression on ER are inconsistent. Although some research showed that individuals with depression have faster recognition of sad and angry faces compared to social phobia patients and healthy controls (Joormann and Gotlib 2006), other research shows that depression decreases speed and accuracy of recognizing happy expressions compared to healthy controls (Gollan et al. 2008; Yoon et al. 2009). In comparing patients with depression to healthy controls, depressed patients showed a perceptual bias towards unpleasant versus pleasant expressions, and hypersensitivity to angry faces (Liu et al. 2012). In a study comparing ER in patients with comorbid depression and anxiety to those with only depression, the comorbid group showed diminished sensitivity to recognizing happy and sad facial expressions (Berg et al. 2016). A meta-analysis on ER and depression concluded that ER is impaired across all basic emotions except sadness (Dalili et al. 2015). However, these authors note that the majority of ER studies to date are underpowered. The current study addresses this limitation with our adequately powered, large sample.

The current study is an extension of existing research on ER; however, our approach is different because we analyzed depression severity as a dimensional variable rather than as a binary diagnosis (e.g., presence or absence of major depressive disorder). Previous group-based comparisons may have led to inconsistent or weaker ER results because individual differences in symptom severity were collapsed into a binary variable. Our approach takes into consideration the heterogeneity of depression symptoms and allows for an examination of individual differences in severity above and below the clinical mood disorder diagnostic threshold. Our study expands on prior work by examining age as a continuous variable rather than comparing older versus younger cohorts. We expected that examining a larger span of age could lead to novel insights regarding the relationship between age, depression, and ER.

The primary aim of the present study was to examine ER in a sample of outpatients with anxiety and mood disorders. To our knowledge, this was the first study to explore ER in a large, transdiagnostic sample that was well characterized clinically, and somewhat diverse with respect to age. Based on prior studies described above, we hypothesized that ER scores for negative emotion categories (fear, anger, sadness) would decline with increased age and depression status, such that the oldest adults with depression would have the lowest accuracy and perform slowest, compared to older adults without depression, and younger participants. Of note, the current study was underpowered to test the latter part of this hypothesis, given our age range.

Method

Participants

Participants were 644 patients who presented for assessment and treatment at the Center for Anxiety and Related Disorders (CARD) at Boston University. The Boston University Institutional Review Board approved this study. All participants provided informed consent to be a part of the study. This study was a subset of an intake assessment at CARD, and we had no dropouts or refusals to participate. The sample was predominantly female (57.6%), Caucasian (82.1%; Asian, 9.2%, African American, 7.1%, Other/not reported, .9%, American Indian/Alaskan, .3%, Pacific Islander, .3%), and non-Hispanic (91.1%). The average age was 31.31 (SD = 12.66, range = 18–76).

Diagnoses were established using the Anxiety and Related Disorders Interview Schedule for DSM-5 (ADIS-5; Brown and Barlow 2014). All participants had a clinical anxiety or mood disorder diagnosis, with 34.6% of our sample meeting criteria for major depressive disorder (MDD) or persistent depressive disorder. The sample breakdown of principal diagnoses was as follows: coprincipal diagnosis (15.2%), panic disorder (3.9%), agoraphobia (2.5%), body dysmorphic disorder (0.3%), generalized anxiety disorder (GAD) (20.8%), social phobia (19.9%), specific phobia (7.6%), obsessive-compulsive disorder (8.2%), posttraumatic stress disorder (2.0%), MDD (4.3%), persistent depressive disorder (PDD) (4.0%), other specified anxiety disorder, not resembling GAD (1.1%), other specified anxiety disorder resembling GAD (4.3%), other specified depressive disorder (0.8%), personality disorder (0.2%), illness anxiety disorder (0.8%), somatic symptom disorder (1.9%), other clinical disorder (2.2%). Coprincipal diagnoses refer to comorbid diagnoses that are both the primary presenting concern, and equally interfering, distressing, and severe. The sample breakdown for MDD and PDD rates (collapsing principal and additional diagnoses) was 18.5% and 16.1%, respectively.

Measures

Anxiety and Related Disorders Interview Schedule for DSM-5 (ADIS-5; Brown and Barlow 2014)

The ADIS-5 is a semi-structured interview designed to establish a diagnosis of DSM-5 anxiety, mood, somatoform, obsessive-compulsive, trauma, and substance use disorders, and to screen for other disorders (e.g., psychotic disorders). The ADIS-5 was administered by trained Ph.D.-level psychologists and advanced doctoral students in clinical psychology who underwent extensive training to meet strict certification criteria (see Brown et al. 2001, for details).

Beck Depression Inventory-II (BDI-II; Beck et al. 1996)

The BDI-II is a widely used 21-item self-report measure of severity of depressive symptoms. The BDI-II has been shown to have strong psychometric properties in outpatient samples (Steer et al. 1997). The BDI was rescored (BDI-R) using the 10 items that are specific to unipolar mood disorders (Items 1–9, 13). Depression-specific items were determined by a factor analysis (see Brown et al. 1998). The rescored BDI eliminates nonspecific items of negative affect and general distress including irritability, sleeplessness, and fatigue. In our sample, the correlation between BDI-II and BDI-R was .93. We therefore chose to use the BDI-R for all analyses, due to its factor analytic properties in measuring depression more specifically, i.e., removing general negative affect and distress.
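The rescoring described above can be illustrated with a minimal sketch. The item numbers (1–9 and 13) come from the text; the 0 to 3 rating scale is the standard BDI-II response format. Treating the BDI-R as a simple sum of the retained items is an assumption of this sketch, as are the function and variable names.

```python
# Items retained in the rescored BDI (BDI-R), as stated in the text: 1-9 and 13.
BDI_R_ITEMS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 13)


def score_bdi(responses):
    """Return (full-scale total, BDI-R total) for one respondent.

    `responses` maps item number (1-21) to a rating from 0 to 3.
    Summing the retained items is an assumption of this sketch.
    """
    if sorted(responses) != list(range(1, 22)):
        raise ValueError("expected ratings for items 1 through 21")
    if any(rating not in (0, 1, 2, 3) for rating in responses.values()):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    full_total = sum(responses.values())
    bdi_r_total = sum(responses[item] for item in BDI_R_ITEMS)
    return full_total, bdi_r_total


# Hypothetical respondent who rates every item 1.
example = {item: 1 for item in range(1, 22)}
full_total, bdi_r_total = score_bdi(example)
```

Because the BDI-R uses 10 items rated 0 to 3, scores under this assumption can range from 0 to 30.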

Emotion Recognition Task (“Facial Morphing”)

The facial morphing task entails watching “movies” of computer-morphed faces that change slowly from a neutral to a fully emotional expression. Stimulus faces were taken from Ekman and Friesen’s (1976b) Series of Facial Affect. After responding to two practice trials, participants were shown 40 morphed sequences of the faces in random order (a male and a female actor each expressing anger, happiness, fear, and sadness five times). See Fig. 1.
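The composition of the 40 trials can be sketched as follows. This is only an illustration of the design arithmetic (2 actors × 4 emotions × 5 repetitions), not the E-Prime script used in the study; the function name and the seeded shuffle are assumptions.

```python
import random
from collections import Counter

EMOTIONS = ("anger", "happiness", "fear", "sadness")
ACTORS = ("male", "female")
REPETITIONS = 5  # each actor shows each emotion five times


def build_trials(seed=None):
    """Return the 40 (actor, emotion) trials in random order."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    trials = [
        (actor, emotion)
        for actor in ACTORS
        for emotion in EMOTIONS
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=0)
per_emotion = Counter(emotion for _, emotion in trials)
```

Each emotion therefore appears in 10 of the 40 sequences.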

Fig. 1 Emotion recognition task procedure

Using E-Prime software, each face was presented for 500 ms. The black-and-white face images were approximately 14.5 × 10.5 cm in size. Faces were presented in the middle of the screen on a black background. Images were presented on a high-resolution 15″ monitor. Participants were presented with a neutral face (0% emotionality), which progressed in 2% increments toward 100% emotionality. Each increment of emotionality, or frame, was displayed for 500 ms, with every fifth frame shown for 1, 2, 3, 4, or 5 times the normal 500 ms length, to jitter the relationship between time and emotional intensity. Participants were asked to press a keyboard key as soon as they detected an identifiable emotional expression. Pressing the key stopped the movie, and participants were then asked to identify the face as expressing happiness, sadness, fear, or anger. The computer recorded the identification rating and the emotional intensity of the face displayed at the moment of the key press. Accuracy scores were based on proportion correct, with a possible range of 0 to 1.0. Intensity scores were based on the intensity of the morphed expression at the time of the key press, with possible scores ranging from 0 (neutral) to 100 (fully morphed emotion). Higher intensity scores signify that participants required greater emotional intensity to identify the emotion type, and thus, higher intensity is indicative of slower recognition. Intensity scores were only calculated for trials where participants were accurate. Trials where participants pressed the space bar to select face type at 0% intensity (i.e., neutral) were not scored as accurate or incorrect, and intensity scores were not calculated, because a key press at 0% intensity implies that the participants were holding down the space bar when the trial began and never saw the emotional face stimulus.
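The scoring rules in this paragraph can be summarized in a short sketch. The three rules (proportion correct, intensity only for accurate trials, and exclusion of key presses at 0% intensity) are taken from the text. Averaging intensity across accurate trials, the tuple layout, and the function name are assumptions made for illustration.

```python
from statistics import mean


def score_trials(trials):
    """Return (accuracy, mean intensity) for one participant.

    `trials` is a list of (shown_emotion, chosen_emotion, intensity_at_press)
    tuples, with intensity on the 0-100 morph scale.
    """
    # Key presses at 0% intensity are neither accurate nor incorrect: drop them.
    valid = [t for t in trials if t[2] > 0]
    correct = [t for t in valid if t[0] == t[1]]
    accuracy = len(correct) / len(valid) if valid else None
    # Intensity is computed for accurate trials only (averaging is assumed).
    intensity = mean(t[2] for t in correct) if correct else None
    return accuracy, intensity


# Hypothetical responses, not study data.
example = [
    ("sadness", "sadness", 40),
    ("sadness", "anger", 60),
    ("fear", "fear", 50),
    ("anger", "anger", 0),  # excluded: key press at 0% intensity
]
accuracy, intensity = score_trials(example)
```

In this hypothetical example, three trials are scorable, two are accurate, and intensity is averaged over those two.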

Data Analyses

Mplus 7.4 was used to conduct analyses, and RStudio was used to visualize data and plot results. Correlations were conducted to examine the impact of depression severity and age on accuracy and intensity of ER performance. Regressions were used to test for main and interaction effects of age and depression on ER. Because ER accuracy and intensity data were non-normal, all analyses were computed in Mplus using an estimator robust to non-normality (MLR). For the correlations we conducted, a sample size of 28 at a significance level of .05 was powered at 80% to detect a large effect, while a sample of 84 was needed to detect a medium effect size with the same parameters (Cohen 1992). Power analyses showed that our sample of 644 at alpha .05 was powered at 72% for detecting a small effect, whereas power exceeded .99 for detecting a moderate to large effect size.
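As an illustration of the power figures above, the following sketch approximates the power of a two-sided test of a correlation using the Fisher z transformation. This is a standard textbook approximation and not necessarily the procedure the authors used. The effect size values of .10, .30, and .50 follow Cohen's conventions for small, medium, and large correlations; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a population correlation r.

    Uses the Fisher z approximation: atanh(r) * sqrt(n - 3) is treated
    as the mean of a unit-variance normal test statistic.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    delta = atanh(r) * sqrt(n - 3)
    return (1 - normal.cdf(z_crit - delta)) + normal.cdf(-z_crit - delta)


power_small = correlation_power(0.10, 644)      # small effect, full sample
power_medium = correlation_power(0.30, 644)     # medium effect, full sample
power_large_n28 = correlation_power(0.50, 28)   # large effect, n = 28
```

Under this approximation, power for a small effect with n = 644 comes out close to the 72% reported above, and power for a medium effect exceeds .99.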

Results

The means and standard deviations of correct identifications (accuracy) and intensity of faces were as follows: happiness accuracy M = .98 (.09), sadness accuracy M = .92 (.13), anger accuracy M = .74 (.19), fear accuracy M = .86 (.18), happiness intensity M = 32.08 (10.31), sadness intensity M = 48.94 (15.52), anger intensity M = 51.03 (15.37), fear intensity M = 47.93 (11.73). Overall accuracy scores ranged from .25 to 1 (M = .88, SD = .11), and overall intensity scores ranged from 8.20 to 100 (M = 44.40, SD = 11.12).

The average BDI-R score in our sample was 9.38 (SD = 5.82). Participants with diagnoses of MDD or PDD had significantly higher BDI-R scores (M = 13.45, SD = 5.25) than participants without these diagnoses (M = 7.31, SD = 4.93), F(1, 578) = 193.07, p < .001, and the size of the effect was large, Cohen’s d = 1.22. There was no significant difference in age between participants with a diagnosis of depression and those without (p = .84). However, there was a significant correlation between age and BDI-R (r = −.20), indicating that increased age was associated with a decrease in depression severity, with a small effect (f² = .04).
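The two effect sizes in this paragraph can be related to the reported statistics with a brief sketch. Cohen's d is computed here from the group means and a pooled standard deviation, and f² from the correlation as r² / (1 − r²). The group sizes of 200 and 380 are hypothetical: they are not reported in the text and were chosen only to be consistent with the F test's degrees of freedom and the 34.6% diagnosis rate. Function names are likewise assumptions.

```python
from math import sqrt


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / sqrt(pooled_var)


def f_squared_from_r(r):
    """Cohen's f-squared for a single predictor: r^2 / (1 - r^2)."""
    return r**2 / (1 - r**2)


# Means and SDs are from the text; n = 200 and n = 380 are HYPOTHETICAL.
d = cohens_d(13.45, 5.25, 200, 7.31, 4.93, 380)
f2 = f_squared_from_r(-0.20)
```

With these assumed group sizes, d is roughly 1.2 and f² is roughly .04, in line with the values reported above.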

Correlations were computed to examine the impact of depression severity, based on BDI-R scores, on ER. The correlations were not significant, indicating that depression severity was not related to ER accuracy or intensity for any face type (Pearson rs ranged from −.04 to .02). Using correlational analyses, we also explored the effect of age on ER. For each major analysis, we corrected for multiple comparisons (4 emotion categories, intensity and accuracy scores) using a conservative Bonferroni adjustment (p < 0.00625 uncorrected). Age was significantly associated with accuracy and intensity of ER such that as participants got older, overall accuracy decreased (r = −.15, p < .001; see Fig. 2). Increased age was associated with decreased accuracy and greater intensity required (slower speed in recognition) for sad (accuracy: r = −.21, p < .001; intensity: r = .13, p < .01) and fearful faces (accuracy: r = −.13, p < .01; intensity: r = .11, p < .01; see Fig. 3). Results survived Bonferroni correction for multiple comparisons (p < .006). Effect sizes were small (Cohen 1988), ranging from .01 to .04.
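The Bonferroni threshold and the size of the age effects can be checked with simple arithmetic, sketched below. The threshold divides alpha = .05 by the 8 tests (4 emotions × 2 measures) named in the text. Interpreting the reported effect sizes of .01 to .04 as squared correlations is an assumption of this sketch; the text does not state which index was used. Dictionary keys are hypothetical labels.

```python
N_EMOTIONS = 4   # happiness, sadness, anger, fear
N_MEASURES = 2   # accuracy and intensity
ALPHA = 0.05

# Bonferroni: divide the familywise alpha by the number of tests.
threshold = ALPHA / (N_EMOTIONS * N_MEASURES)

# Emotion-specific age correlations reported in the text.
reported_r = {
    "sad_accuracy": -0.21,
    "sad_intensity": 0.13,
    "fear_accuracy": -0.13,
    "fear_intensity": 0.11,
}

# Proportion of variance explained (r squared), assumed to be the
# effect size index behind the reported .01 to .04 range.
variance_explained = {name: r**2 for name, r in reported_r.items()}
smallest = min(variance_explained.values())
largest = max(variance_explained.values())
```

Under this reading, the correlations of .11 to .21 correspond to roughly 1% to 4% of variance explained.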

Fig. 2 Emotion recognition accuracy by age. Note. Ages are binned for visualization purposes only. Figures utilize locally weighted scatterplot smoothing (LOESS) curve fitting with a single data point (mean) for each age time point

Fig. 3 Emotion recognition intensity by age. Note. Ages are binned for visualization purposes only. Figures utilize locally weighted scatterplot smoothing (LOESS) curve fitting with a single data point (mean) for each age time point

Finally, we conducted regressions to test whether Age × Depression Status, i.e., the presence or absence of depression (dummy coded: depression diagnosis = 1, no depression diagnosis = 0), interacted in predicting ER. We also tested the Age × Depression Severity (BDI-R) interaction in predicting ER. The dummy-coded regression analyses showed that the Age × Depression Status interaction was not significant in predicting accuracy (p = .88) or intensity scores (p = .75) averaged across emotions. Additionally, the Age × Depression Severity interaction did not predict accuracy (p = .12) or intensity scores (p = .96) averaged across emotions, or for any specific emotion category.
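The structure of the moderated regression can be sketched as follows. This is ordinary least squares on synthetic, noise-free data, written only to show how the dummy-coded predictor and the product term enter the model. It is not the robust MLR estimator used in Mplus, and it produces no significance tests. Mean-centering age, the function names, and all data values are assumptions of this sketch.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction(age, depressed, outcome):
    """OLS for outcome ~ age_c + depressed + age_c * depressed.

    Returns [intercept, b_age, b_depressed, b_interaction], where
    age_c is mean-centered age and depressed is dummy coded (1/0).
    """
    mean_age = sum(age) / len(age)
    rows = [
        [1.0, a - mean_age, float(d), (a - mean_age) * d]
        for a, d in zip(age, depressed)
    ]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# SYNTHETIC data with a built-in interaction of -0.002 (not study data).
age = [20, 30, 40, 50, 60, 70, 25, 35, 45, 55, 65, 75]
depressed = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
outcome = [0.95 - 0.001 * a - 0.02 * d - 0.002 * a * d
           for a, d in zip(age, depressed)]
intercept, b_age, b_dep, b_interaction = fit_interaction(age, depressed, outcome)
```

The coefficient on the product term estimates how much the age slope differs between the two diagnostic groups; replacing the dummy variable with a continuous BDI-R score gives the Age × Depression Severity version of the same model.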

Discussion

This study explored the impact of depression severity and age on ER. Contrary to our hypothesis, depression severity was not a significant predictor of ER. We expected that those with higher depression scores would be slower and less accurate at identifying happy faces (e.g., Yoon et al. 2009), but did not find a significant relationship between depression and ER speed or accuracy. Age was associated with lower ER accuracy and higher intensity scores for sad and fearful faces, and lower overall accuracy. This finding is consistent with previous research that has demonstrated an age-related shift in the ability to recognize negative emotions (see Ruffman et al. 2008 for a meta-analysis).

Considering that age played a significant role in ER in our study (range of rs = .11 to .21), future research should explore the relation between ER and age in more detail, examining the role of possible explanatory factors such as attention, processing speed, and other cognitive variables. One possibility is that age affects these cognitive variables more directly, and factors related to cognition itself (rather than age) are the reason for declining performance. Another possibility is that an effect of aging on the brain is what determines the change in performance over time. For example, slowed processing speed, as posited by Salthouse (1996), would predict slowed processing across many cognitive domains. The present data partially support this notion, since age predicted higher intensities (slower speed) for sadness and fear, along with overall lower accuracy. However, we did not observe a decline in performance for happy face recognition, suggesting some further support for socioemotional selectivity theory, which posits that aging is associated with a preference for positively valenced information over negatively valenced information (Carstensen 1992; Carstensen et al. 1999). Future research should include additional tests of processing speed as a potential mediator of the relationship between age and ER intensity performance.

In a recent study investigating facial recognition of happiness among older adults with active and remitted MDD (Shiroma et al. 2016), results showed that depressed veterans 55 years and older had a significantly lower sensitivity to identify happiness at a moderate intensity of facial stimuli compared to their non-depressed counterparts. In the current study, the interaction between Age and Depression Status (i.e., depressed vs. not depressed) was not significant. This was contrary to our expectations, and may be due to the relatively low mean age in our sample, with older participants not as well represented. The median age was 27, which may have weakened the age and ER relationships. Additionally, the majority of our sample did not have a diagnosis of major depression, which could have impacted our results.

There are several reasons why our findings may differ from the previous ER literature. First and foremost, our sample was diagnostically diverse, with comorbid anxiety and mood disorder diagnoses. Other studies that have examined ER and depression typically compare a depressed group to a control group (see Joormann and Gotlib 2006; Yoon et al. 2009), whereas our sample was entirely clinical. Additionally, we examined relationships dimensionally within a clinical group. One potential reason the current dimensional approach did not detect an effect is that our sample did not have severe enough levels of depression to detect differences. Furthermore, comparing a depressed group to a healthy control group has more power to detect differences than a dimensional approach such as ours, especially if the range of depression symptoms is not wide enough to adequately represent more severe depression cases.

Despite the strengths of our study, including the large clinical sample and focus on dimensional assessment, it is not without limitations. A lack of consistency across ER studies makes results difficult to compare to the previous literature. With prior findings ranging from depression improving ER in negative expressions (Joormann and Gotlib 2006), to depression decreasing ER in happy expressions (Gollan et al. 2008; Yoon et al. 2009), to depression-related ER deficits in identifying neutral expressions (Leppänen et al. 2004; Liu et al. 2012), more research is needed. Although this study aimed to measure the effects of depression severity on ER, our mean BDI-II score was in the mild to moderate range, and results are limited by restricted range. Future studies in a sample with more varied and more severe depression could provide new insights. Additionally, the effects of age were small in our study, suggesting the need for a broader age range and larger sample to clarify the effects of age on ER in future work. Next, there were several other variables that we did not measure that could contribute to ER. For example, because the ability to quickly and accurately identify a facial expression may be linked to factors such as confidence, studies examining ER and response confidence could change the way that researchers and professionals interpret social cognition results. If confidence in judgments is low, delayed decision-making and enhanced uncertainty may occur, which could be associated with social withdrawal and isolation common in depression (Fieker et al. 2016).

This study builds on existing ER findings through our large and transdiagnostic clinical sample. Conceptualizing depression dimensionally rather than in a binary manner allowed for exploration of individual differences in severity above and below the diagnostic threshold. Our results highlight the importance of considering a variety of factors that may be at play including comorbidity, attention, arousal, and affect at the time of the experiment. More research is needed to clarify baseline ER differences within diverse clinical samples.