Introduction

The Emotion Regulation Questionnaire (ERQ; Gross and John 2003) is a widely used scale to assess individuals’ dispositional use of two emotion regulation (ER) strategies, namely cognitive reappraisal and expressive suppression, and it is theoretically based on the Process Model of ER (Gross 2001, 2007). According to this model, the wide range of ER strategies people use to control their emotions can be differentiated based on when they intervene in the emotion generation process (Gross 2002; Gross and Thompson 2007): Antecedent-focused forms of ER are thought to act before the complete activation of emotional response tendencies has occurred and, for this reason, the strategies belonging to this family are posited to modify the entire temporal course of an emotional response. For instance, cognitive reappraisal consists in the attempt to reinterpret the meaning of a situation to modify its emotional impact (e.g., McRae et al. 2012). By contrast, response-focused strategies are thought to occur after emotional response tendencies have been fully activated, and therefore, to be more effortful and resource-consuming. For instance, expressive suppression involves the attempt to inhibit or reduce the outward display of emotional expressive behavior (e.g., Butler et al. 2003).

Extensive literature has so far examined the implications of the use of emotion regulation strategies for the individual’s mental health (e.g., Gross 1998; Haga et al. 2009; John and Gross 2007). Within this area of research, the ERQ has been commonly used to assess individual differences in reappraisal and suppression, and to examine the consequences of the habitual use of these strategies in daily life for the individual’s affective, cognitive, and social functioning (e.g., Gross and John 2003; John and Gross 2004). Existing evidence has generally shown that habitual suppression is associated with adverse outcomes, such as increased negative affect, reporting of psychopathological symptoms (e.g., depression and anxiety), and impaired social functioning (e.g., English et al. 2013; Srivastava et al. 2009); by contrast, habitual use of cognitive reappraisal is predictive of positive psychological functioning, including higher levels of positive affect and greater well-being (e.g., Balzarotti et al. 2017; Haga et al. 2009; Moore et al. 2008; Rice et al. 2018; Schutte et al. 2009; Wiltink et al. 2011; for an exception, see Meyer et al. 2012).

In the original validation of the ERQ conducted on a large sample of undergraduate students (N = 1483; Gross and John 2003), the ten items of the ERQ were found to load on two independent factors (represented by reappraisal and suppression). Since then, the ERQ has been validated in several languages using confirmatory factor analysis (CFA; e.g., Abler and Kessler 2009; Balzarotti et al. 2010; Cabello et al. 2013), generally replicating the two-factor structure of the scale and finding good psychometric properties (for an exception, see Sala et al. 2012). Additionally, measurement invariance has been found across ethnic and gender groups (e.g., Melka et al. 2011). So far, however, most validation studies have been conducted on student samples, thus limiting the generalizability of results to other populations.

Notably, recent studies conducted on community samples have failed to replicate the original two-factor structure of the ERQ. For instance, Wiltink et al. (2011) tested the factorial structure of the German version of the ERQ in a large German community sample, reporting poor model fit compared to the validation study in a sample of German students (Abler and Kessler 2009). To reach adequate fit, one of the items tapping cognitive reappraisal (Item 8) was allowed to load on both the reappraisal and suppression factors and was freed from the equality constraint with other item factor loadings. Moreover, a small but significant interrelation emerged between the reappraisal and suppression factors. Spaapen et al. (2014) tested the two-factor structure of the ERQ on two large community samples from Australia and the United Kingdom, finding barely adequate model fit. Good model fit was obtained after dropping one of the items tapping reappraisal (Item 3), which showed high covariance with another item of the reappraisal subscale (Item 1). The authors thus proposed a shorter, 9-item version of the ERQ (ERQ-9), which was found to be equivalent across the two samples and demographic variables (i.e., age, gender, and education). Similarly, Rice and collaborators (Rice et al. 2018) found poor model fit when testing the original ERQ in two adult community samples from Australia and Canada, but good model fit for the ERQ-9 (as well as factorial invariance across gender and education level) for both samples. In a validation study employing a Finnish population-based sample, Westerlund and Santtila (2018) found good model fit after dropping Item 5 due to its low factor loading and allowing error terms of reappraisal items to correlate. Finally, Brady et al. (2018) found comparable fit for both the ten-item and nine-item versions of the ERQ in a sample of older adults, but error terms in the reappraisal scale were allowed to correlate to reach adequate model fit when considering the ten-item ERQ.

Some studies have replicated the original factor structure of the ERQ employing non-student samples, such as a sample of women with cancer (Brandão et al. 2017), a Spanish community sample (Cabello et al. 2013), a sample of athletes (Uphill et al. 2012), and a sample of Swedish parents of children aged 10–13 years (Enebrink et al. 2013). Notably, however, some of these studies (Cabello et al. 2013; Enebrink et al. 2013) have reported evidence of acceptable (rather than good) model fit.

Overall, these findings raise questions about potential nonconformities in the factor structure of the ERQ when assessing non-student samples, which may be determined by cultural differences (i.e., validations in different countries), use of translations of the original scale, or differences between student and non-student samples (e.g., demographic variables such as age and education). The interest in ER has increased in the last decade, so that the number of studies using scales such as the ERQ to measure ER strategies in non-student samples − including clinical samples − has been growing (e.g., Andrei et al. 2018; D’Avanzato et al. 2013; Uphill et al. 2012; Velotti et al. 2015). For this reason, more research is needed to further validate and understand the measurement of this construct (Rice et al. 2018; Wiltink et al. 2011).

The Present Study

The main goal of the present study was to test the psychometric properties of the ERQ in an Italian community sample (N = 415) and to assess measurement invariance when compared to a sample of Italian undergraduate students (N = 371). More specifically, we (1) tested the reliability of the original two-factor structure of the ERQ by performing single-group confirmatory factor analyses (CFA) in the community and student samples; (2) assessed measurement invariance across the two samples; (3) assessed measurement invariance across gender and age; and (4) after establishing measurement invariance, tested latent factor mean differences across gender and age. Finally, we examined the associations between the ERQ scales and measures of psychological well-being.

When conducting invariance tests, previous studies have generally shown equivalence across gender and age (Melka et al. 2011; Rice et al. 2018; Spaapen et al. 2014). So far, however, gender and age differences in reappraisal and suppression use have been tested by comparing observed mean scores, rather than latent mean differences. Prior studies have consistently found gender differences in the use of suppression, with males reporting more frequent use of this strategy (e.g., Balzarotti et al. 2010; Gross and John 2003; Haga et al. 2009; Melka et al. 2011; Wiltink et al. 2011). Fewer studies have found gender differences in reappraisal use, with females scoring higher than males (Chen 2010; Rogier et al. 2017; Spaapen et al. 2014; Westerlund and Santtila 2018).

Evidence is far less consistent when examining age differences in the use of ER strategies. Since several studies have found that − despite age-related losses − older adults tend to report higher levels of well-being (and lower levels of negative affect) compared to younger adults (e.g., Cacioppo et al. 2008), theorists have posited that older adults are better able to regulate their emotions than younger adults (Urry and Gross 2010). For this reason, it has been hypothesized that individuals should use reappraisal more often and suppression less often with increasing age. Consistent with this idea, some studies have found that older adults reported less use of suppression than younger respondents (e.g., John and Gross 2004; Westerlund and Santtila 2018). Some studies, however, have failed to find any significant association between ER strategies and age (e.g., Spaapen et al. 2014), while others have shown a positive association between age and suppression scores, thus indicating that suppression use may increase in older age (Brummer et al. 2014; Laloyaux et al. 2015; Wiltink et al. 2011).

Finally, prior research has consistently shown that habitual use of reappraisal is associated with higher self-reported well-being (e.g., higher experience of positive affect, satisfaction with life, psychological well-being), while the opposite is true for suppression (e.g., Gross and John 2003; Haga et al. 2009).

Method

Participants

The community sample consisted of 415 Italian-speaking adults (52% female) ranging between 21 and 87 years (M = 45.70, SD = 13.28). Of the participants, 11% ranged from 20 to 29 years, 28% from 30 to 39, 23% from 40 to 49, 23% from 50 to 59, and 16% were older than 60. Educational level was distributed as follows: primary school (2%), junior high school (9%), senior high school (54%), university (26%), and post-university education (9%). Among the respondents in our sample, 29% were single, 58% married or living together, 8% divorced, and 5% widowed. Concerning occupation, 29% of participants reported being employees, 12% retired, 12% managers or entrepreneurs, 8% housewives, 7% healthcare practitioners (physicians, veterinarians, nurses), 6% industrial workers, 6% teachers, 4% psychologists, educators, or social workers, 3% retailers, 3% lawyers, 2% engineers and architects, 2% insurance agents, 2% computer technicians, 1% graphic designers, 1% journalists, and 2% unemployed. Of the sample, 97% were Caucasian, 2% Asian, .5% Latino, and .5% North African.

The student sample consisted of 371 undergraduate students (age range 21–34; M = 22.08; SD = 2.04; 53% Males) from different disciplines, including psychology (46%), computer science and engineering (25%), health and life sciences (11%), law (2%), economy (5%), literature and languages (2%), and others (e.g., education, motor sciences, architecture; 9%). All subjects were Caucasian.

All participants were volunteers and received no credit or compensation for their participation.

Measures

Emotion Regulation

Emotion regulation use was assessed with the Italian version of the Emotion Regulation Questionnaire (Balzarotti et al. 2010), a 10-item self-report questionnaire that assesses the use of cognitive reappraisal (6 items) and expressive suppression (4 items). The items were rated on a 7-point-Likert scale from “strongly disagree” to “strongly agree”. The Italian version of the ERQ has demonstrated good internal consistency reliability (.84 for Reappraisal and .72 for Suppression) and two-month test-retest reliability (.67 for Reappraisal and .71 for Suppression), comparable to that of the original English version of the ERQ (Gross and John 2003).

Subjective Well-Being

The Positive and Negative Affect Schedule (PANAS; Watson et al. 1988) comprises two 10-item subscales: Negative Affect (NA) reflects a general dimension of distress measured by adjectives such as afraid, distressed, and nervous; Positive Affect (PA) reflects the level of pleasant engagement and includes adjectives such as active, determined, and strong. Respondents rate the extent to which they usually experience each term on a 5-point scale ranging from 1 (very slightly or not at all) to 5 (extremely). The Italian adaptation of the PANAS (Terracciano et al. 2003) has demonstrated robust psychometric properties. In this study, Cronbach’s α was .77 for PA and .85 for NA.

Happiness

The Subjective Happiness Scale (SHS; Lyubomirsky and Lepper 1999) is a 4-item scale measuring subjective happiness. The respondents are asked to rate their levels of happiness using a 7-point Likert scale. The Italian translation of the SHS (Iani et al. 2014) has been validated on a large community sample, demonstrating good psychometric properties. In this study, Cronbach’s α was .82.

Psychological Well-Being

The Psychological Well-Being scale (PWB; Ryff and Keyes 1995) is an 84-item scale designed to measure six aspects of human actualization (14 items each): Autonomy (i.e., self-regulation and independence), Environmental Mastery (i.e., a sense of competence to manage the environment), Personal Growth (i.e., a sense of improvement and expansion over time), Positive Relations with Others (i.e., the ability to maintain trusting relationships with others), Purpose in Life (i.e., a belief in the meaning of one’s life), and Self-Acceptance (i.e., a positive attitude toward the self). The items are rated on a Likert scale ranging from 1 (strongly disagree) to 6 (strongly agree). The Italian translation of the PWB (Ruini et al. 2003) has shown good psychometric properties comparable to those of the original scale. In the present study, alphas ranged from .80 to .85.
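The internal-consistency values reported for the measures above are Cronbach's α coefficients. As a reminder of what this coefficient summarizes, the following is a minimal Python sketch of the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the toy responses are hypothetical; they are not data from this study, and the coefficients reported here were not necessarily computed this way.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: list of items, each a list of responses
    (one response per respondent, same respondent order in every item).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Total score of each respondent (sum across items).
    totals = [sum(responses) for responses in zip(*item_scores)]
    # Sum of the individual item variances (sample variance).
    sum_item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Toy data (hypothetical): two items answered identically by three respondents.
toy_items = [[1, 2, 3], [1, 2, 3]]
alpha = cronbach_alpha(toy_items)
```

With perfectly redundant items, as in the toy data, the coefficient reaches its upper bound of 1; real scales fall below that.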

Procedure

The community sample was recruited as part of a larger survey study examining the relationship between emotion regulation and well-being (for details concerning recruitment, see Balzarotti et al. 2016). After signing a consent form, the participants received a questionnaire package including the ERQ.

Part of the student sample (N = 130) was recruited as part of a prospective study examining emotion regulation, anxiety, and subjective well-being in university students over time (for details see Balzarotti et al. 2017). In the present study, only the baseline (trait) measures obtained at Time 1 were used. The remaining students were recruited for the current research. All students were recruited either during class lessons (after asking the class professor for cooperation and permission, the researchers briefly explained the project and asked students whether they were willing to participate) or through announcements on Facebook student pages. After signing a consent form, they received a questionnaire package including the ERQ and were instructed to return the questionnaire to the researchers after completion.

The participants received different variants of the questionnaire. All variants included the ERQ among the measures of ER, while the instruments measuring well-being had a different number of respondents (PANAS: N = 730; PWB: N = 395; SHS: N = 185).

Data Analysis

Confirmatory factor analyses were conducted using EQS 6.1 software (Bentler 1995; Byrne 2006). Since multivariate kurtosis was found to be indicative of nonnormality in both samples (Students: Mardia’s coefficient = 25.94, normalized estimate = 16.13; Community: Mardia’s coefficient = 24.82, normalized estimate = 16.32; Bentler 1990), the Satorra-Bentler scaled correction of ML was used, as it provides an adjusted, more robust measure of fit for nonnormal data (Hu et al. 1992). The fit of the models was assessed using several fit indices: the S-B chi-square (χ2), McDonald’s Noncentrality Index (NCI), the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). Generally, model fit is considered acceptable when CFI and TLI are ≥ .90, and good when CFI is ≥ .95. In addition, SRMR should not exceed .08 for good fit; RMSEA values ≤ .08 are indicative of acceptable fit and values ≤ .06 of good fit (Hu and Bentler 1999).
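The cutoffs listed above can be written down as a simple screening rule. The Python sketch below is only an illustration: the function name is hypothetical, and the way the individual cutoffs are combined into a single label (all "good" criteria must hold for "good", all "acceptable" criteria for "acceptable") is an assumption of this sketch, whereas in practice fit indices are weighed jointly and with judgement.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Label model fit using the cutoffs quoted in the text (Hu and Bentler 1999).

    Assumption of this sketch: the cutoffs are combined with a strict AND.
    """
    # Good fit: CFI >= .95, RMSEA <= .06, SRMR <= .08.
    if cfi >= 0.95 and rmsea <= 0.06 and srmr <= 0.08:
        return "good"
    # Acceptable fit: CFI and TLI >= .90, RMSEA <= .08.
    if cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08:
        return "acceptable"
    return "poor"

# Illustrative inputs (hypothetical values chosen to hit each branch).
label_good = classify_fit(cfi=0.990, tli=0.985, rmsea=0.031, srmr=0.030)
label_acceptable = classify_fit(cfi=0.93, tli=0.91, rmsea=0.07, srmr=0.06)
label_poor = classify_fit(cfi=0.893, tli=0.859, rmsea=0.077, srmr=0.068)
```

A strict rule of this kind is more conservative than a narrative reading of the indices, so borderline models may receive a harsher label than an analyst would assign.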

In the first step, we conducted single group CFA analyses on the student and community samples testing the original two-factor structure of the ERQ. Modification indices (MIs) were used to evaluate potential misfit.

In a second step, the measurement invariance of the final model was tested between the student and the community samples in progressively more restrictive stages (Hirschfeld and Von Brachel 2014; Putnick and Bornstein 2016; Widaman and Reise 1997). (1) An unrestricted model was tested as baseline (configural invariance); then, (2) factor loadings (metric or weak invariance) and (3) indicator intercepts (scalar or strong invariance) were constrained to be equal between groups. Finally, (4) variances and covariance and (5) error variances (strict invariance) were also constrained. Measurement invariance was evaluated by testing the decrease in model fit when the constrained models were compared to the baseline, configural model. Because the chi-square difference (Δχ2) statistic is highly sensitive to sample size, it has been recommended to assess the decrease in model fit using multiple fit indices (Chen 2007; Cheung and Rensvold 2002). More specifically, the following cutoff values have been recommended: ΔCFI values lower than .01, ΔRMSEA values lower than .015, and ΔNCI values lower than .02 (Cheung and Rensvold 2002).
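The decision rule based on changes in fit indices can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names and the fit values are hypothetical (they are not the values in Table 2), and comparing the absolute change against each cutoff is a simplification made for this sketch.

```python
# Cutoffs quoted in the text (Cheung and Rensvold 2002).
CUTOFFS = {"cfi": 0.01, "rmsea": 0.015, "nci": 0.02}

def fit_change(baseline, constrained):
    """Change in each fit index when moving from the baseline to the constrained model."""
    return {name: constrained[name] - baseline[name] for name in CUTOFFS}

def invariance_supported(baseline, constrained):
    """True when every change in fit stays below its recommended cutoff.

    Assumption of this sketch: the absolute change is compared to the cutoff.
    """
    deltas = fit_change(baseline, constrained)
    return all(abs(deltas[name]) < CUTOFFS[name] for name in CUTOFFS)

# Hypothetical fit indices for a configural model and two constrained models.
configural = {"cfi": 0.980, "rmsea": 0.040, "nci": 0.975}
metric_ok = {"cfi": 0.978, "rmsea": 0.039, "nci": 0.972}
metric_bad = {"cfi": 0.960, "rmsea": 0.050, "nci": 0.950}

supported = invariance_supported(configural, metric_ok)
not_supported = invariance_supported(configural, metric_bad)
```

In the hypothetical example, the first constrained model changes CFI by only .002 and is retained, whereas the second drops CFI by .020 and would be rejected.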

After establishing measurement invariance between the student and community samples, further analyses were performed on the total sample (N = 786). First, measurement invariance was tested across gender and age following the same steps detailed above. Second, latent mean differences were compared using the scalar invariance model as baseline (Byrne 2006). To compare latent means across gender, the latent mean of the male group (N = 399) was set to 0 and allowed to vary in the female group (N = 387). The critical ratio (CR) was calculated as the parameter estimate divided by its standard error, and tests whether the coefficient is significantly different from 0. A CR with an absolute value larger than 1.96 indicates a statistically significant difference in the latent means. A positive CR means that the comparison group has a higher latent mean than the reference group, while a negative CR indicates that the comparison group’s latent mean is smaller than that of the reference group. Cohen’s d was also computed as a measure of effect size.
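The critical ratio logic described above amounts to a z-type test on the estimated latent mean difference. A minimal Python sketch follows; the function names and the numeric inputs are hypothetical and serve only to illustrate the computation and the sign convention.

```python
def critical_ratio(estimate, standard_error):
    """Critical ratio: parameter estimate divided by its standard error."""
    return estimate / standard_error

def interpret_cr(cr, threshold=1.96):
    """Interpret a CR for a latent mean fixed to 0 in the reference group."""
    if abs(cr) <= threshold:
        return "no significant difference"
    # Positive CR: comparison group above the reference group; negative: below.
    return "comparison group higher" if cr > 0 else "comparison group lower"

# Hypothetical estimate of a latent mean difference and its standard error.
cr = critical_ratio(estimate=0.5, standard_error=0.25)
verdict = interpret_cr(cr)
```

The threshold of 1.96 corresponds to a two-tailed test at the .05 level.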

To test invariance across age, one commonly used strategy is to divide the sample into two age groups (younger and older) by median split. In our total sample, however, the median age was 27 years, and testing invariance using this cutoff would have strongly resembled testing invariance between the student and community groups (i.e., 91% of the younger group were students). For this reason, invariance across age was tested in the community sample only. The younger group (age range: 21–44; N = 208) latent mean was set to 0 and allowed to vary in the older group (age range: 45–87; N = 207).

Finally, we examined whether emotion regulation strategies as measured by the ERQ were associated with individuals’ well-being. Two-tailed Pearson’s correlations were computed. In addition, multiple regression analyses were run for each dimension of well-being: Reappraisal and suppression were included as independent variables after controlling for age and gender.

Results

Single Group Factor Structure and Invariance Testing

Results of the single-group CFAs suggested barely adequate fit in the student sample, χ2(34) = 119.3, p < .001, χ2/df = 3.52, SRMR = .065, TLI = .882, CFI = .911, RMSEA = .083 (90% CI = .067, .098), and poor fit in the community sample, χ2(34) = 118.28, p < .001, χ2/df = 3.48, SRMR = .068, TLI = .859, CFI = .893, RMSEA = .077 (90% CI = .062, .093). Regression weights are reported in Table 1. The MIs revealed high error covariance between Items 1 and 3 in both samples (Student: χ2 = 82.91, parameter change = .724; Community: χ2 = 82.12, parameter change = .726). As noted by previous research (Balzarotti et al. 2010; Spaapen et al. 2014), these items are characterized by very similar wording and tap similar aspects of cognitive reappraisal.

Table 1 Confirmatory factor loadings for all items and reliability coefficients in the student (N = 371) and community (N = 415) samples

Following previous literature, Item 3 was dropped due to a lower factor loading (see Footnote 1). CFAs after removing Item 3 (nine-item ERQ) generated adequate model fit for both the student, χ2(26) = 49.07, p = .004, χ2/df = 1.88, SRMR = .051, TLI = .960, CFI = .970, RMSEA = .049 (90% CI = .027, .070), and the community sample, χ2(26) = 42.84, p = .020, χ2/df = 1.65, SRMR = .052, TLI = .963, CFI = .973, RMSEA = .040 (90% CI = .016, .060). The covariance between the reappraisal and suppression factors was nonsignificant in the student sample (Φ = .07), but significant though small in the community sample (Φ = .15). Notably, however, after removing Item 3, factor loadings for Item 1 showed a .09 drop in both samples (see Table 1).

For this reason, a shortened, 8-item version (dropping both Items 1 and 3) was tested. Compared to the 9-item version, the results showed better fit in the student sample, χ2(19) = 25.72, p = .138, χ2/df = 1.35, SRMR = .030, TLI = .985, CFI = .990, RMSEA = .031 (90% CI = .000, .059), and similar fit in the community sample, χ2(19) = 34.03, p = .019, χ2/df = 1.79, SRMR = .049, TLI = .960, CFI = .973, RMSEA = .044 (90% CI = .018, .067). The covariance between the reappraisal and suppression factors was nonsignificant in the student sample (Φ = .11), but significant though small in the community sample (Φ = .17). Regression weights are reported in Table 1. The 8-item version of the ERQ was thus used for subsequent analyses.

Multi-group confirmatory factor analysis was used to test measurement invariance in the two samples. The results are displayed in Table 2 (see Footnote 2). When testing metric invariance, fit indices indicated that constraining factor loadings did not significantly decrease model fit. When testing scalar invariance, the chi-square difference (Bryant and Satorra 2012) was significant (Δχ2(14) = 25.95, p < .05). However, all other fit indices indicated good model fit and changes in CFI, NCI, and RMSEA were in the range of the recommended cut-offs. For this reason, we concluded that scalar invariance was supported across the two samples. Strict invariance was further tested, again yielding satisfactory fit. As strong between-group invariance was established between the samples, the two samples were combined for subsequent analyses.

Table 2 Multi-group analyses (eight-item ERQ): invariance test (1) between student (N = 371) and community (N = 415) samples; (2) across gender and (3) age in the total sample

Invariance Across Gender and Age

Multi-group confirmatory factor analysis was used to test measurement invariance across gender in the total sample. The results are reported in Table 2. For all invariance tests, changes in the values of fit indices were in the range of the recommended cutoffs. When testing scalar invariance, the chi-square difference was significant (Δχ2(14) = 59.75, p < .001). However, since all other fit indices indicated good model fit and changes in CFI, NCI, and RMSEA were in the range of the recommended cut-offs, we concluded that scalar invariance was supported across the two groups.

Finally, the results of measurement invariance across age are reported in Table 2. For all invariance tests, changes in the values of fit indices were in the range of the recommended cutoffs. When testing scalar invariance, the chi-square difference was significant (Δχ2(14) = 27.47, p < .05). However, since all other fit indices indicated good model fit and were in the range of the recommended cut-offs, we concluded that scalar invariance was supported across the two age groups.

Latent Mean Differences

Concerning gender, latent mean comparisons yielded a significant difference for both reappraisal (CR = −2.87; p < .05; Cohen’s d = .23) and suppression (CR = 5.087; p < .05; Cohen’s d = .49). Males reported less frequent use of reappraisal and more frequent use of suppression than females. Concerning age, the latent mean comparisons showed no significant difference between age groups for reappraisal (CR = .82; p > .05), but older adults reported more frequent use of suppression than younger respondents (CR = 3.16; p < .05; Cohen’s d = .38).

Emotion Regulation and Individual’s Well-Being

Two-tailed Pearson’s correlations are shown in Table 3. Overall, reappraisal was positively associated, while suppression was negatively associated with all measures of well-being except for autonomy. Notably, the 6-item and the 4-item reappraisal scales showed similarly sized correlation coefficients.

Table 3 Two-tailed Pearson’s correlations between the ERQ subscales and measures of well-being

As a following step, we conducted regression analyses to test the association between ER strategies and well-being while controlling for gender and age (Table 4). Reappraisal and suppression explained a small but significant portion of variance for all the domains of well-being, except for autonomy. Reappraisal showed a small negative association with negative affect; also, it was positively related to positive affect, environmental mastery, personal growth, positive relations with others, purpose in life, self-acceptance, and subjective happiness. Suppression was positively related to the experience of negative affect and negatively related to all the other indices of well-being.

Table 4 Multiple regression analyses with the eight-item ERQ subscales predicting measures of well-being

Population-based norms for the eight-item ERQ are provided in Table 5. Norms were calculated using T scores and were computed separately for gender because it was significantly associated with both ERQ scales.

Table 5 Population-based norms (T scores) for the eight-item ERQ by gender (N = 786)
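For readers unfamiliar with T scores, the conversion rescales raw scores to a mean of 50 and a standard deviation of 10 within each norm group. The Python sketch below illustrates this under stated assumptions: the function names and the raw scores are hypothetical, and standardizing against the group's own mean and sample standard deviation is an assumption of this sketch rather than a documented detail of how the norms were derived.

```python
from statistics import mean, stdev

def t_scores(raw_scores):
    """Convert raw scale scores to T scores (mean 50, SD 10) within one group."""
    m = mean(raw_scores)
    sd = stdev(raw_scores)  # sample standard deviation (assumption)
    return [50 + 10 * (x - m) / sd for x in raw_scores]

def t_scores_by_group(scores_by_group):
    """Compute T scores separately for each norm group (e.g., by gender)."""
    return {group: t_scores(scores) for group, scores in scores_by_group.items()}

# Hypothetical raw subscale scores for two norm groups.
raw = {"female": [2, 4, 6], "male": [3, 4, 5]}
norms = t_scores_by_group(raw)
```

Because each group is standardized separately, the same raw score can map to different T scores in different groups, which is the point of providing gender-specific norms.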

Discussion

Recent studies have shown inconsistencies when testing the factorial structure of the ERQ in adult community samples (e.g., Spaapen et al. 2014; Wiltink et al. 2011) compared to prior research conducted on samples of undergraduate students. These inconsistencies raise issues about the use of the ERQ to measure emotion regulation strategies in non-student samples. Notably, some authors have proposed a 9-item version of the ERQ (ERQ-9), which has shown better fit than the original 10-item ERQ in community samples (Rice et al. 2018; Spaapen et al. 2014). The current study adds to this research by testing the factorial structure of the original ERQ and assessing measurement invariance in an Italian community sample and a sample of undergraduate students.

First, the results showed that the original structure of the 10-item ERQ (Gross and John 2003) did not reach adequate fit for the community sample; barely acceptable fit emerged in the student sample, with fit index values similar to those obtained in the Italian validation of the questionnaire (Balzarotti et al. 2010). Consistent with other studies (Rice et al. 2018; Spaapen et al. 2014), Item 3 showed high error covariance with Item 1. These two items belong to the reappraisal scale and are thought to measure the habitual use of this strategy to feel “more positive” (Item 1) and “less negative” (Item 3) emotions, respectively. When constructing the questionnaire, Gross and John (2003) included items asking about regulation of positive and negative emotions in both subscales (p. 350). Nonetheless, similar item wording might have produced strong error covariance. Because (1) the practice of allowing correlations between error terms to improve model fit is not recommended, (2) previous studies addressing the same issue have dropped Item 3 (Rice et al. 2018; Spaapen et al. 2014), and (3) Item 10 also asks about reappraisal use to regulate negative emotions, Item 3 was dropped. Removing Item 3 led to substantial improvement of model fit for both samples, providing additional support that a shortened version of the ERQ may be a cross-culturally valid instrument with good psychometric properties.

Although the results showed good fit for the ERQ-9, we decided to go one step further by testing the psychometric properties of a short, 8-item scale (ERQ-8; see Footnote 3) that drops both Items 1 and 3. This decision was motivated by the following reasons: (1) the factor loading for Item 1 showed a noteworthy decrease after deleting Item 3; (2) since Items 7 and 10 also measure very similar facets of the construct of interest (i.e., regulation of positive and negative emotions using reappraisal), a shorter scale could eliminate redundancy existing in the reappraisal items. The reappraisal subscale of the ERQ-8 tested in this study maintained one item tapping regulation of positive emotions and one item measuring regulation of negative emotions, together with two general-emotions items. The results showed better fit for the 8-item version compared to the 9-item version of the ERQ in the student sample, and similar fit indices in the community sample. These findings provide initial evidence of validity and reliability for the 8-item Italian version of the ERQ. The ERQ-8 was thus used for subsequent analyses.

Second, multi-group measurement invariance analysis showed that the two-factor model was valid and equivalent between students and community adults. Overall, this finding does not provide support for the existence of inconsistencies in the ERQ structure between student and non-student samples. On the one hand, prior studies that have failed to replicate the original ERQ factor structure have employed community samples from different countries (e.g., Spaapen et al. 2014), while in this study we directly compared student and non-student samples. On the other hand, barely acceptable (rather than good) model fit for the original 10-item ERQ has been found by studies employing both student (Balzarotti et al. 2010; Sala et al. 2012) and community samples (Cabello et al. 2013; Enebrink et al. 2013). Mixed evidence about the ERQ factor structure might be due to error covariances (which can vary across studies) between reappraisal items using similar item wording to ask about ER of positive and negative emotions. In the current study, good model fit and invariance across student and community samples were obtained when considering a shortened version of the ERQ.

Third, consistent with previous studies (e.g., Spaapen et al. 2014), the results showed that the 8-item ERQ model was equivalent across gender and age at a configural, metric, and scalar level. Findings from latent mean comparisons showed that males reported more frequent use of suppression than females, but less frequent use of reappraisal. Moreover, older adults reported higher suppression than younger respondents. These results are consistent with previous literature (testing differences between observed scores), which has found that males report inhibiting the expression of emotions more than females (e.g., Gross and John 2003; Melka et al. 2011; Spaapen et al. 2014; Wiltink et al. 2011). Although fewer studies have found gender differences in reappraisal use (Chen 2010; Rogier et al. 2017; Spaapen et al. 2014; Westerlund and Santtila 2018), females have been shown to report more frequent use of this strategy than males. Concerning age differences, the results of this study are consistent with previous research suggesting that suppression use may increase in older age (Brummer et al. 2014; Wiltink et al. 2011). Although this result seems to conflict with the claim that individuals develop more efficient ER abilities as age increases (Urry and Gross 2010), suppression use may be linked to less adverse affective outcomes in older age. For instance, Brummer et al. (2014) found that older adults reported more frequent use of suppression than younger and middle-aged adults; however, suppression use was significantly associated with psychological distress in younger and middle-aged adults only. Future research could further test this hypothesis.

Finally, consistent with previous research linking ER and positive psychological functioning (Gross and John 2003; Haga et al. 2009), our findings showed that cognitive reappraisal was positively associated, and suppression negatively associated, with individuals’ self-reported well-being, and this result held across multiple indices of well-being. Notably, similar patterns of associations between ER and well-being measures were found even with the ERQ-8, providing some support for the nomological validity of this short version of the scale.

Specifically, we found that cognitive reappraisal was associated with higher levels of positive affect and lower levels of negative affect, whereas suppression showed the opposite pattern. While prior studies have most often found significant associations between habitual use of ER strategies and affective experience in undergraduate samples (e.g., Balzarotti et al. 2010; Gross and John 2003; Haga et al. 2009), the present study extends this finding to a community sample. Also, individuals who habitually use reappraisal reported higher levels of subjective happiness, whereas individuals who typically suppress reported lower levels (Páez et al. 2013). Finally, consistent with previous research (Gross and John 2003), both ER strategies showed significant associations with Ryff’s (1989) domains of psychological well-being; however, unexpectedly, neither reappraisal nor suppression was associated with autonomy in our results (high scores on this dimension indicate that the individual is self-determining, independent, and able to regulate his or her behavior and resist social pressures). It is uncertain why this divergent result emerged, and more research is needed to further test the relationship between habitual use of ER strategies and distinct domains of psychological well-being.

The present study adds to research investigating the psychometric properties of the ERQ. ER research has grown exponentially in recent years, and the ERQ is among the most frequently used self-report instruments assessing ER strategies. Thus, the replication of psychometric studies using diverse samples (e.g., age bands, educational levels, and cultural backgrounds) is crucial to provide solid evidence about the generalizability of the instrument’s measurement properties across different populations (Melka et al. 2011). Our results provide additional support for the robustness of the two-factor structure of the ERQ and the cross-cultural validity of the scale in two samples of Italian students and community-dwelling adults. A second implication of the current research concerns the testing of a short, 8-item version of the ERQ, which appeared reliable across gender and age. The ERQ-8 could thus provide a brief, easy-to-administer scale that may be more practical in assessment, training, and research settings when time is limited.

Conclusions

Some limitations bear noting. First, although this research extends previous results about the Italian version of the ERQ by employing a reasonably sized community sample, the two samples were collected using nonrandom sampling procedures, thus limiting the generalizability of the results to the Italian population. Nonetheless, the community sample included a broad age range and had a balanced gender composition. Second, the fact that respondents received different versions of the questionnaire (the samples were recruited as part of larger studies) led to high variability in the number of responses to the well-being questionnaires. Third, the study examined the relationship between habitual use of ER strategies and measures of positive psychological functioning. Future studies could also examine the association of reappraisal and suppression use with individuals’ reporting of distress and psychopathological symptoms. Relatedly, our study involved an undergraduate and a community sample composed of relatively well-adjusted individuals; thus, the results do not necessarily generalize to other samples. Future studies could employ more diverse samples, possibly including at-risk populations or clinical samples.

Despite these limitations, the current research provides a significant contribution to the existing literature about the ERQ. First, our results suggest that a shortened version of the ERQ, which removes item redundancy in the reappraisal subscale, may provide a valid and reliable measure of ER strategies across student and community samples. This short, 8-item version of the ERQ showed good model fit and measurement invariance across gender and age. Second, the study extends current knowledge of the relationship between ER and positive functioning, finding significant associations between habitual use of ER strategies and multiple domains of well-being in both student and community samples. Although these results are promising, future validity studies are necessary to further test the psychometric properties of the 8-item scale.