Introduction

Emotion regulation is a core affective process necessary for healthy psychological functioning (Cicchetti et al. 1995). Broadly defined, emotion regulation refers to the implicit and explicit processes used to assess and modulate emotions in the service of goal pursuit (Thompson 1994). Research indicates that effective emotion regulation serves a protective function for those at risk of mental illness (Banyard et al. 2017; Ford et al. 2014). In contrast, emotion dysregulation, or difficulties in emotion regulation ability, is a transdiagnostic risk factor underlying the development and maintenance of a wide range of psychopathology, including mood disorders (Joormann and Siemer 2014), borderline personality disorder (BPD; Salsman and Linehan 2012), suicidal behavior (Neacsiu et al. 2017), self-harm behavior (Buckholdt et al. 2015), eating disorders (Brockmeyer et al. 2014), and substance abuse (Weiss et al. 2017).

As empirical work has elucidated the link between emotion regulation and psychological functioning, many researchers have developed measures to capture different components of emotion regulation such as cognitive emotion regulation (Garnefski et al. 2001; Gross and John 2003), interpersonal emotion regulation (Hofmann et al. 2016; Niven et al. 2011), and beliefs about regulatory ability (Hutchison and Gunthert 2013). One of the most widely used measures is the Difficulties in Emotion Regulation Scale (DERS; Gratz and Roemer 2004). The DERS was developed based upon a leading clinical conceptualization of emotion regulation, which emphasizes the communication and signaling function that emotions serve (Gratz and Roemer 2004; Linehan 1993). This model also prioritizes the need for flexible responses to a wide range of emotional experiences for optimum mental health. To this end, the DERS is a 36-item self-report instrument consisting of six subscales measuring difficulties in the flexible, multi-dimensional regulation of emotion, including: 1) non-acceptance of or negative reactions to emotions (nonacceptance subscale); 2) difficulties engaging in goal-oriented behavior when experiencing negative emotions (goals subscale); 3) difficulty controlling impulsive behavior when experiencing negative emotions (impulse subscale); 4) lack of emotional awareness (awareness subscale); 5) perceived inability to cope with negative emotions (strategies subscale); and 6) lack of clarity about one’s emotions (clarity subscale).

DERS items were initially generated in collaboration with colleagues familiar with the emotion regulation literature (Gratz and Roemer 2004). The instrument was initially tested in two studies using undergraduate samples to examine factor structure, internal and test-retest reliability, and construct and predictive validity (Gratz and Roemer 2004). The DERS has been well-tested and there is extensive empirical support for its high internal consistency, reliability, and construct validity in undergraduate samples (Gratz and Roemer 2004; Gratz and Tull 2010), adolescents (Neumann et al. 2010), and both outpatient (Osborne et al. 2017) and inpatient (Fowler et al. 2014) clinical samples. Scores on the DERS have been associated with a range of mental health symptoms and clinically relevant behaviors (e.g., Buckholdt et al. 2015; Fowler et al. 2014; Osborne et al. 2017). Additionally, this measure has helped further research elucidating models of the development and maintenance of psychopathology (e.g., Crowell et al. 2012; Sharp et al. 2016). The DERS is also sensitive to change over time, making it ideal for use in clinical research studies that require multiple assessment points, and is frequently used in treatment outcome research (Wilks et al. 2016). However, the length of this measure may place undue burden on researchers and participants and reduce its validity, as response quality often declines with increasing duration of data collection (Galesic and Bosnjak 2009). A briefer version of the instrument with shorter administration time could increase the measure’s usefulness in clinical and research settings.

For this reason, three independent efforts were recently made to shorten the original 36-item DERS (DERS-36; Gratz and Roemer 2004), all of which were found in separate analyses to retain the excellent psychometrics of the DERS-36 (Bjureberg et al. 2016; Kaufman et al. 2016; Victor and Klonsky 2016). All three scales, the DERS-16 (Bjureberg et al. 2016), DERS-SF (Kaufman et al. 2016), and DERS-18 (Victor and Klonsky 2016), were derived using different, psychometrically sound methods based largely on the initial factor analysis used in the development of the DERS-36. Two versions (DERS-SF and DERS-18) retained subscales and one (DERS-16) did not. Construct validity for each short form was established using measures of psychopathology and clinically relevant problem behaviors believed to serve a regulatory function (i.e., self-harm, substance use, purging). Initial development of each short form is thoroughly summarized in Online Resource 1.

All three short forms have demonstrated good reliability and validity across a range of samples. However, as each has produced a slightly different brief version of the DERS (see Table 1 for item comparison), it is currently unclear which brief form should be used in clinical and research settings. The proliferation of multiple assessments to measure the same construct can be problematic: there is evidence that even small differences between measures can make it difficult to compare effect sizes across studies (Carlson and Herdman 2012). Thus, the use of multiple brief versions of the DERS could create a scenario in which much of the literature in this area cannot be directly compared. This is particularly concerning in the case of emotion regulation research, which is already plagued by multiple discrepant definitions and assessment tools, making it difficult to collapse findings across studies (Bloch et al. 2010). Further, the availability of multiple highly similar assessments can result in confusion and inefficiencies in the research process, requiring each researcher to dedicate extra effort towards comparing the different versions of the measure before selecting one for their purposes. Thus, the use of different brief versions of the DERS is likely to foster confusion, inconsistency, and inefficiency in the emotion regulation literature.

Table 1 Items contained within three brief DERS

Therefore, the current study aimed to investigate the psychometrics of the three brief versions of the DERS in the same sample to clarify the relative strength of each version and to identify the short form best suited for future use in emotion regulation research. First, we examined the reliability of all three short forms. Second, we evaluated and compared the concurrent validity of the three brief DERS using measures of clinically relevant outcomes, including symptoms of BPD, depression, anxiety, and clinically relevant problem behaviors. Last, we explored the concurrent validity of the subscale scores for the two short forms that retained them in order to determine the utility of using a brief DERS with subscales.

Method

Participants and Procedures

Participants were undergraduate students at a university in the western United States recruited for a larger study examining emotions, the self, and relationships. In order to be eligible, participants had to be: (1) 18 years or older, (2) a native English speaker, and (3) eligible to earn research credit in a psychology classroom. Following informed consent, participants completed an online battery of self-report measures. A total of 1360 participants began the survey. Of these, only those who completed all survey items and passed both embedded attention checks were retained, resulting in a final sample of 1181. Participants were largely female (n = 836, 70.79%) with a mean age of 20.7 (SD = 5.15). The ethnic breakdown of the sample was as follows: Caucasian (n = 675, 57.15%), Multi-ethnic (n = 191, 16.17%), Hispanic/Latino (n = 143, 12.11%), Asian (n = 90, 7.62%), African American (n = 49, 4.15%), Native American (n = 13, 1.10%), Pacific Islander (n = 6, 0.51%), and Other (n = 14, 1.19%). The university institutional review board approved all procedures.

Measures

Demographics Questionnaire

A questionnaire was administered assessing common demographic variables such as age, gender, and ethnicity.

Emotion Regulation

Participants completed the DERS-36. Each of the three short forms of the DERS was computed from responses to the original DERS; participants did not complete short forms separately or respond to any DERS items twice, in order to avoid undue participant burden or fatigue. Items contained within each brief DERS are displayed in Table 1. The first short form, the DERS-16 (Bjureberg et al. 2016), was developed using highest item-total correlations from the original DERS-36 study, with items removed based on content validity judgment by the DERS-36 scale developer. The DERS-16 retained no items from the awareness subscale. The second short form, the DERS-SF (Kaufman et al. 2016), retained items based on confirmatory factor analysis of multiple published exploratory factor analysis (EFA) results, using three metrics developed to select items with the strongest factor loading on the primary scale and minimal cross-loadings across EFAs. The DERS-SF consists of 18 items, three of which load onto each of the original six subscales. The third short form, the DERS-18 (Victor and Klonsky 2016), was developed by selecting the three highest loading items on each of the original six subscales using factor loadings from the original DERS-36 study, thus retaining 18 items. A more detailed summary of the initial development of each short form is available in Online Resource 1. Cronbach’s α values for all versions of the DERS and their corresponding subscales are presented in Table 2.
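The scoring approach described above (deriving each short form from the full-scale responses, then estimating internal consistency) can be sketched as follows. This is a minimal illustration only, not the authors' scoring code: the item numbers, the toy responses, and the function names are hypothetical, and responses are assumed to have been reverse-scored already where the scale requires it. The actual item sets are those listed in Table 1.

```python
from statistics import variance

# HYPOTHETICAL item numbers, for illustration only; the real item sets are in Table 1.
EXAMPLE_SHORT_FORM_ITEMS = [1, 2, 3]


def select_items(responses, item_numbers):
    """Keep only a short form's items from each participant's full-scale responses.

    responses: list of dicts mapping item number -> score
    (assumed already reverse-scored where needed).
    """
    return [[person[i] for i in item_numbers] for person in responses]


def total_scores(item_rows):
    """Sum each participant's retained items into a total score."""
    return [sum(row) for row in item_rows]


def cronbach_alpha(item_rows):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals)."""
    k = len(item_rows[0])
    item_variances = [variance([row[j] for row in item_rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in item_rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: four made-up participants answering four made-up items.
toy_responses = [
    {1: 1, 2: 2, 3: 1, 4: 5},
    {1: 3, 2: 3, 3: 2, 4: 1},
    {1: 4, 2: 4, 3: 5, 4: 2},
    {1: 5, 2: 5, 3: 4, 4: 3},
]

short_rows = select_items(toy_responses, EXAMPLE_SHORT_FORM_ITEMS)
short_totals = total_scores(short_rows)
short_alpha = cronbach_alpha(short_rows)
```

Because every short form is computed from the same set of full-scale responses, no participant answers an item twice, which is the design choice the paragraph above describes.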

Table 2 Internal consistency of all DERS scales

Psychopathology

Several aspects of psychopathology shown to be associated with emotion regulation difficulties were assessed to examine the concurrent validity of the three brief DERS. Measures were selected in order to capture the same constructs as validity measures used in initial development of all three brief DERS (Bjureberg et al. 2016; Kaufman et al. 2016; Victor and Klonsky 2016).

Borderline Personality Disorder Symptoms

Features of BPD were measured in the current sample using the short version of the Borderline Symptom List (BSL-23; Bohus et al. 2009). The BSL-23 measures a range of clinical symptoms associated with BPD over the course of the past week on a 0 (“not at all”) to 4 (“very strong”) Likert scale. This measure demonstrates good reliability and ability to discriminate BPD from less severe forms of psychopathology (Bohus et al. 2009), has been used to measure BPD in non-clinical samples with good reliability and validity (Salsman and Linehan 2012), and shows strong convergent validity with other measures of BPD (Glenn et al. 2009). In the current sample, Cronbach’s α for this measure was 0.95.

Depression and Anxiety

For a subset of the sample (n = 576), the 21-item version of the Depression Anxiety Stress Scales (DASS-21; Henry and Crawford 2005) was administered. This measure captures depression, anxiety, and stress experienced in the past week on a Likert scale ranging from 0 (“never”) to 4 (“most of the time”). In the current study, only the more pathological subscales of depression and anxiety were examined to reduce the number of comparisons. The DASS-21 has demonstrated good reliability and good construct and discriminant validity in a non-clinical sample (Henry and Crawford 2005) and was used in the development of the DERS-16 (Bjureberg et al. 2016). In the current study, Cronbach’s α was 0.91 for depression and 0.80 for anxiety.

Clinically-Relevant Behaviors

For the full sample, several problem behaviors associated with deficits in emotion regulation were assessed using the behavioral supplement of the BSL-23. The 11-item supplement measures the frequency of occurrence of a variety of impulsive and harmful behaviors over the past week on a 0 (“not at all”) to 4 (“daily or more often”) Likert scale. Dichotomous variables were created for items capturing suicidality (e.g., “I told other people that I was going to kill myself”), self-harm (e.g., “I hurt myself by cutting, burning, strangling, headbanging, etc.”), disordered eating (e.g., “I had episodes of binge eating”), and substance use (e.g., “I got drunk”) such that any frequency of the behavior (rating of 1–4) was coded as positive for the behavior and a rating of zero was coded as negative. The BSL supplement was used in the development of the DERS-16 (Bjureberg et al. 2016).
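The dichotomization rule described above is simple enough to state in a few lines. The sketch below is illustrative only; the function name and the toy ratings are hypothetical and are not taken from the study data.

```python
def dichotomize(rating):
    """Code a 0-4 frequency rating as 1 (behavior present) or 0 (behavior absent)."""
    if rating not in range(5):
        raise ValueError(f"rating must be an integer from 0 to 4, got {rating!r}")
    return 1 if rating >= 1 else 0


# Made-up ratings for one behavior item across five hypothetical participants.
toy_ratings = [0, 1, 4, 0, 2]
toy_flags = [dichotomize(r) for r in toy_ratings]
```

Coding any nonzero frequency as positive discards information about how often a behavior occurred, but it yields a binary outcome suitable for logistic regression.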

Data Analytic Plan

First, Cronbach’s alpha evaluated the internal consistency of all three short form total scores, as well as the original DERS-36. Next, four sets of regression analyses (12 total) examined the concurrent validity of each short form and the original DERS-36 total score on continuous variables of BPD, depression and anxiety symptoms, and four sets of logistic regression analyses (16 total) examined the concurrent validity of each short form and the original DERS-36 total score on dichotomous variables of suicidality, self-harm, disordered eating, and substance use. We did not include any covariates in the models. Analyses were conducted separately for each brief version of the DERS, due to highly overlapping items comprising each scale, leading to problematic multicollinearity if each were included within the same model (rs > .95, ps < .001). Bonferroni corrections at p < .002 were used to adjust for multiple comparisons for these analyses. We conducted post-hoc tests for differences between measures in the strength of relationships between DERS total scales and outcome measures with Dunn and Clark’s (1969) z test using the cocor package in R (Diedenhofen and Musch 2015).
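For readers unfamiliar with the comparison of dependent correlations, the z test named above can be sketched as follows. The analyses in this study were run with the cocor package in R; the Python function below is only an illustrative re-expression of the Dunn and Clark (1969) statistic for two overlapping correlations as it is commonly given, and the function name and argument names are assumptions. Here r_jk and r_jh are the correlations of one outcome (j) with two DERS versions (k and h), r_kh is the correlation between the two versions, and n is the sample size.

```python
import math


def dunn_clark_z(r_jk, r_jh, r_kh, n):
    """Dunn and Clark's z for two dependent correlations sharing variable j.

    Returns (z, two-sided p). Illustrative sketch, not the cocor source code.
    """
    # Fisher r-to-z transformation of the two correlations being compared.
    z_jk = math.atanh(r_jk)
    z_jh = math.atanh(r_jh)

    # Covariance term that accounts for the dependence between the correlations.
    numerator = (
        r_kh * (1 - r_jk**2 - r_jh**2)
        - 0.5 * r_jk * r_jh * (1 - r_jk**2 - r_jh**2 - r_kh**2)
    )
    c = numerator / ((1 - r_jk**2) * (1 - r_jh**2))

    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Made-up correlations, for illustration only (not values from this study).
example_z, example_p = dunn_clark_z(r_jk=0.60, r_jh=0.58, r_kh=0.95, n=1181)
```

Because the short forms share items and correlate very highly with one another, the dependence term c is large, which is why a test for dependent rather than independent correlations is needed.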

Next, the internal consistency and concurrent validity of each DERS subscale was examined for the two short form measures that retained subscales (DERS-SF and DERS-18). To evaluate concurrent validity, we conducted three sets of multiple regression analyses (nine total multiple regressions) for the two short forms and original DERS-36 on continuous variables of BPD, depression, and anxiety, and three sets of multiple logistic regression analyses (12 total multiple logistic regressions) for the two short forms and original DERS-36 on dichotomous variables of suicidality, self-harm, disordered eating, and substance use. Given the large number of comparisons between DERS original and short form subscales, we plotted these results using the arm package in R (Gelman and Su 2016) for ease of interpretation. Bonferroni corrections at p < .0004 were used to adjust for multiple comparisons for these analyses.
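The two Bonferroni thresholds reported in this section can be reproduced with simple arithmetic under a few assumptions that the text does not state explicitly: a family-wise alpha of .05, one test per total-score model, and one test per subscale (six per model) in the subscale analyses. The sketch below shows one reading that yields thresholds of the reported magnitude; the variable names are illustrative.

```python
ALPHA = 0.05  # ASSUMED family-wise error rate; not stated in the text

# Total-score analyses: 12 linear + 16 logistic regressions, one predictor each.
total_score_tests = 12 + 16

# Subscale analyses: 9 linear + 12 logistic models, ASSUMING six subscale tests per model.
SUBSCALES_PER_MODEL = 6
subscale_tests = (9 + 12) * SUBSCALES_PER_MODEL

total_threshold = ALPHA / total_score_tests   # about .0018, reported as p < .002
subscale_threshold = ALPHA / subscale_tests   # about .0004, reported as p < .0004
```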

Results

Reliability

Table 2 summarizes Cronbach’s alpha analyses. The total scores of all three short forms of the DERS demonstrated strong internal consistency reliability, comparable to each other and the original DERS-36. Similarly, internal consistencies for the DERS-SF and DERS-18 subscales were high and comparable to the original DERS-36 subscales. Additionally, consistent with previous research on the DERS-36, the awareness subscale of both short forms had the lowest reliability.

Concurrent Validity

Total Scores

Regression analyses evaluating concurrent validity of the DERS-16, DERS-SF, and DERS-18 total scores on psychopathology and clinically-relevant behaviors are summarized in Table 3. Similar to the original DERS-36, all three short form versions of the DERS demonstrated strong associations with BPD symptoms, depression, anxiety, suicidality, self-harm, and disordered eating (ps < .001). Substance use was not significantly associated with any of the DERS forms after adjusting for multiple comparisons using Bonferroni corrections (ps = .005 to .01). Results from the z test indicated that total scores across all three brief versions demonstrated similar concurrent validity (ps > .05). However, there was some evidence that short forms were more strongly associated with BPD and depression symptoms than the original DERS-36.

Table 3 Multiple linear and binary logistic regressions assessing the predictive validity of the DERS-36 and multiple short form total scores on all outcomes

Subscale Scores

Regression analyses evaluating concurrent validity of the DERS-SF and DERS-18 subscales, compared to the DERS-36 subscales, on psychopathology are summarized in Fig. 1. All three measures demonstrated a high degree of concordance in their pattern of association with clinical variables. Comparable to the original DERS-36 subscales, subscales on both the DERS-SF and DERS-18 demonstrated strong associations with BPD symptoms, depression, and anxiety with similar patterns of significance in individual subscales. The strategies and clarity subscales were associated with BPD symptoms across the DERS-36, DERS-SF, and DERS-18 (ps < .0001). However, the nonacceptance subscale was also significantly associated with BPD symptoms on the DERS-SF and DERS-18 (ps < .0001). The strategies and clarity subscales of the DERS-36 were associated with depression (ps < .0001). Similarly, the strategies and clarity subscales were associated with depression for both short forms (ps < .0001); however, the awareness subscale of both short forms was also associated with depressive symptoms (ps < .0003). The pattern of subscales associated with anxiety was identical across all measures: the strategies and clarity subscales of the DERS-36, DERS-18, and DERS-SF were associated with anxiety (ps < .0002).

Fig. 1 Plots of multiple linear regression unstandardized coefficients

Binary logistic regression analyses evaluating concurrent validity of the DERS-SF and DERS-18 subscales on clinically relevant behaviors are summarized in Fig. 2. Few significant associations were detected between any of the DERS subscale scores and binary outcomes; however, the DERS-SF and DERS-18 subscales demonstrated nearly identical patterns of concurrent validity. Although all versions of the DERS demonstrated a similar pattern of effects, only the goals subscale of the DERS-18 was significantly associated with suicidality (p = .0002), and only the clarity subscale of the DERS-36 was associated with disordered eating (p = .0003) after correcting for multiple comparisons.

Fig. 2 Plots of multiple logistic regression unstandardized coefficients

Discussion

The current study evaluated three new brief versions of the widely-used DERS for both internal consistency reliability and concurrent validity with relevant clinical outcomes to guide measure selection for future research. Total scores for all three short forms demonstrated similar reliability and concurrent validity for BPD, depression and anxiety symptoms, and a range of clinically relevant problem behaviors. There was slight evidence that the short forms were more strongly associated with BPD and depression than the DERS-36, lending further support for the utility of employing a brief version. Taken together, the results suggest that all three brief versions performed equally well and are highly suited for the evaluation of emotion regulation difficulties in research settings, especially those in which brevity is ideal due to time constraints.

The results of this study do not clearly answer the question of which brief form of the DERS is the best to use in clinical and research settings. All three versions were well-developed and tested to ensure sound psychometric properties (Bjureberg et al. 2016; Kaufman et al. 2016; Victor and Klonsky 2016; Online Resource 1) and the current study supported these initial findings in an independent sample. Therefore, one interesting conclusion from this study is that different psychometrically sound and comparably performing brief instruments can be derived from the same measure using discrepant methods. Unfortunately, the development of three separate, but equally reliable and valid short versions of the DERS is likely to result in measurement inconsistency across studies, limiting the ability to compare or combine results across different research groups and potentially slowing the research process. This specific example reflects a larger concern related to the proliferation of similar measures within psychology generally, and within the field of emotion regulation specifically. The precise manner of defining and measuring emotion regulation, like many psychological constructs, has been the source of much debate (e.g., Cole et al. 2004; Gross and Barrett 2011; Seligowski and Orcutt 2015). Therefore, different research groups have designed varying measures to try to most effectively capture the construct; however, discrepant research findings across studies can be difficult to interpret when different assessment approaches are used (e.g., Naragon-Gainey et al. 2017). The ability to compare findings across studies is crucial to the advancement of an empirical science; therefore, lack of consistency and redundancy across multiple measures of the same construct makes clarification of research findings more difficult (Carlson and Herdman 2012; Bloch et al. 2010). 

Alliance and collaboration among emotion regulation researchers in measure development might help the field move forward with clearer construct conceptualizations and more comparable research methods. Along these lines, we strongly recommend that experts within the emotion regulation field select the one short version of the DERS believed to be best suited to briefly assess emotion regulation and use this measure consistently across studies. Although it is certainly difficult to derive expert consensus on the recommended measurement of a construct, recent examples exist within psychology in which this has been done effectively (Bolte et al. 2018; Howes et al. 2018).

Although our findings did not clearly identify a brief version of the DERS with the best psychometrics, they did provide some clues as to which of these measures might provide optimal utility. Subscale analyses lent support for the utility of selecting a short form that retains the ability to examine specific facets of emotion regulation. As the DERS-36 has demonstrated in several research studies (e.g., Buckholdt et al. 2015; Mennin et al. 2009; Weinbach et al. 2017), specific subscales of the DERS-SF and the DERS-18 were associated with different clinical problems. For example, although the strategies and clarity subscales were associated with a number of different clinical outcomes across measures, the awareness subscale was especially associated with depression, and the nonacceptance subscale with BPD symptoms for both the DERS-18 and DERS-SF. As highlighted in this example, as compared to the DERS-36, the DERS-SF and DERS-18 sometimes offered a more specific subscale profile for different clinical issues (e.g., BPD symptoms, depression, suicidality). While this pattern certainly warrants replication, and research testing the predictive validity of the short forms is needed, it does suggest increased utility for the measures. In addition, gold standard practices for developing short form versions of assessment measures mandate that all facets captured by the original measure be preserved to the degree possible (Smith et al. 2000). This increased level of specificity and content overlap between short and long forms is useful for better understanding the core emotion regulation mechanisms underlying problem behaviors, may support the assessment of certain clinical problems based on specific emotion regulation difficulties, and could inform more effective intervention and efficient treatment targets for specific clinical problems.

Interestingly, although the utility of the awareness subscale has been questioned due to concerns about construct validity and previous analyses suggesting that it is less related to the overall DERS score (Bardeen et al. 2012), the DERS-16, which removed all items from the awareness subscale, did not show better concurrent validity than the brief measures that retained items from the awareness subscale. Moreover, the awareness subscale of both the DERS-SF and the DERS-18 was significantly associated with depression, lending support for the retention of that subscale. Although the DERS-16 developers omitted the awareness subscale for theoretically-based reasons and in response to lower reliability scores across studies, this elimination may result in a less valid measure that does not maintain all important constructs captured with the DERS-36 (Smith et al. 2000). In sum, these findings suggest that the DERS-SF and DERS-18 may have some incremental advantage over the DERS-16. Although all the brief scales are newly developed, and none appear to be widely used as of yet, the DERS-SF has been in use longer and cited more often than the DERS-18, which suggests that there could be utility in continuing to use this measure going forward in order to compare results between the greatest number of studies. However, further work is needed investigating which DERS subscales might be most useful for research on emotion regulation. Furthermore, findings from the current study suggest that factor analysis of the DERS-16 to identify subscales could increase its utility.

The strengths of the current study include the use of a large sample, which provides further validation of the three brief DERS scales by replicating initial reliability and concurrent validity analyses. Limitations include the use of cross-sectional self-report in a non-clinical, undergraduate sample that was largely female and Caucasian. Further testing in a more diverse sample is warranted. In addition, while measures were selected to capture the same constructs used in initial validation of all three brief DERS, the current study’s measures overlapped with some (i.e., DASS-21, BSL) but not all measures used in the original studies (e.g., DSHI, BEST, BDI-II), which may have impacted the results (Bjureberg et al. 2016; Kaufman et al. 2016; Victor and Klonsky 2016). Moreover, in the current study and in all the initial brief DERS developmental studies but one (Study 1; Bjureberg et al. 2016), the short versions were derived from responses to the DERS-36. Further research should examine potential differences in response styles if the short versions are administered alone. Given the nature of emotion regulation and the strong association between DERS scores and clinical variables, further testing should also be done in a clinical sample. The current study also did not evaluate the test-retest reliability of the short forms. Given that briefer versions are ideal when repeated measurement is needed, future studies should examine their utility in this domain. Similarly, in clinical settings, it is important to consider measurement precision for shortened measures (Kruyen et al. 2013). Future studies should examine how these three measures compare on measurement precision. Additionally, future longitudinal studies are warranted to establish the long-term psychometrics, including predictive validity, of these measures.

Conclusions

This study is the first to evaluate all three existing brief DERS in the same sample, and it provides additional support for their psychometrics and potential utility. Although the results of the current study did not clearly identify one brief version of the DERS as superior to the others, they did offer some clarification regarding potentially important differences between multiple short forms of this measure. In particular, our results provide support for the retention and use of subscales of the DERS to increase the utility of the instrument in association with specific clinical outcomes. We hope that unified efforts to clarify the definition and measurement of emotion regulation will continue in order to enhance the validity, consistency, and comparability of the assessment of this important construct.