The ability to effectively regulate emotions is a core competency for healthy functioning (Cole 2014). When emotion regulation skills are underdeveloped or otherwise compromised, normative affective development may be delayed and risk for psychopathology increases (McLaughlin et al. 2011). Broadly defined, emotion regulation includes the ability to identify, understand, and accept emotional experiences; to control impulsive behaviors when distressed; and to flexibly modulate emotional responses as situationally appropriate (Cole et al. 1994; Gratz and Roemer 2004; Eisenberg and Spinrad 2004; Linehan 1993; Thompson 1994). These abilities typically increase with age (Orgeta 2009); however, difficulties with emotion regulation occur across the lifespan.

Emotion dysregulation is a core feature of disorders that span the internalizing and externalizing spectra (Beauchaine and Thayer 2015; Hofmann et al. 2012). Researchers have observed links between emotion dysregulation and self-inflicted injury (Gratz and Tull 2010; Crowell et al. 2005), identity disturbance (Kaufman et al. 2015), substance abuse (e.g., Dvorak et al. 2014), depression (Crowell et al. 2014), conduct problems (Beauchaine et al. 2007; Cappadocia et al. 2009), attention-deficit/hyperactivity disorder (Mitchell et al. 2012), anxiety (Folk et al. 2014), post-traumatic stress (Weiss et al. 2013), borderline personality disorder (Fossati et al. 2014), and eating disorders (Lavender et al. 2014; Racine and Wildes 2013). Thus, emotion dysregulation is an excellent transdiagnostic indicator of vulnerability and may contribute to high rates of comorbidity across diagnoses (Beauchaine and Thayer 2015).

The Difficulties in Emotion Regulation Scale

The Difficulties in Emotion Regulation Scale (DERS; Gratz and Roemer 2004) is one of the most widely used self-report measures of emotion regulation deficits. The DERS was developed to capture clinically relevant problems (i.e., those significant enough to be associated with clinical diagnosis; Gratz and Roemer 2004). However, it has also been used to examine normative developmental processes and experiences. For example, identity development, procrastination, social participation, academic motivation and performance, and psychophysiological responding are each associated with emotion regulation functioning as assessed by the DERS (Jankowski 2013; Jankowski and Rękosiewicz 2013; Mirzaei et al. 2014; Singh and Singh 2013). The measure has good reliability and validity with adolescents (Neumann et al. 2010; Sarıtaş-Atalar et al. 2015; Weinberg and Klonsky 2009) and adults of all ages (Kökönyei et al. 2014; Orgeta 2009; Staples and Mohlman 2012). It has also been validated with international samples and translated into multiple languages (e.g., Côté et al. 2013; Fossati et al. 2014; Sarıtaş-Atalar et al. 2014).

The DERS consists of 36 items that load onto six subscales (Gratz and Roemer 2004). The nonacceptance of emotional responses subscale reflects a tendency toward negative secondary responses to one’s own negative emotions and/or denial of distress. The difficulties engaging in goal-directed behavior subscale captures problems concentrating and accomplishing tasks while experiencing negative emotions. The impulse control difficulties subscale reflects struggles to control behavior when upset. The lack of emotional awareness subscale captures inattention to emotional responses. The limited access to emotion regulation strategies subscale assesses the belief that there is little one can do to regulate one’s emotions effectively after becoming upset. Finally, the lack of emotional clarity subscale reflects the extent to which individuals are unclear about which emotions they are experiencing.

Although the DERS is a useful and widely studied instrument, many of its items are conceptually similar. Each DERS subscale contains between five and eight items that load strongly onto it, suggesting that fewer items may suffice to assess the underlying constructs adequately. Furthermore, participants may perceive some items as repetitive, potentially increasing frustration and fatigue. The nuances captured by each DERS item may be useful for research focused explicitly on emotion dysregulation; nonetheless, a shortened version of the instrument could effectively capture the core dimensions of emotion dysregulation. Meta-analytic findings indicate that participant response rates are generally lower for longer questionnaires (Edwards et al. 2002; Rolstad et al. 2011). In addition, brief measures can be as valid as, or more valid than, longer measures of the same construct (Smith et al. 2012). Given the broad significance of emotion regulation difficulties, measuring this construct efficiently may prove useful in epidemiological studies.

The primary goal of this study was to develop and validate a short form of the DERS (the DERS-SF). Our aims were to preserve the factor structure of the full version, achieve psychometric properties and concurrent validity comparable to those of the full version, and reduce survey length by 50 % (from 36 to 18 items). This would reduce item redundancy, simplify scoring, shorten assessment time, and lessen respondent burden. In the studies presented here, we used data from five independent samples (presented as two confirmatory factor analysis [CFA] studies) to evaluate the utility of the DERS-SF in adolescents and adults. Specifically, we used CFA to examine DERS-SF items in three pooled adolescent samples (Study 1) and then sought to replicate our findings in two pooled college samples (Study 2).

We tested the following hypotheses:

  1. H1:

    It would be possible to create a psychometrically sound DERS-SF using the same six-factor structure as the original measure. Consistent with this hypothesis, we expect the CFA of the DERS-SF to show model fit similar to or better than that of a CFA of the original DERS. We predict that DERS-SF items will have similar factor loadings in the DERS and DERS-SF CFAs, and we expect the DERS-SF to account for a large proportion of the variance in emotion dysregulation explained by the original DERS. Accordingly, we expect correlations between corresponding DERS and DERS-SF subscales to be high (e.g., r > .90), and we predict that within-measure correlations among subscales will be similar across the two versions of the measure.

  2. H2:

    The DERS-SF and the original DERS will offer comparable validity as transdiagnostic indicators of risk for psychopathology. To test this hypothesis, we compute correlations of DERS and DERS-SF subscales with established measures of psychopathology, allowing us to compare the concurrent validity of the two forms with respect to psychopathology risk. We expect correlations between DERS-SF subscales and key psychopathology outcomes to be comparable to correlations between DERS subscales and the same outcome measures.

  3. H3:

    The DERS-SF will perform similarly across adolescent and college samples. To test this hypothesis, we conduct parallel tests of the DERS-SF in adolescent and college samples, and we expect the DERS-SF to offer comparable psychometric properties and validity across developmental periods.

Method

Overview of Studies

Data were collected from five independent samples in the Pacific Northwest and Mountain West regions of the United States. Three are adolescent samples, which we combined and reported as Confirmatory Factor Analysis (CFA) Study 1. CFA Study 2 included two samples to replicate and extend our findings into adulthood. Institutional review board approval and informed assent or consent were obtained for all studies and participants.

CFA Study 1: Initial Validation of the DERS-SF Among Adolescents

Sample 1

Participants

This sample consisted of 84 adolescent girls placed into one of three groups: self-injuring (n = 29), depressed with no self-injury history (n = 28), and typical control (n = 27). Participants were age 13–18 (M = 16.04, SD = 1.24), with 71.1 % identifying as Caucasian, 3.6 % as African American, 4.8 % as Hispanic, 6 % as Asian, 10.8 % as mixed racial/ethnic heritage, 1.2 % as other, and 2.4 % declining to answer. Participants for this sample were drawn from a larger, multi-visit study that included the adolescents’ mothers. Although mothers’ self-report data were not used in the current study, their informant ratings of adolescent functioning were collected (see Measures). Participants were recruited using mailers, classified ads, banners on buses, and brochures distributed to local schools, inpatient treatment facilities, and outpatient clinics. Participants were compensated $60 for the full study.

Sample 2

Participants

Participants included 131 adolescents ages 13 to 18 (M = 15.3, SD = 1.18), recruited for an online survey. Due to incomplete survey data, 17 participants were excluded, leaving a final sample of 114. The sample was 63.2 % female, 81.6 % Caucasian, 6.1 % Asian, 6.1 % Hispanic, and 6.2 % Other (Pacific Islander, African American, or mixed racial/ethnic heritage). Participants were recruited via mailers addressed to parents/guardians that included an invitation to participate, a consent document, and an address for the survey web-link. Adolescents who were interested in participating were directed to a fully online informed assent document, where they acknowledged their assent and were assigned an individual identification code. Participants were then linked to a separate secure, de-identified survey. As compensation, adolescents were entered in a drawing for each page of the survey they completed, for a total of seven possible entries.

Sample 3

Participants

Sample 3 was taken from a family study comparing adolescent suicide attempters (n = 29) with typical controls (n = 30). Adolescents ranged in age from 12 to 20 and were 70.4 % female, 91.5 % Caucasian, 1.7 % African American, and 6.8 % of mixed racial/ethnic heritage. Six adolescents (10.2 %) identified as Hispanic or Latino, with the remaining 89.8 % identifying as non-Hispanic. Suicidal adolescents were recruited from outpatient or inpatient psychiatry at a local neuropsychiatric hospital, adolescent medicine, and online classifieds (e.g., Craigslist). Controls were recruited through flyers on community websites and classifieds, local businesses, community centers, organizations (e.g., Boy Scouts), libraries, pediatric clinics, mailings, and word of mouth. Parent reports of adolescent functioning were collected (see Measures). Participants were compensated up to $50 for the full study.

Measures

All adolescents completed the DERS, which consists of 36 items that load onto six subscales. To assess difficulties regulating emotions during times of distress, many items begin with “When I’m upset.” Respondents indicate how often each item applies to them on a scale from 1 to 5 (1 = almost never, 2 = sometimes, 3 = about half the time, 4 = most of the time, 5 = almost always). The DERS has high internal consistency (α = .93), good test-retest reliability (ρ = .88, p < .01), and adequate construct and predictive validity (Gratz and Roemer 2004). Internal consistency in the current sample was α = .95.
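To make the response format concrete, the sketch below sums 1–5 responses into subscale scores. It is a minimal illustration, not the published scoring key: the item IDs (`aw1`, `im1`, and so on) and the reverse-keyed set are hypothetical placeholders. It assumes that positively worded items such as “I pay attention to how I feel” are reverse-keyed (6 − x on a 1–5 scale) so that higher subscale scores always indicate greater difficulty.

```python
from typing import Dict, List, Set

SCALE_MIN, SCALE_MAX = 1, 5  # 1 = almost never ... 5 = almost always


def reverse_key(value: int) -> int:
    """Flip a 1-5 response (1 <-> 5, 2 <-> 4) so higher means more difficulty."""
    return SCALE_MIN + SCALE_MAX - value


def score_subscales(responses: Dict[str, int],
                    subscales: Dict[str, List[str]],
                    reverse_keyed: Set[str]) -> Dict[str, int]:
    """Sum item responses within each subscale, reverse-keying flagged items."""
    totals: Dict[str, int] = {}
    for name, items in subscales.items():
        total = 0
        for item in items:
            value = responses[item]
            if not SCALE_MIN <= value <= SCALE_MAX:
                raise ValueError(f"{item}: response {value} is outside 1-5")
            total += reverse_key(value) if item in reverse_keyed else value
        totals[name] = total
    return totals


# Hypothetical 3-item subscales; IDs and keying are placeholders, not the DERS key.
SUBSCALES = {"awareness": ["aw1", "aw2", "aw3"], "impulse": ["im1", "im2", "im3"]}
REVERSE_KEYED = {"aw1", "aw2", "aw3"}

if __name__ == "__main__":
    answers = {"aw1": 5, "aw2": 4, "aw3": 2, "im1": 4, "im2": 4, "im3": 5}
    print(score_subscales(answers, SUBSCALES, REVERSE_KEYED))
    # {'awareness': 7, 'impulse': 13}
```

Keeping the key as data (a dictionary of item lists plus a set of reverse-keyed IDs) lets the same function score either the full or the short form once the real item assignments are substituted.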

Each sample also completed a number of other outcome measures, which we used to test the concurrent validity of the DERS-SF relative to the full DERS. Selected measures assess constructs that are theoretically and empirically linked to emotion dysregulation, and scores on many of these instruments have previously shown high correlations with the scores on the DERS.

Adolescents in Samples 1 and 3 rated their behavior problems, psychopathology, and broadband internalizing and externalizing symptoms on the Youth Self-Report (YSR; Achenbach 1987). The YSR comprises 112 items rated on a three-point scale (0 = never, 1 = sometimes, 2 = often). It is widely used and well validated, with excellent psychometric properties (Achenbach 1991a). Others have reported internal consistency reliabilities (Cronbach’s α) ranging from .72 to .93 (Gadow et al. 2002). Estimates in the current sample ranged from .76 for the social problems subscale to .89 for the anxious/depressed subscale.

Parent ratings of child behavior problems were also available for Samples 1 and 3 via the Child Behavior Checklist (CBCL; the parent-report version of the YSR; Achenbach 1991b). The CBCL is among the most widely used measures of child psychopathology. Its 113 items were factor analyzed to empirically identify forms of adolescent psychopathology, including broad internalizing and externalizing scales. Prior research has yielded acceptable construct validity for internalizing (r = .56 to .72) and externalizing behaviors (r = .52 to .88; Achenbach 1991b), and acceptable to high internal consistency across all behavioral scales for a matched sample of referred and non-referred 12- to 18-year-old children (α = .68–.92; Achenbach 1991b). Test–retest reliability coefficients were also high, ranging from .79 to .92 (Achenbach, Edelbrock, and Howell 1987). Internal consistency in the current samples was excellent (Cronbach’s α = .92 for both the internalizing and externalizing scales in Sample 1; α = .94 and .95 for the internalizing and externalizing scales, respectively, in Sample 3). As stated above, emotion dysregulation is a key factor underlying psychopathology; thus, the YSR and CBCL were included to assess the concurrent validity of the DERS-SF. Previous research has demonstrated that the DERS correlates strongly with scores on these measures (see, e.g., Vasilev, Crowell, Beauchaine, Mead, and Gatzke-Kopp 2009); therefore, CBCL/YSR scores were also expected to correlate strongly with DERS-SF scores.

Participants from Samples 2 and 3 completed the Self-Concept and Identity Measure (SCIM; Kaufman et al. 2015). The SCIM assesses core aspects of identity, such as self-concept and role continuity across environments and persons, consistency in values and interests, self-worth, self-differentiation, and self-cohesion. It assesses adaptive and maladaptive identity functioning across three subscales. Internal consistency of the scale is excellent (Cronbach’s α = .89). Test-retest reliability is also excellent (α = .93, r = .88), with an intraclass correlation coefficient (ICC) of .88. Internal consistency for the total scale in our adolescent sample was acceptable at .69, and internal consistency for the disturbed identity, consolidated identity, and lack of identity subscales was good to excellent (α = .83, .85, and .93, respectively). Identity problems have been theoretically and empirically linked to emotion dysregulation, and previous research has demonstrated moderate to strong correlations between SCIM and original DERS scores (Kaufman et al. 2015). We anticipated that DERS-SF scores would show similar patterns of correlation with this measure.

Finally, self-injury history was obtained for Samples 1 and 3 via a single item on the Lifetime Suicide Attempt Self-Injury Interview (L-SASII; formerly the Lifetime Parasuicide Count; Linehan and Comtois 1996). The same item was used for Sample 2 but was modified slightly for questionnaire-based rather than interview-based responding. The item (“Have you ever intentionally injured yourself?”) is scored dichotomously (i.e., positive or negative for self-harm history). It was included as an additional measure of concurrent validity, given that emotion dysregulation is a key risk factor for self-inflicted injury (see, e.g., Crowell et al. 2005; Gratz and Tull 2010). The DERS-SF was expected to be positively associated with history of self-harm behavior.

Procedure

All adolescents completed questionnaires and demographic questions either during a laboratory visit (Samples 1 and 3), or on a personal computer at home (Sample 2).

CFA Study 2: Replication Among College Students

Sample 1

Participants

Participants for Sample 1 were 230 students enrolled in undergraduate psychology courses at a large university. The participants ranged in age from 19 to 64 years (M = 24.38, SD = 5.80). Approximately 63 % of the sample was female, and 88.7 % of the sample identified as Caucasian, 7.8 % Asian, .9 % African American, and .9 % as American Indian or Alaska native. Four participants did not report their race. Participants were recruited through a department participant pool and received $30 and research credit for their time. Given the large number of non-traditional students at the university, there was no upper limit placed on age.

Sample 2

Participants

Participants for Sample 2 were also students enrolled in undergraduate psychology courses at the same university. The final sample of 567 participants was 18 to 65 years of age (M = 24.20; SD = 6.21; 43 % male). Demographics were missing for 186 participants (32.8 % of the sample) due to an error in the survey. Of the participants with demographics, 79.3 % identified as Caucasian, 6.8 % as Asian, .8 % as Pacific Islander, 1.3 % as African American, 5.2 % as Hispanic, 5.8 % as multiple races, and .8 % reported other/unspecified race.

Measures

In addition to the DERS (Cronbach α = .94 for the current sample) and the SCIM (Cronbach α = .89 for Study 2 Sample 2; the SCIM was not collected in Sample 1), participants in this study also completed the Beck Depression Inventory-II (BDI-II), a 21-item questionnaire used to measure depression severity (Beck et al. 1996). Because emotion dysregulation is implicated in risk for depression, we used this instrument as another important index of concurrent validity for the DERS-SF. BDI-II scores have excellent psychometric properties and high test-retest reliability (r = 0.93, p < 0.001; Beck et al. 1996). Internal consistency in the current sample was excellent (α = .91).

Self-injury history in Samples 1 and 2 was obtained via the Deliberate Self-Harm Inventory (DSHI), a self-report questionnaire of non-suicidal self-harm behavior used in the original DERS validation study (Gratz 2001; Gratz and Roemer 2004). For this study, participant responses were coded using 17 dichotomous items (e.g., “Have you ever intentionally [i.e., on purpose] cut your wrist, arms, or other area(s) of your body?”). Endorsement of any self-harm method was coded as positive for self-harm history, and denial of all methods was coded as negative. We collapsed the different methods of self-harm into a single yes/no variable to be consistent with the approach taken in the adolescent samples. We expected DERS-SF scores to be positively associated with reported history of self-harm behavior.
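The dichotomization described above amounts to an “any endorsement” rule. The sketch below assumes each of the 17 DSHI items has already been coded as True (endorsed) or False (not endorsed); the function name is hypothetical, and the handling of missing responses (None) is an illustrative choice that the study does not describe.

```python
from typing import Optional, Sequence

N_DSHI_ITEMS = 17  # number of dichotomous method items used in this study


def self_harm_history(endorsed: Sequence[Optional[bool]]) -> Optional[int]:
    """Collapse method-level items into one code: 1 = any method endorsed, 0 = none.

    Returns None when no method is endorsed but some items are missing, because a
    negative code cannot then be confirmed (an assumption, not the study's rule).
    """
    if len(endorsed) != N_DSHI_ITEMS:
        raise ValueError(f"expected {N_DSHI_ITEMS} items, got {len(endorsed)}")
    if any(item is True for item in endorsed):
        return 1
    if any(item is None for item in endorsed):
        return None
    return 0


if __name__ == "__main__":
    print(self_harm_history([False] * 16 + [True]))  # 1
    print(self_harm_history([False] * 17))           # 0
```

Any single endorsed method is enough for a positive code, which mirrors the single yes/no item used with the adolescent samples.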

Participants completed the Symptom Checklist-90-Revised (SCL-90-R), which measures nine symptom dimensions and their severity and provides a Global Severity Index (GSI) as a measure of overall psychopathology (Derogatis and Lazarus 1994). Previous research has demonstrated that SCL-90-R scores have good to excellent internal consistency (Cronbach’s α ranging from .69 to .91; Buckelew et al. 1988; Derogatis and Lazarus 1994; Kaufman et al. 2015) and high test-retest reliability (coefficients ranging from .78 to .90; Derogatis and Lazarus 1994). Cronbach’s α in the current sample ranged from acceptable to excellent (.70 for the hostility scale to .91 for the depression scale). As with the CBCL and YSR in our adolescent samples, psychopathology as measured by the SCL-90-R was expected to correlate positively with DERS-SF scores.

Participants from Sample 2 also completed the Acceptance and Action Questionnaire-II (AAQ-II; Hayes et al. 2004) and the State-Trait Anxiety Inventory-Form Y (STAI-Y; Spielberger et al. 1980) as additional measures of the concurrent validity of the DERS-SF and its similarity to the original measure. The AAQ-II assesses experiential avoidance (i.e., the urge to avoid distressing thoughts, feelings, and sensations) and has adequate test-retest reliability (.79 to .81), good discriminant validity (Bond et al. 2011), and high internal consistency in the current sample (Cronbach’s α = .90). Avoidance is a frequently used maladaptive emotion regulation strategy, and this measure was used in Gratz and Roemer’s (2004) original validation study of the DERS. AAQ-II scores were expected to be positively associated with DERS-SF scores. The STAI-Y is a 40-item measure of both state and trait anxiety with excellent internal consistency (α = .89) and test-retest reliability (ρ = .88; Barnes, Harp, and Jung 2002). It was included to assess concurrent validity and was expected to correlate with the DERS-SF. Internal consistency in the current sample was excellent (α = .93 for state anxiety and .92 for trait anxiety).

Procedure

Participants in both samples completed self-report measures through a secure online survey hosted by the university’s psychology department. After completing these questionnaires, participants were debriefed, compensated monetarily and/or with credit hours, and given a page-long list of mental health resources in the community.

DERS-SF Development

To identify the items that best represent the original DERS subscales, we first examined published reports by Gratz and Roemer (2004; the original validation study), Neumann, van Lier, Gratz, and Koot (2010; a replication study with adolescents ages 11–17), and Weinberg and Klonsky (2009; a replication study with adolescents ages 13–17). Each of these studies reported item-level exploratory factor analyses (EFAs) of the DERS in its respective population. The six-factor structure was upheld in each sample, with acceptable subscale reliability. However, the strength of the factor loadings for items within the scales varied between adult and adolescent samples. We therefore sought items for our shortened scales that would perform well in both adult and adolescent populations.

Item selection for the CFA was based on published EFA results for adolescent (Neumann et al. 2010; Weinberg and Klonsky 2009) and college student samples (Gratz and Roemer 2004). We used three empirical metrics, chosen to provide breadth of evidence, to identify items that would perform well for adolescents and adults, load strongly on their primary scale, and show minimal cross-loading on other scales. First, we computed an average rank order score within each subscale by ranking items from the largest to the smallest factor loading in each study and then averaging these ranks across the adult and adolescent studies. This approach gave preference to the “best items” in each scale. Second, we computed each item’s average factor loading across the adult and adolescent studies. Third, we computed an average “discrimination score” comparing how strongly an item loads on its primary factor with how strongly it loads on “off” factors (see Donnellan, Oswald, Baird, and Lucas 2006 for another application of this approach): the average of the absolute values of the off-factor loadings is subtracted from the primary factor loading. For example, the item “When I’m upset, I feel guilty for feeling that way” loads primarily on the Nonacceptance scale, with a factor loading of .91 in Gratz and Roemer (2004). The average of the absolute values of its loadings on the other five scales was .11, so its discrimination score was .91 − .11 = .80. Finally, when the three metrics identified multiple items as equally strong candidates, we favored breadth of topical coverage to reduce repetition of item content.
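The three item-selection metrics can be sketched as small functions. This is an illustration under stated assumptions, not the authors’ code: the five individual off-factor loadings in the worked example are made up so that their mean absolute value is .11, and the loadings for `item_a`, `item_b`, and `item_c` are invented numbers rather than values from the published EFAs.

```python
from statistics import mean
from typing import Dict, List, Sequence


def discrimination_score(primary: float, off_loadings: Sequence[float]) -> float:
    """Primary loading minus the mean absolute loading on the other factors."""
    return primary - mean(abs(x) for x in off_loadings)


def rank_items(loadings: Dict[str, float]) -> Dict[str, int]:
    """Rank items within one subscale, 1 = largest primary loading."""
    ordered = sorted(loadings, key=loadings.get, reverse=True)
    return {item: rank for rank, item in enumerate(ordered, start=1)}


def average_rank(studies: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each item's within-subscale rank across studies."""
    ranks = [rank_items(study) for study in studies]
    return {item: mean(r[item] for r in ranks) for item in studies[0]}


def average_loading(studies: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each item's primary loading across studies."""
    return {item: mean(study[item] for study in studies) for item in studies[0]}


if __name__ == "__main__":
    # Worked example from the text: .91 primary, off loadings averaging .11 in |value|.
    print(round(discrimination_score(0.91, [0.10, -0.12, 0.11, 0.09, -0.13]), 2))

    # Made-up loadings for three hypothetical items in an adult and an adolescent study.
    adult = {"item_a": 0.91, "item_b": 0.85, "item_c": 0.80}
    adolescent = {"item_a": 0.82, "item_b": 0.88, "item_c": 0.75}
    print(average_rank([adult, adolescent]))
    print(average_loading([adult, adolescent]))
```

Using absolute values in the discrimination score keeps a negative cross-loading from inflating the score, which is why the off-factor loadings are not simply averaged.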

The three empirical metrics reached a clear consensus on the three preferred items for four of the six scales: Strategies, Impulse, Goals, and Clarity. In the Nonacceptance scale, the top three items related to feeling embarrassed, ashamed, and guilty for becoming upset. To allow greater breadth of conceptual coverage, we included “irritated” and dropped “ashamed” for the CFA analyses. Finally, for the Awareness scale, both “I am attentive to my feelings” and “I pay attention to how I feel” were strong items but were deemed repetitive; we therefore excluded “I am attentive to my feelings” in favor of “I pay attention to how I feel.”

Results

Study 1: Validation of the DERS-SF With an Adolescent Sample

Descriptive statistics for all self-report measures in Study 1 are summarized in Table 1.

Table 1 Descriptive statistics for adolescent sample

Confirmatory Factor Analysis of the DERS-SF

Confirmatory factor analyses (CFAs) were computed for the DERS-SF, using the items identified through the process described above, and the resulting factor loadings were compared with those of the original DERS. Analyses were conducted in Mplus 7.11 (Muthén and Muthén 2013) using maximum likelihood (ML) estimation. ML is the most commonly used estimator for CFA and was appropriate here because the data were treated as continuous, independent, and normally distributed. In addition, DERS items are rated on a 5-point scale, which exceeds the two- to three-category range in which WLSMV estimation becomes the preferred approach (Beauducel and Herzberg 2006).

In our CFA models, each item was allowed to load on only one factor (i.e., cross-loadings were not estimated, even where they might have improved model fit) so that factor loadings would be comparable across the DERS and the DERS-SF. The DERS-SF measurement model fit the data well (χ²(120) = 245.695, p < 0.01; CFI = .96; TLI = .94; RMSEA = .06 (90 % CI: .05–.08); SRMR = .05), and item loadings ranged from .73 to .93, all well above acceptable levels. The CFA for the DERS showed marginal to poor fit (χ²(579) = 1589.348, p < 0.01; CFI = .85; TLI = .83; RMSEA = .08 (90 % CI: .08–.09); SRMR = .08). Item factor loadings were very similar to those found for the DERS-SF, indicating consistency in the estimation of the latent factors for the DERS-SF and original DERS. Factor loadings are reported in Table 3.
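Because each item loads on exactly one factor, the degrees of freedom reported for both models (120 for the DERS-SF, 579 for the DERS) follow from counting parameters: the p(p + 1)/2 observed variances and covariances minus the free parameters. The sketch below assumes marker-variable identification (one loading per factor fixed to 1) and freely correlated factors; fixing the factor variances instead gives the same count, and adding a mean structure adds p means and p intercepts, leaving df unchanged. Only the total item count matters, so the 36-item split shown is illustrative.

```python
from typing import Sequence


def cfa_df(items_per_factor: Sequence[int]) -> int:
    """Degrees of freedom for a simple-structure CFA with correlated factors.

    Assumes one loading per factor is fixed to 1 (marker-variable identification).
    """
    k = len(items_per_factor)            # number of factors
    p = sum(items_per_factor)            # number of observed items
    moments = p * (p + 1) // 2           # unique variances and covariances
    free_loadings = p - k                # one marker loading fixed per factor
    residual_variances = p
    factor_variances = k
    factor_covariances = k * (k - 1) // 2
    free_params = free_loadings + residual_variances + factor_variances + factor_covariances
    return moments - free_params


if __name__ == "__main__":
    print(cfa_df([3] * 6))               # 18 items on 6 factors -> 120
    print(cfa_df([6, 5, 6, 6, 8, 5]))    # 36 items on 6 factors (illustrative split) -> 579
```

The much larger df of the full model is one reason its χ² is not directly comparable to the short form’s; the relative indices (CFI, TLI, RMSEA, SRMR) are the more informative comparison.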

We then computed within-measure correlations among subscales for the DERS and DERS-SF (see Table 4). Subscale correlations ranged from .04 to .85 for the DERS-SF and from .14 to .90 for the DERS, indicating similar performance for the two versions.

Psychometric Properties and Correlations Between the DERS-SF and DERS

The psychometric properties of the DERS-SF and original DERS are reported in Table 5. Cronbach’s alpha coefficients for each of the 3-item DERS-SF subscales exceeded .70, ranging from .79 to .91, and were comparable to those of the original DERS.
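For reference, Cronbach’s alpha for a k-item scale is α = k/(k − 1) · (1 − Σσᵢ² / σ²ₓ), where σᵢ² is the variance of item i and σ²ₓ the variance of the summed score. The sketch below is a minimal implementation that uses sample variances throughout; the responses are made up and serve only to show the calculation.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; items[i][j] is respondent j's answer to item i."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs the same number of respondents")
    totals = [sum(item[j] for item in items) for j in range(n)]
    sum_item_var = sum(variance(item) for item in items)  # sample variances (n - 1)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


if __name__ == "__main__":
    # Made-up responses: three items, four respondents.
    made_up = [[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]]
    print(round(cronbach_alpha(made_up), 3))  # 0.724
```

Because alpha depends on k, a 3-item subscale needs relatively high inter-item correlations to exceed .70, which makes the values reported above a meaningful check on the shortened scales.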

Correlations between DERS and DERS-SF subscales were calculated to evaluate the degree to which the two versions were similar for participants. All correlations exceeded .90, ranging from .91 to .98, indicating 83–96 % shared variance between the DERS-SF and original DERS scales despite halving the number of items.
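The shared-variance figures are squared correlations: r = .91 gives .91² ≈ .83, or 83 %, and r = .98 gives about 96 %. A one-line helper (hypothetical name) makes the conversion explicit.

```python
def shared_variance_pct(r: float) -> int:
    """Percentage of variance two scores share (r squared), rounded to a whole percent."""
    return round(100 * r ** 2)


if __name__ == "__main__":
    for r in (0.91, 0.98):
        print(r, shared_variance_pct(r))
```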

Concurrent Validity of the DERS-SF

To compare the concurrent validity of the DERS and DERS-SF, and to show how well the DERS-SF approximates the original version, we computed correlations of both forms with several common outcome variables. Correlations of the DERS and DERS-SF with the CBCL, YSR, SCIM, and self-harm history are presented in Table 6. The correlations of the DERS and DERS-SF with indices of psychopathology were generally similar in pattern of statistical significance and in magnitude.

Study 2: Validation of the DERS-SF With a College Sample

Descriptive statistics for all self-report measures in Study 2 are summarized in Table 2.

Table 2 Descriptive statistics for college sample

Confirmatory Factor Analysis of the DERS-SF

As with the adolescent sample, CFAs were computed for the DERS-SF and DERS in the college student sample. Consistent with the results for adolescents, the DERS-SF CFA fit the data well (χ²(120) = 383.78, p < 0.01; CFI = .97; TLI = .96; RMSEA = .05 (90 % CI: .05–.06); SRMR = .04). Factor loadings for all items were large, ranging from .63 to .90. The DERS CFA showed marginal to poor fit (χ²(579) = 2825.61, p < 0.01; CFI = .88; TLI = .87; RMSEA = .07 (90 % CI: .07–.07); SRMR = .07). However, as shown in Table 3, the factor loadings for the original DERS and the DERS-SF were similar, indicating good consistency in the estimation of the latent factors.
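As a consistency check on the RMSEA values reported for both studies, the point estimate can be recomputed from χ², df, and N with the common formula RMSEA = √(max(χ² − df, 0) / (df · (N − 1))). The sketch below assumes the pooled Ns equal the summed final sample sizes from the Method section (84 + 114 + 59 = 257 adolescents; 230 + 567 = 797 college students). Some programs divide by N rather than N − 1, which makes no visible difference at these sizes. Under these assumptions the recomputed values round to the reported .06, .08, .05, and .07.

```python
from math import sqrt


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square, its df, and sample size."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Pooled Ns assumed to be the sums of the final sample sizes reported in Method.
N_ADOLESCENT = 84 + 114 + 59   # 257
N_COLLEGE = 230 + 567          # 797

if __name__ == "__main__":
    fits = {
        "Study 1 DERS-SF": (245.695, 120, N_ADOLESCENT),
        "Study 1 DERS": (1589.348, 579, N_ADOLESCENT),
        "Study 2 DERS-SF": (383.78, 120, N_COLLEGE),
        "Study 2 DERS": (2825.61, 579, N_COLLEGE),
    }
    for label, (chi2, df, n) in fits.items():
        print(f"{label}: RMSEA = {rmsea(chi2, df, n):.2f}")
```

The max(·, 0) guard sets RMSEA to zero when χ² falls below its df, i.e., when the model fits at least as well as expected by chance.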

Table 3 Confirmatory factor loadings for the DERS and DERS-SF in adolescent and college samples

Intercorrelations among subscales within the DERS and DERS-SF were computed. As in Study 1, they were similar in statistical significance and magnitude (see Table 4): correlations ranged from .11 to .82 among DERS-SF scales and from .15 to .90 among DERS scales, indicating similar performance for the two versions.

Table 4 Within-measure subscale correlations for the DERS and DERS-SF in adolescent and college samples

Basic Psychometric Properties and Correlations Between the DERS-SF and DERS

Table 5 reports the descriptive statistics and reliability of the DERS and DERS-SF. Consistent with the adolescent sample, Cronbach’s alpha coefficients for the DERS-SF total scale and six subscales all exceeded .70, ranging from .78 to .91. Correlations between the DERS and DERS-SF again indicated strong correspondence between the two versions (Table 5), ranging from .90 to .97 and indicating 81–94 % shared variance.

Table 5 Correlation between DERS-SF and DERS, reliability, and descriptive statistics for adolescent and college samples

Concurrent Validity of the DERS-SF

To compare the concurrent validity of the DERS-SF with that of the DERS, we calculated correlations of the original DERS, the short form, and their subscales with six outcomes: the BDI-II, SCL-90-R, STAI-Y, AAQ-II, SCIM, and self-harm history (see Table 7). As in the adolescent sample, the patterns of correlations for the DERS and DERS-SF with these outcomes were generally consistent in statistical significance and magnitude across all scales. These findings suggest that the DERS-SF has concurrent validity comparable to that of the original DERS.

Discussion

This study was designed to evaluate whether a shortened version of the widely used Difficulties in Emotion Regulation Scale (Gratz and Roemer 2004) performs similarly to the full measure. Consistent with our first hypothesis, results from the two confirmatory factor analyses presented here indicate that the DERS-SF has sound psychometric properties comparable to or better than those of the original measure. Also consistent with this hypothesis, DERS-SF scores effectively capture the dimensions of emotion regulation deficits measured by the original DERS. Consistent with our second hypothesis, correlations between DERS-SF scores and other clinically relevant scales mirrored those observed for the full DERS. As in previous research, DERS scores correlated moderately to highly with measures of both internalizing and externalizing psychopathology (see Tables 6 and 7). Finally, these results were consistent across adolescent and adult samples, supporting our third hypothesis that the DERS-SF may be useful for measuring emotion regulation deficits across a wide age range. These findings supply further evidence that emotion regulation difficulties are an excellent transdiagnostic marker of psychopathology risk (Hofmann et al. 2012; Beauchaine and Thayer 2015).

Table 6 Comparisons of concurrent validity for DERS and DERS-SF for adolescent sample
Table 7 Comparisons of concurrent validity for the DERS and DERS-SF for college sample

Developing an abridged version of this widely used measure may facilitate research and enhance clinical practice aimed at emotion regulation deficits. If higher reliability can be achieved with shorter tests, measurement error is reduced, statistical power is enhanced, and inferences are strengthened (Wilmer et al. 2012). Brief measures can maximize participant response rates and reduce response burden. Factors such as instrument length, the cognitive effort required to complete a questionnaire, and survey layout and interface have been suggested to affect respondent strain (US Department of Health and Human Services 2009). Further, lengthy questionnaires have been identified as a general obstacle in clinical practice (Mark et al. 2008), and instrument length has been used as an argument for limiting the number of administrations of an instrument in longitudinal studies (Rolstad et al. 2011).

Strengths of the current studies include our relatively large pooled sample sizes, the inclusion of participants from diverse settings (i.e., community, clinical, inpatient, outpatient, and two different regions of the United States), and the wide range of ages represented across the five samples. There are also a number of limitations. We did not use the same outcome measures across the five samples, which makes it challenging to compare participants’ scores systematically. The concurrent validity of the DERS has also been examined across a number of labs and with a broader range of outcome measures than those included in the current study. Furthermore, we lacked diagnostic interview data on study participants, which limits the generalizability of our findings to clinical samples. Additional research will be needed to examine whether the DERS-SF is a valid replacement for the original DERS when it is included in a longer battery. Although we were able to include participants from a range of developmental stages, certain groups are better represented than others. Our samples were predominantly Caucasian, and most participants in our college samples were young adults. Thus, the utility of the DERS-SF needs further examination among people from more diverse racial/ethnic backgrounds, persons in mid-to-late adulthood, and adults outside of a college setting. There were also variations in the inclusion and exclusion criteria used to recruit our samples, and females were overrepresented relative to males. Finally, we did not include other measures of emotion regulation difficulties that would have allowed us to examine convergent validity. Thus, although the DERS-SF performed similarly to the DERS, we do not yet know how it will compare with other gold-standard instruments of emotion regulation deficits.

In sum, the DERS-SF maintains the excellent psychometric properties and structure of the original DERS with half of the total items. This streamlined version of the instrument should be faster for participants and clients to complete and easier for researchers and clinicians to score. Given the frequent and broad use of the DERS, we expect an abridged version to be useful in a range of settings, particularly those in which respondent time is limited and/or burden is high (e.g., epidemiological studies). Future studies should investigate the utility of the DERS-SF as a clinical outcome measure and examine whether its scores are sensitive to therapeutic change. Emotion regulation difficulties are important constructs for mental health practitioners and researchers interested in personality, psychopathology, and development. Efficient measurement of these constructs may be especially helpful for understanding the emergence and developmental course of several distinct forms of psychopathology.