Early adulthood (i.e., ages 18–25) is a period of high vulnerability for health-risk behaviors, given the changing roles, demands, and development that characterize it (Ko et al., 2023; Sethi, 2023). The capacity to regulate emotions usually improves with age (Martin & Ochsner, 2016), although negative environmental influences, such as stress, traumatic experiences during childhood, or even substance use, may contribute to dysfunctional emotional regulation styles (Duprey et al., 2023). Therefore, having validated measures of emotional dysregulation (ED) available will be valuable for the prevention and treatment of such behaviors.

Emotional dysregulation (ED) is a transdiagnostic variable pertaining to difficulties in identifying one’s own emotions, modulating the emotional response and/or implementing suitable strategies to regulate emotions (Gratz & Roemer, 2004; Judah et al., 2022). Theoretically, ED combines various abilities (higher order processes, such as distress intolerance) and strategies (behaviors that modulate negative emotions, e.g., awareness) which are interconnected (Tull & Aldao, 2015). From a clinical standpoint, ED underlies a range of mental health disorders (Haag et al., 2023; Weiss et al., 2022), as well as maladaptive behaviors such as unprotected sex (Tull et al., 2012), risky driving (Alberto et al., 2014), and substance use (Gori et al., 2023; Weiss et al., 2022).

The field of addiction research has focused on the association between ED and addictive behaviors (Weiss et al., 2022), with most studies suggesting a bidirectional association, that is, ED leads to addictive behaviors involving substance and non-substance use, and addictive behaviors lead to ED (Weiss et al., 2017). Difficulties in regulating emotions add risk to substance use onset and addiction through shared vulnerabilities, including self-medication (Wolitzky-Taylor et al., 2021). At its core, substance use can serve as a way of regulating negative emotions, which may increase the risk of substance use disorders (Stellern et al., 2023). Aside from this, one plausible mechanism linking ED and substance use is impulsivity (Schreiber et al., 2012), given that both frequency and severity of substance use appear to increase with high scores in the ED facet specifically involving difficulties controlling impulsive behaviors (Garke et al., 2021). This is why ED is now the focus of a burgeoning area of research in clinical psychology and specifically the field of addiction (Stellern et al., 2023; Weiss et al., 2022).

Several questionnaires are available for evaluating ED, such as the Emotion Regulation Questionnaire – ERQ (Gross & John, 2003), the Cognitive Emotion Regulation Questionnaire – CERQ (Garnefski & Kraaij, 2006), and the Emotion Dysregulation Scale – EDS (Powers et al., 2015). One of the most widely used scales is the Difficulties in Emotion Regulation Scale – DERS-36 (Gratz & Roemer, 2004), which has six subscales (non-acceptance of one’s own emotions, limited access to emotion regulation strategies, lack of impulse control, lack of clarity, lack of emotional awareness and difficulties engaging in goal-directed behaviors).

The DERS-36 has been validated in several countries (Cancian et al., 2019; Miguel et al., 2017), and various shorter versions have been developed (e.g., the DERS-6/5), which are useful as they allow data collection in situations where time is short. In Spain, Hervás and Jódar (2008) provided evidence for the internal reliability and validity of the 28-item version of the DERS using a community sample of adults. Interestingly, the Spanish validation demonstrated adequate psychometric properties and revealed that five dimensions—rather than the six originally reported in the study by Gratz and Roemer (2004)—had a better fit to the data. In particular, ‘difficulties in controlling impulses’ and ‘limited access to regulation strategies’ comprised a single factor called ‘lack of emotional control’. Nonetheless, research on the psychometric properties of ED for substance users is lacking, and given the increasing research on ED in Spanish-speaking samples with addictive behaviors (e.g., Mestre-Bach et al., 2023; Peris-Baquero et al., 2023) a thorough psychometric assessment is needed for this specific population.

The principal aim of the current study was to substantiate the construct validity of the DERS by examining its factor structure and sex invariance, and to gather validity evidence in relation to substance-use related problems, anxiety, stress, and depression. We hypothesized that: (1) the DERS-28 would be valid and reliable in a Spanish-speaking sample of young adults; and (2) the study would identify significant associations between depression, anxiety, stress, substance use and ED, thereby enriching our understanding of the DERS within the broader context of emotional regulation. Additionally, this study used Item Response Theory (IRT) and Classical Test Theory (CTT) methodologies to examine the precision of the DERS in measuring different levels of the ED latent trait as well as its capacity to differentiate between individuals with different levels of ED. University settings are increasingly offering services for mental health prevention (including substance use), and choosing reliable, valid measures of ED is crucial for screening and intervention.

Method

Participants and procedure

The sample comprised 1,676 university students who self-reported past-month substance use (alcohol, tobacco, cannabis, heroin, cocaine, ecstasy, amphetamine, hallucinogens). Participants were recruited from three Spanish universities (in Asturias, Saragossa, and the Balearic Islands) and several vocational training centers in Palma de Mallorca (Balearic Islands).

The mean age of the sample was 19.56 (SD = 1.70, range 18–25) and 64.6% were women. Two-fifths (39.4%) were in their first year of university, 45.8% were in their second year, 3.6% in their third year, and 3.4% in their final year. The remaining participants (7.8%) were doing vocational training. The mean amount of money available for weekly personal expenses was €60.79 (SD = 119.61), with a median of €30 and interquartile range [IQR] 30.

Most of the students (80.31%: 1,346/1,676) were not working at the time they completed the survey, while 17.54% (294/1,676) were working part-time, and 2.15% (36/1,676) were working full-time. The vast majority of the sample (88.72%: 1,487/1,676) reported past-month alcohol use, followed by tobacco (43.79%: 734/1,676), cannabis (18.97%: 318/1,676) and other illicit drugs (8.59%: 144/1,676).

Permission to apply the assessments was obtained prior to data collection from both coordinators and teachers at the universities. To collect the data, evaluators presented the project to several classes. Participants completed an online survey (https://metajovenes.es) using tablets or their own mobile devices. Participants were able to complete the survey in person or online. The decision to participate in-person or online did not depend on participants and was influenced by external factors (class accessibility). The online option was only offered to coordinators and teachers who did not allow access to their classes in order to avoid disruption. During in-person assessments, a research assistant provided some background and a brief introduction to the study objectives. Several attentional control checks (see measures section) were included to ensure the authenticity of responses. Participant privacy was protected through pseudonymization. There was no specific time limit set for the survey, and it took students around 45 min to complete. Raffle tickets for a €100 voucher were given to participants to encourage participation.

Measures

Sociodemographic characteristics

Participants provided information on sex (male, female), date of birth, course year, money available for weekly personal expenses, and employment status (i.e., no employment, part/full-time).

Attentional control

To control for random responses, lack of understanding, or inattention to the task, the survey included four questions that were randomly distributed within the assessment battery (e.g., “for this item, please choose half of the time”). Participants were required to answer 3 of the 4 items correctly, and no data were excluded based on this criterion.

Substance use

Substance use (i.e., alcohol, tobacco, cannabis, and other illicit drugs) was assessed by means of a self-report. Specifically, participants answered the following question: “Have you used any of the following substances in the past year/month?”. Participants indicated usage of substances including alcohol, tobacco, cannabis, heroin, cocaine, ecstasy, amphetamines, and hallucinogens.

Skip patterns were incorporated to streamline the questionnaire, with only substance users completing additional measures. Participants who reported not using any substances were immediately directed to the psychological measures. Past-year alcohol users completed The Brief Young Adult Alcohol Consequences Questionnaire (B-YAACQ; Pilatti et al., 2014), a tool used to assess problems derived from alcohol use. The B-YAACQ has excellent internal consistency (α = 0.97). It has a unidimensional structure and contains 24 dichotomized (i.e., yes/no) items that assess past-year alcohol problems. Higher total scores indicate higher alcohol use problems.

Past-month tobacco users completed the Heaviness of Smoking Index (HSI; Pérez-Ríos et al., 2009), a multiple-choice two-item measure (i.e., time to first cigarette upon waking and number of cigarettes per day), with each item rated 0–3. The HSI assesses the severity of nicotine dependence. It is psychometrically sound and accurate in identifying those at risk of meeting diagnostic criteria for nicotine dependence. The cut-off for nicotine dependence is 4 (Sujal et al., 2021).

Past-month cannabis users completed the Cannabis Use Disorders Identification Test-Revised (CUDIT-R; Adamson et al., 2010). The CUDIT-R is used to assess cannabis misuse and derived problems. It has 8 items (items 1–7 use a 5-point Likert scale from 0 to 4, while item 8 has three answer options, scoring 0, 2 or 4). A threshold of 8 is an indicator of potentially hazardous use (Adamson et al., 2010).
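The CUDIT-R scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring script used in the study: the function names and the example responses are hypothetical, and the threshold of 8 is read here as "8 or more".

```python
def cudit_r_total(responses):
    """Sum the eight CUDIT-R item scores after checking their ranges.

    `responses` holds eight integers: items 1-7 scored 0-4 and
    item 8 scored 0, 2 or 4, as described in the text.
    """
    if len(responses) != 8:
        raise ValueError("CUDIT-R has exactly 8 items")
    for score in responses[:7]:
        if score not in (0, 1, 2, 3, 4):
            raise ValueError("items 1-7 must be scored 0-4")
    if responses[7] not in (0, 2, 4):
        raise ValueError("item 8 must be scored 0, 2 or 4")
    return sum(responses)


def is_potentially_hazardous(total, threshold=8):
    """Flag totals at or above the threshold (assumed reading: >= 8)."""
    return total >= threshold


# Hypothetical respondent, for illustration only.
example = [2, 1, 0, 1, 0, 1, 1, 2]
total = cudit_r_total(example)
print(total, is_potentially_hazardous(total))
```

Validating the item ranges before summing guards against data-entry errors such as an odd score on item 8, which the response format does not allow.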

Difficulties in Emotion Regulation Scale-28 (DERS-28)

The Spanish DERS-28 (Hervás & Jódar, 2008) assesses difficulties in the awareness, understanding, or modulation of emotion. It consists of 28 items with five subscales: Non-acceptance of emotional responses, ‘When I’m upset, I feel like I am weak’ (NOA; 6 items); Interference, ‘When I’m upset, I have difficulty controlling my behavior’ (INTE; 5 items); Lack of emotional control, ‘When I’m upset, I believe I will remain that way for a long time’ (LC; 6 items); Inattention, ‘I care about what I am feeling’ (INA; 6 items); and Confusion, ‘I pay attention to how I feel’ (CON; 5 items). Participants responded on a Likert scale from 1 (almost never) to 5 (almost always). Higher scores reflect greater ED. The DERS-28 has demonstrated adequate reliability in Spanish community adults (α = 0.93) and validity in relation to other associated variables such as emotional intelligence, depression, and anxiety (Hervás & Jódar, 2008).

Depression, Anxiety and Stress Scale (DASS-21)

The DASS-21 (Bados et al., 2005) is a self-reported measure that assesses symptoms of depression, anxiety, and stress (7 items per subscale). Participants rate each item on a 4-point Likert-type scale (from 0 = “did not apply to me at all;” to 3 = “applied to me most of the time”). Scores for depression, anxiety, and stress can be calculated by adding together the items from each dimension and multiplying by two (Lovibond & Lovibond, 2005). Additionally, a total score can be computed by adding all the DASS-21 dimensions together, with higher scores indicating greater severity of depression, anxiety, and stress symptoms. The DASS-21 has demonstrated good reliability (α = 0.73-0.81) (Fonseca-Pedrero et al., 2010) along with adequate convergent and discriminant validity (Bados et al., 2005).
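The DASS-21 scoring rule (sum the seven items of each subscale, multiply by two, then add the subscales for a total) can be sketched as follows. This is an illustrative sketch only: the function name and example ratings are hypothetical, and the assignment of items to subscales is deliberately left to the caller because the text does not list it.

```python
def dass21_scores(items_by_subscale):
    """Return doubled subscale sums plus their total.

    `items_by_subscale` maps a subscale name to its seven item ratings
    (each 0-3). The item-to-subscale key is not encoded here and must
    come from the published scoring instructions.
    """
    scores = {}
    for name, ratings in items_by_subscale.items():
        if len(ratings) != 7:
            raise ValueError(f"{name}: expected 7 items, got {len(ratings)}")
        if any(r not in (0, 1, 2, 3) for r in ratings):
            raise ValueError(f"{name}: ratings must be 0-3")
        scores[name] = 2 * sum(ratings)
    # Total severity is the sum of the doubled subscale scores.
    scores["total"] = sum(scores.values())
    return scores


# Hypothetical ratings, for illustration only.
example = {
    "depression": [1, 0, 2, 1, 0, 1, 1],
    "anxiety": [0, 0, 1, 0, 1, 0, 0],
    "stress": [2, 1, 1, 2, 1, 1, 1],
}
print(dass21_scores(example))
```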

Data analysis

Descriptive statistics for DERS items

First, descriptive statistics were calculated for the DERS items (i.e., mean, standard deviation, skewness, and kurtosis) along with discrimination indices (i.e., corrected item-test correlation).
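For readers unfamiliar with the corrected item-test correlation, the sketch below shows one common way to compute it: each item is correlated with the sum of the remaining items, so the item does not inflate its own index. The study used SPSS; this pure-Python version, its function names, and the toy data are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_total(data):
    """Correlate each item with the total of the *other* items.

    `data` is a list of respondents, each a list of item scores.
    """
    n_items = len(data[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]  # total minus the item itself
        result.append(pearson(item, rest))
    return result


# Toy responses (5 respondents x 3 items), for illustration only.
toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
print([round(r, 3) for r in corrected_item_total(toy)])
```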

Factorial structure testing

The factorial structure of the DERS was tested using two sub-samples from the total (N = 1,676). A subsample of 565 participants was used for exploratory factor analysis (EFA), with the remaining 1,111 used for confirmatory factor analysis (CFA). In the EFA, the Kaiser-Meyer-Olkin (KMO) and Bartlett’s tests were used to study sampling adequacy for factor analysis. The EFA was run on the Pearson correlation matrix, using Unweighted Least Squares (ULS) as the estimation method (Ferrando & Lorenzo-Seva, 2017). Promin rotation was implemented in the EFA because the DERS dimensions are related to one another. The indices of fit used were the Goodness of Fit Index (GFI) and the Standardized Root Mean Square Residual (SRMSR). Values of GFI > 0.95 and SRMSR < 0.06 were considered to indicate good fit (Hu & Bentler, 1999). The CFA was based on the Pearson correlation matrix, using ULS as the estimation method. Due to the correlations between the different dimensions, five models were tested: (1) a one-factor model; (2) a five-uncorrelated factors model; (3) a five-correlated factors model; (4) a second-order model; and (5) a bifactor model with one general factor and five specific factors. The bifactor model and the second-order model are preferable to a unidimensional model when the dimensions in a questionnaire are intercorrelated, which suggests a higher hierarchical factor or general factor (Rodriguez et al., 2016). Bifactor and second-order factor models are two well-known approaches for general constructs comprising several related domains, such as the DERS (Xu et al., 2021). The Comparative Fit Index (CFI), Non-Normed Fit Index (NNFI), Root Mean Square Error of Approximation (RMSEA), and SRMSR were used as fit indices, with fit considered adequate when CFI and NNFI > 0.95, RMSEA < 0.08 and SRMSR < 0.06 (Hu & Bentler, 1999).
Following standard practice (Ferrando et al., 2022; Lane et al., 2016), we selected the best fitting model according to the parsimony indices: the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC). Lower values for both indices indicate more parsimonious models (Burnham & Anderson, 2004).
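The model selection logic described above (screen models against the fit cut-offs, then prefer the lowest information criteria) can be sketched as below. The model names and all numeric values are hypothetical placeholders, not the values in Table 3, and using BIC first with AIC as a tie-breaker is an assumption made for this sketch.

```python
def fit_is_adequate(cfi, nnfi, rmsea, srmsr):
    """Apply the cut-offs stated in the text (Hu & Bentler, 1999)."""
    return cfi > 0.95 and nnfi > 0.95 and rmsea < 0.08 and srmsr < 0.06


def select_model(models):
    """Return the name of the adequately fitting model with the lowest BIC.

    Ties on BIC are broken by AIC (an assumption of this sketch).
    Returns None when no model meets the fit cut-offs.
    """
    candidates = [
        m for m in models
        if fit_is_adequate(m["cfi"], m["nnfi"], m["rmsea"], m["srmsr"])
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: (m["bic"], m["aic"]))["name"]


# Placeholder values, for illustration only.
models = [
    {"name": "model_a", "cfi": 0.90, "nnfi": 0.89, "rmsea": 0.09,
     "srmsr": 0.08, "aic": 1200.0, "bic": 1300.0},
    {"name": "model_b", "cfi": 0.97, "nnfi": 0.96, "rmsea": 0.05,
     "srmsr": 0.04, "aic": 1100.0, "bic": 1250.0},
    {"name": "model_c", "cfi": 0.98, "nnfi": 0.97, "rmsea": 0.04,
     "srmsr": 0.03, "aic": 1050.0, "bic": 1210.0},
]
print(select_model(models))
```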

Bifactor model analysis

Indices related to the bifactor model were computed to test the extent of the general factor’s influence on participants’ responses to items compared to specific factors. Omega Hierarchical (ωH) gives information on the amount of total variance attributed to the general factor, and to the specific factors once the effect of the general factor is eliminated (McDonald, 1999). Values of ωH > 0.70 for the general factor, and ωH > 0.30 for specific factors support an essentially unidimensional latent structure, but also sufficient importance for the specific factors (Reise et al., 2013). Explained Common Variance (ECV) provides information about the common variance explained by the general factor, with the rest of the variance being explained by specific factors. Factors with general ECV higher than 0.60 can be considered as representing a latent variable (Reise et al., 2013). Finally, coefficient H was calculated to examine whether the assessed model would be suitable and replicable across different studies, and factor determinacy (FD) was analyzed to examine the extent to which individual differences in factor score estimates can be assumed to be good representations of true individual differences in the factor. Possible values range from 0 to 1, with values closer to 1 indicating better determinacy. Recommended values are H > 0.80 and FD > 0.90 (Rodriguez et al., 2016).
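To make the bifactor indices concrete, the sketch below computes ωH, ECV and H for the general factor from standardized loadings, using the formulas commonly given for these indices. It assumes each item loads on the general factor and on at most one specific factor, with orthogonal factors. The function name and the toy loadings are hypothetical, not the study's estimates, and FD is omitted because it requires the factor score weights.

```python
def bifactor_indices(general, specific):
    """Compute omega hierarchical, ECV and H for the general factor.

    general: standardized general-factor loadings, one per item.
    specific: dict mapping a specific-factor name to {item_index: loading}.
    """
    n = len(general)
    spec_sq = [0.0] * n  # squared specific loading of each item
    for loadings in specific.values():
        for i, lam in loadings.items():
            spec_sq[i] += lam ** 2

    # Item uniqueness: variance not explained by any factor.
    unique = [1.0 - general[i] ** 2 - spec_sq[i] for i in range(n)]

    gen_part = sum(general) ** 2
    spec_part = sum(sum(l.values()) ** 2 for l in specific.values())
    omega_h = gen_part / (gen_part + spec_part + sum(unique))

    gen_sq = sum(g ** 2 for g in general)
    ecv = gen_sq / (gen_sq + sum(spec_sq))

    # Construct replicability (coefficient H) of the general factor.
    h = 1.0 / (1.0 + 1.0 / sum(g ** 2 / (1.0 - g ** 2) for g in general))
    return {"omega_h": omega_h, "ecv": ecv, "h": h}


# Toy loadings (4 items, 2 specific factors), for illustration only.
toy_general = [0.6, 0.6, 0.6, 0.6]
toy_specific = {"A": {0: 0.5, 1: 0.5}, "B": {2: 0.5, 3: 0.5}}
print(bifactor_indices(toy_general, toy_specific))
```

In this toy case the general factor accounts for roughly 59% of the common variance, which would fall just short of the 0.60 ECV guideline cited in the text.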

Measurement invariance and sex differences

Given the extensive literature reporting sex differences in patterns of substance use and ED (Weiss et al., 2022), we assessed measurement invariance based on sex. Configural (i.e., the same items belong to the same factors across groups), metric (i.e., factor loadings set as equal across groups), and scalar (i.e., item intercepts set as equal across groups) invariance were tested through multi-group confirmatory factor analysis (MG-CFA). Given that nested models were used, measurement invariance was accepted when, relative to the less constrained model, CFI decreased by less than 0.01 (ΔCFI > -0.01) and RMSEA increased by less than 0.015 (ΔRMSEA < 0.015) (Chen, 2007).
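The decision rule for nested invariance models can be expressed as a small check. This is a sketch of one reading of the Chen (2007) criteria, in which invariance is retained when CFI drops by less than 0.01 and RMSEA rises by less than 0.015. The function name and the example fit values are hypothetical.

```python
def invariance_holds(cfi_less, cfi_more, rmsea_less, rmsea_more):
    """Compare a more constrained model with the less constrained one.

    `*_less` are fit values of the less constrained model (e.g., configural);
    `*_more` are those of the more constrained model (e.g., metric).
    """
    delta_cfi = cfi_more - cfi_less        # negative when fit worsens
    delta_rmsea = rmsea_more - rmsea_less  # positive when fit worsens
    return delta_cfi > -0.01 and delta_rmsea < 0.015


# Placeholder fit values, for illustration only.
print(invariance_holds(0.960, 0.955, 0.050, 0.052))  # small change in fit
print(invariance_holds(0.960, 0.940, 0.050, 0.070))  # large change in fit
```

The same check is applied twice in sequence: metric against configural, then scalar against metric.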

Reliability assessment

The reliability of the scores in the different dimensions of the DERS and of the total score was examined using Classical Test Theory (McDonald's ω coefficient) and Item Response Theory (information functions). Based on Samejima’s graded response model (Samejima, 1969), the a parameter of item discrimination (i.e., how well the items discriminate between different levels of the latent trait) was calculated. Values higher than 0.64 and higher than 1.7 indicate high and very high discrimination, respectively (Baker, 1985). In addition, b parameters (threshold or location parameters; the point along the latent trait scale at which participants have a 0.5 probability of responding in or above a given category) were calculated for each of the 28 items. Finally, information functions (i.e., the precision of the DERS scale at different levels of ED) were also calculated.
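To illustrate how the a and b parameters shape responses under the graded response model, the sketch below computes the probability of each response category for one item. The study used IRTPro; this stand-alone function, its name, and the parameter values are hypothetical and serve only to show the model's logic for a five-category item (four thresholds).

```python
from math import exp


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    theta: latent trait level; a: item discrimination;
    thresholds: ordered b parameters (k - 1 values for k categories).
    """
    # Probability of responding in or above each category, bounded by
    # 1.0 (lowest category or above) and 0.0 (beyond the highest category).
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[j] - cumulative[j + 1] for j in range(len(thresholds) + 1)]


# Hypothetical item parameters, for illustration only.
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.0, 0.0, 1.0, 2.0])
print([round(p, 3) for p in probs])
```

When theta equals a threshold, the probability of responding in or above that category is exactly 0.5, which matches the definition of the b parameter given in the text.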

Validity evidence

To assess validity evidence for the DERS-28 in relation to other variables, Pearson's correlations were calculated between the DERS-28, the B-YAACQ (alcohol problems), the CUDIT-R (hazardous cannabis use), the HSI (nicotine dependence severity), and the DASS-21. Cohen’s (1988) benchmarks were followed to interpret the effect sizes. Specifically, r ≤ 0.49 was considered small, r = 0.50–0.79 was considered moderate, and r > 0.80 was considered large.

Software

Descriptive statistics, discrimination indices, and Pearson’s correlations were calculated using SPSS 24 (IBM Corp, 2016). The EFA was run with FACTOR. The CFAs, the multi-group CFA (MG-CFA) and the reliability coefficients were calculated using R and the lavaan package. The IRT analysis used IRTPro software.

Results

Validity evidence based on internal structure

The DERS-28 items showed no excessive deviations from univariate normality, with skewness and kurtosis values within ±1 for most of the items (see Table 1). The discrimination indices (corrected item-test correlation) of the items were very high, both in the general dimension and when analyzed by specific dimensions (Table 1).

Table 1 Descriptive statistics and discrimination indices of the DERS-28 items

Both KMO (0.936) and Bartlett's statistics (p < 0.001) indicated that the data were suitable for EFA. The EFA showed that the original five-dimension model (Hervás & Jódar, 2008) had an adequate fit to the data (GFI = 0.996; RMSR = 0.026), explaining 68.74% of the variance. The factor loadings of each of the items are shown in Table 2. Correlations between the different factors ranged between 0.143 and 0.644 (see Table 2).

Table 2 Factor loadings in the exploratory factor analysis and correlations between the different dimensions

Once oblique rotation was used, all items loaded as in the Spanish version of the DERS-28 (Hervás & Jódar, 2008), except for items 11, 12 and 15. Items 11 and 12 loaded on the INTE factor rather than on the LC factor, and item 15 did not load on any of the extracted factors.

The different confirmatory factor models tested (one-factor model, five-uncorrelated factors model, five-correlated factors model, second-order model and bifactor model) are shown in Table 3. According to all the indicators, the bifactor model (see Fig. 1) demonstrated the best fit to the data. Figure 1 shows the standardized factor loadings for each of the items, both for the general factor (ED) and its corresponding specific factor (e.g., INTE). Total ED scores reflected variation on a single latent variable (the general factor) and subscale scores reflected reliable variance that is independent from the general factor. The bifactor indices supported an essentially unidimensional latent structure (ωHG = 0.808; ECVG = 0.554; H = 0.951; FD = 0.966), and significant variance explained by the specific factors (ωHS = 0.347–0.558; ECVS = 0.067–0.123; H = 0.649–0.857; FD = 0.867–0.934), except for LC (ωHS = 0.112; ECVS = 0.072).

Table 3 Fit of the different models for the DERS-28
Fig. 1 Bifactor factorial structure of the DERS-28 with standardized factor loadings. Note. ED = emotional dysregulation; INTE = interference; NOA = non-acceptance; LC = lack of control; INA = inattention; CON = confusion.

Once the bifactor model was selected as the most appropriate factorial structure, measurement invariance based on sex was examined. As Table 4 shows, measurement invariance was satisfied at the configural (structural invariance), metric (equivalence of item loadings on factors), and scalar levels (same intercepts on the observed variables across groups).

Table 4 Measurement Invariance of DERS-28 based on sex

Reliability and Precision

Reliability (ω) via Classical Test Theory was excellent in each dimension (ωconfusion = 0.825; ωinattention = 0.849; ωlack of control = 0.928; ωnon-acceptance = 0.933; ωinterference = 0.896; ωED (total score) = 0.957). Using the IRT approach, a and b parameters for each item were analyzed. As Table S1 (supplementary material) shows, item discrimination was very high, with the a parameter for most items above 2, ranging between 1.14 and 4.39. This shows that all items were highly discriminative, adequately differentiating people who scored high or low on the latent trait. Furthermore, each item’s b parameters were adequate and scaled in the expected order, going from smaller to larger. This indicates that the thresholds of each of the items followed the expected pattern, that is, the higher the response alternative, the higher the value of its b parameter. This pattern is expected because the higher the value of b for an item alternative, the higher the trait level required to select that alternative. In short, all items exhibited adequate parameters in terms of IRT, both at the level of discrimination (parameter a) and at the level of trait adherence (parameter b). Information functions (see Fig. 2) showed generally adequate precision between ability levels of -2.5 and +1.5. In other words, the specific dimensions and the general factor exhibited a standard error below 0.5 at medium, medium–high and medium–low levels of the variable, losing precision at extreme levels.
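The link between an information function and the standard error mentioned above is a simple one: the standard error at a given trait level is the reciprocal of the square root of the information at that level, so a standard error below 0.5 corresponds to information above 4. The sketch below shows this conversion. The function names and information values are hypothetical, and the reliability analogue assumes a latent trait scaled to unit variance.

```python
from math import sqrt


def standard_error(information):
    """Standard error of the trait estimate: 1 / sqrt(information)."""
    return 1.0 / sqrt(information)


def approx_reliability(information):
    """Reliability analogue 1 - SE^2, assuming unit trait variance."""
    return 1.0 - 1.0 / information


# Hypothetical information values at four trait levels, illustration only.
for info in [1.0, 4.0, 9.0, 16.0]:
    print(info, round(standard_error(info), 3), round(approx_reliability(info), 3))
```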

Fig. 2 Information functions of the different dimensions

Figure 2 shows the precision of the measurement in each of the specific factors along the ability continuum (level in each of the traits). In general terms, the highest precision was found at medium and high levels of the trait, with precision being lost at extreme low levels. This implies that the scale is very precise in measuring medium and high levels but is less accurate measuring those who score very low on the trait (e.g., who show very low levels of LC).

These IRT results are in line with the results from CTT. For instance, the elevated discrimination indices identified in CTT closely mirror the high a parameters identified in IRT. Likewise, the elevated reliability coefficients from CTT correspond to the Information Functions in IRT, although the latter offer a more precise measurement of the trait level.

Validity evidence based on relationships with emotional variables

Table S2 (supplementary material) shows Pearson correlations between the DERS-28, substance use related measures, and the DASS-21. All of the DERS-28 dimensions were positively associated with depression, anxiety, and stress (Table S2), meaning higher ED was associated with worse mental health. Overall, statistically significant correlations were small to moderate in size (|r| ranged from 0.036 to 0.645). Depression was moderately associated with CON, LC, NOA, and INTE, and anxiety and stress were moderately associated with LC and NOA. Associations were small between INA and depression, anxiety, and stress, and between anxiety and stress and both CON and INTE.

Statistically significant relationships between the B-YAACQ, HSI, CUDIT-R and ED had small effect sizes, with larger effects for the relationships between the B-YAACQ and both LC and INTE. Notably, of the five dimensions, LC demonstrated the strongest associations with all substance use variables (|r| ranged from 0.066 to 0.228) and emotional variables (|r| ranged from 0.610 to 0.645). Additionally, the dimension demonstrating the weakest correlations with these three variables was INA. Of the substance use related variables, alcohol problems (B-YAACQ) were statistically significantly related to all ED dimensions and the total score of the DERS-28. The HSI was statistically significantly related to LC, NOA, and INTE only. The CUDIT-R was statistically significantly associated with LC and the total score of the DERS-28.

Discussion

This study aimed to provide the first validation of the Difficulties in Emotion Regulation Scale (DERS-28) in a large sample of young Spanish adults who had reported past-month substance use. The DERS-28 exhibited optimal internal reliability and adequate validity in relation to substance use measures, stress, anxiety, and depression. The bifactor model, with one general factor (Emotional Dysregulation: ED) and five specific dimensions, had the best fit to the data, and this internal structure was sex-invariant. The discriminating power of test scores (i.e., the ED construct) and specific dimensions for differentiating between distinct levels of the ED trait was high. Of the five DERS dimensions, LC exhibited the strongest association with substance use measures, anxiety, depression, and stress symptomatology.

From an exploratory perspective, the DERS-28 structure was best described as comprising five related factors. Both the number of dimensions and factor loadings were consistent with the validated version from Hervás and Jódar (2008) in the general Spanish population. Those authors found a similar factorial solution to the original DERS-36 version by Gratz and Roemer (2004). The exception was that the LC factor included items coming from two scales of the Gratz and Roemer (2004) original version: “Difficulty in controlling impulses” and “limited access to regulation strategies”.

The factor loadings differed slightly from the Hervás and Jódar (2008) findings: items 11 ‘when I am upset, I feel embarrassed about feeling that way’ and 12 ‘when I am upset, I have difficulties accomplishing my work’ loaded to a greater extent on the INTE dimension (rather than the LC factor), while item 15 ‘when I am upset I believe I will end up feeling very depressed’ did not load on any of the five observed factors. The factorial structure found in the present study may be partially accounted for by the characteristics of the study sample; the sample was younger (Mage = 19.56) than the sample in the Hervás and Jódar (2008) study (Mage = 38.9). Differences between the present findings and those from previous studies may be due to mental health symptomatology not reaching clinical levels. It is also possible that the word ‘depression’ in item 15 has different connotations for younger vs. older samples. Younger and older samples describe their mental health differently, which could explain why young people did not self-identify with the word ‘depression’. For instance, young populations mention issues related to their main activities (e.g., ‘work’, ‘school’, ‘relationships’), whereas older populations use words focused more on feelings and body states (e.g., ‘crying’, ‘insomnia’) (Sikström et al., 2023).

From the confirmatory approach, the bifactor model (which comprised a general factor, ED, and five specific factors) was the best fit to the data, a result that has also been reported in validation studies for other emotional regulation questionnaires (Xu et al., 2021). Bifactor indices supported a general ED factor, but with a part of the variance also explained by specific factors. The exception was LC (lack of emotional control), whose items shared most of their reliable variance with the general factor, which suggests that it is the most important dimension for explaining ED. LC comprises items related to difficulties in controlling one’s own behavior and is intertwined with impulsive behaviors (e.g., lack of premeditation, which refers to acting with low consideration of potential consequences, and positive and negative urgency, which denote increased impulsivity under positive and negative emotions, respectively) (Wallace et al., 2021). LC also correlated more strongly with substance use related variables, stress, anxiety, and depression severity. These findings are in line with research showing that substance use weakens the prefrontal cortex, which has an important role in regulating emotions (Goldstein & Volkow, 2011). In consequence, people who engage in more severe patterns of use struggle with distressing situations. According to learning theories (Moos, 2007), substance users learn that using substances can relieve negative effects, increasing cravings if they are faced with difficult situations or emotions (Darharaj et al., 2023). Additionally, this finding maps well onto the consequences stemming from young people’s difficulties in managing emotions. The negative impacts of ED can occur during demanding tasks, such as studying or working. In fact, in young populations ED appears to contribute to poor academic performance (Usán Supervía & Quílez Robres, 2021).
Collectively, our findings suggest ED may be particularly likely to increase vulnerability to both emotional and substance use disorders and is therefore a potential intervention target to be considered in prevention and treatment.

The DERS-28 was invariant with regard to sex at three levels: configural, metric and scalar. This supports the idea that DERS reflects the same construct for men and women, and that the scores it gives have the same meaning for everyone that is evaluated. This allows us to make valid comparisons and interpret ED differences across sex confidently.

According to both CTT and IRT, reliability was optimal for both the total DERS-28 score and for the scores from each of the five dimensions. All of the dimensions exhibited greater reliability than the Spanish version of the DERS-28 (Hervás & Jódar, 2008). Notably, the DERS-28 exhibited adequate precision for measuring the latent ED trait at medium, medium–high and medium–low levels of the variable, but slightly lower (albeit acceptable) precision at extreme levels of ability (very low or high). This suggests that the DERS-28 may work well with community samples, but caution should be exercised with clinical profiles, for whom alternative screening procedures are advised (e.g., the Emotion Dysregulation Scale, short version [EDS-short; Powers et al., 2015]).

The study findings and their generalizability should be interpreted in the context of several limitations. First, the study is constrained by the fact that the sample was made up of young adults (i.e., 18–25). Relatedly, given that participants were recruited from the community, the results cannot establish the extent to which the reliability and validity of the DERS-28 also apply to clinical samples, such as young adults receiving substance use disorder treatment. Third, emotional variables were assessed by means of self-report, and including a diagnostic measure (e.g., the Composite International Diagnostic Interview [CIDI]) might have been useful to look at clinically relevant cut-offs in order to detect those in need of further risk assessment and intervention. Fourth, the incremental and discriminant validities of the DERS in comparison to other measures assessing the same construct also remain unclear, as participants were not asked to complete such questionnaires. Lastly, as this was not a clinical study, the predictive validity of the DERS-28 was not examined in relation to prevention and treatment outcomes, such as substance use and psychiatric symptoms. Future research is warranted with clinical study samples, including treatment-seeking young adults and people undergoing substance use treatment. Substance use treatments framed within a transdiagnostic approach have yielded promising results (Sloan et al., 2018), and further research is warranted regarding the utility of the DERS-28 for identifying young adults most likely to respond to integrated interventions.

Conclusion

Our study supports the use of the DERS-28 in young community adults who use legal and illegal substances. The internal reliability of the DERS-28 was supported regardless of sex, and there was evidence of validity in relation to nicotine dependence severity, alcohol-related problems, cannabis disorder risk, and emotional variables (stress, depression, and anxiety symptom severity). The scale is relatively brief and can be administered in approximately five minutes, which has benefits for screening and referral to prevention or treatment services. The fact that evidence of validity of the DERS-28 was tested against substance use screening measures (i.e., HSI, B-YAACQ, CUDIT-R) suggests that it could be included in national epidemiological surveys, and in the evaluation and monitoring of prevention programs.