Introduction

According to recent global estimates, 615 million people suffer from depression and/or anxiety, imposing a high burden on both the affected individuals (e.g., impaired functioning at work or school) and society as a whole (e.g., medical costs) [1]. Numerous self-report instruments have been developed for the early screening or assessment of people with common mental health problems, of which the Depression Anxiety Stress Scales (DASS)-21 is widely used, relatively short, and freely available in the public domain [2].

The DASS-21 is the short version of the 42-item DASS [3], which was originally developed to measure the negative emotional symptoms of depression and anxiety. During the development process, a third construct corresponding to irritability, tension, and agitation emerged empirically and was labeled “stress.” The DASS therefore comprises Depression, Anxiety, and Stress subscales of 14 items each [4]. Antony et al. [3] selected seven items from each subscale of the original DASS and demonstrated the reliability and validity of the resulting DASS-21.

Over the last two decades, the measurement properties of the original English version of the DASS-21 have been evaluated in both clinical and non-clinical populations [3, 5,6,7,8]. The DASS-21 has also been translated into 44 languages (www2.psy.unsw.edu.au/dass/), and its measurement properties have been studied in various countries, but concerns have emerged about discordant results. For example, its factor structure has variously been reported as three-factor, second-order three-factor, bifactor, two-factor, and one-factor [5, 9, 10].

Despite the heterogeneity of these findings, we are not aware of any systematic review of the DASS-21. The aim of this study was therefore to systematically review the measurement properties of the DASS-21, by applying the recently updated COnsensus-based Standards for selection of health Measurement INstruments (COSMIN) methodology [11,12,13].

Methods

Data sources and literature search strategy

The MEDLINE, Embase, and CINAHL databases were searched from their inception up to January 19, 2018. The search strategy consisted of three groups of search terms: name of instrument, type of instrument, and measurement properties. The search terms utilized to identify the name of the instrument (DASS-21) were [(“depression” AND “anxiety” AND “stress”) OR “depression anxiety stress scales” OR “DASS”]. The search for the type of measurement instrument utilized a modified Patient-Reported Outcome Measures (PROMs) filter developed by the Patient-Reported Outcomes Measurement Group at the University of Oxford (http://phi.uhce.ox.ac.uk). The search terms for measurement properties utilized a validated high-sensitivity search filter developed by Terwee et al. [14].

Eligibility criteria

Studies of the measurement properties of the DASS-21 reported in full-text English-language articles were included. DASS-21 studies involving healthy general populations, patients with chronic disease, or patients with psychiatric disorders were all eligible, since the instrument was developed without limiting the population of interest. Studies of the DASS-21 involving populations younger than 14 years were not eligible because too few data are available to confirm the validity of the scale in this age range [15]. Studies in which the DASS-21 had been used in validation tests of other instruments were excluded. Intervention studies in which the DASS-21 was used as an outcome measure were also excluded because they did not evaluate hypotheses about responsiveness.

Selection of studies

The selection process and the included studies are presented in Fig. 1. Duplicates were removed using EndNote, and initial screening based on titles and abstracts was conducted to remove irrelevant studies. Eligibility was then assessed through full-text review. Two reviewers (J.L. and S.H.M.) independently selected the studies, and any disagreements about inclusion were resolved by consensus with a third reviewer (E.-H.L.).

Fig. 1 Flow diagram of the systematic review according to PRISMA

Data extraction

Data were extracted on the study population (sample size, age, gender, and target population); on the setting, country, and language in which the DASS-21 was administered; and on the results for each measurement property.

Assessing the risk of bias

The methodological risk of bias in the measurement properties of the included studies was assessed using the newly developed COSMIN Risk of Bias checklist [11, 13]. The updated COSMIN Risk of Bias checklist removed, among other changes, the standards on missing data and their handling, sample size, and the translation process [11]. The risk of bias for each measurement property in each study was rated on the COSMIN 4-point scale (“very good,” “adequate,” “doubtful,” or “inadequate”), with the rating for each measurement property determined by the lowest rating of any of its items (the “worst score counts” principle).
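As an illustration of this “worst score counts” principle, the following minimal R sketch (not part of the review itself; the ratings shown are hypothetical) derives a property-level rating from its item ratings:

```r
# Minimal sketch of the "worst score counts" rule: the risk-of-bias rating for a
# measurement property is the lowest rating given to any of its checklist items.
cosmin_levels <- c("inadequate", "doubtful", "adequate", "very good")

rate_property <- function(item_ratings) {
  x <- factor(item_ratings, levels = cosmin_levels, ordered = TRUE)
  as.character(min(x))
}

rate_property(c("very good", "adequate", "doubtful"))  # returns "doubtful"
```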

Evaluation of measurement properties for each result

The results for the content validity of each study were rated using five criteria for relevance, one for comprehensiveness, and four for comprehensibility. The results for the other measurement properties of each individual study were rated against the updated criteria for good measurement properties as “sufficient (+)”, “insufficient (−)”, or “indeterminate (?)” [12, 16]. Additional criteria were applied in the present study because the updated criteria do not cover the results of exploratory factor analysis (EFA) for structural validity (rated + when the factors explained at least 50% of the variance) or Pearson’s correlation coefficients for reliability (rated + when r ≥ 0.80).

For the rating of hypothesis testing for construct validity (convergent validity and known-groups validity), the reviewers decided a priori to use the well-known Beck Anxiety Inventory (BAI) [17], Beck Depression Inventory (BDI) [18], Hospital Anxiety and Depression Scale (HADS) [19], and Positive and Negative Affect Schedule (PANAS) [20] as comparator instruments for convergent validity. For convergent validity, r was expected to be >0.50 for correlations with a comparator instrument measuring a construct similar to that of the DASS-21. Construct validity was rated as sufficient (+) if at least 75% of the results were in accordance with the hypotheses, insufficient (−) if at least 75% of the results were not, and indeterminate (?) if no hypotheses were defined.
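The decision rule described above can be expressed as a short R sketch (illustrative only; the 0.50 correlation hypothesis and the 75% threshold are taken from the text, and the input vector is hypothetical):

```r
# Rate hypothesis testing for construct validity from the proportion of results
# that met their a priori hypotheses (e.g., r > 0.50 with a comparator
# instrument measuring a similar construct).
rate_construct_validity <- function(results_confirmed) {
  if (length(results_confirmed) == 0) return("indeterminate (?)")  # no hypotheses defined
  p <- mean(results_confirmed)                                     # proportion confirmed
  if (p >= 0.75) return("sufficient (+)")
  if (p <= 0.25) return("insufficient (-)")                        # >= 75% not confirmed
  "inconsistent"                                                   # neither threshold met
}

rate_construct_validity(c(TRUE, TRUE, TRUE, FALSE))  # 75% confirmed -> "sufficient (+)"
```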

Summary of the evidence and grading of the quality of evidence

For content validity, all results were qualitatively summarized into overall ratings for the relevance, comprehensiveness, and comprehensibility of the DASS-21: “sufficient (+),” “insufficient (−),” or “inconsistent (±)” [13]. The results of all studies for each of the other measurement properties were qualitatively summarized or quantitatively pooled, and summarized as “sufficient (+),” “insufficient (−),” “inconsistent (±),” or “indeterminate (?)” [12]. Explanations for inconsistent results were explored by conducting subgroup analyses. For the qualitative summary, the results of the studies for each measurement property were summarized, for example by providing the range of values or the percentage of supported hypotheses for construct validity [11]. Quantitative pooling (meta-analysis) was conducted to estimate convergent validity (pooled Pearson correlation coefficients) for hypothesis testing, using the R statistical analysis program (version 3.4.3) and the metafor package [21]. The pooled coefficient values, 95% confidence intervals, and Higgins’ I2 were calculated. A random-effects model was selected given the heterogeneity of the studies in terms of the diversity of patient samples and the various language versions.
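As a minimal sketch of this pooling workflow with the metafor package (the correlations and sample sizes below are hypothetical, not data from the included studies):

```r
library(metafor)

r_i <- c(0.74, 0.70, 0.77)   # hypothetical study-level Pearson correlations
n_i <- c(250, 412, 180)      # hypothetical sample sizes

# Fisher r-to-z transformation with corresponding sampling variances
dat <- escalc(measure = "ZCOR", ri = r_i, ni = n_i)

# Random-effects model (REML); the summary reports the pooled estimate, its
# 95% confidence interval, and Higgins' I^2
res <- rma(yi, vi, data = dat, method = "REML")
summary(res)

# Back-transform the pooled estimate and CI to the correlation metric
predict(res, transf = transf.ztor)

# Forest plot of study-level and pooled estimates
forest(res)
```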

The quality of evidence for each measurement property was graded as high, moderate, low, or very low using a modified version of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [12] while taking into account the risk of bias (methodological quality of the studies), inconsistency of results across studies, imprecision (total sample size of the included studies), and indirectness (evidence from different populations). Indirectness was not applicable to the present study because the DASS-21 was developed without a specific target population or context of use.
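A deliberately simplified R sketch of this grading logic is shown below, assuming each serious concern downgrades the evidence by one level (GRADE itself allows downgrading by one or two levels per factor, so this is illustrative only):

```r
# Start at "high" and downgrade one level per serious concern
# (risk of bias, inconsistency, imprecision; indirectness was not applicable here).
grade_evidence <- function(n_serious_concerns) {
  grades <- c("high", "moderate", "low", "very low")
  grades[min(n_serious_concerns + 1, length(grades))]
}

grade_evidence(0)  # "high"
grade_evidence(2)  # "low"
```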

If only a single study existed for a measurement property of the DASS-21, the result was not summarized or given an overall rating, in order to avoid overweighting by that single study. Two authors (E.-H.L. and J.L.) independently performed the above processes from data extraction to grading the quality of evidence, and all three authors convened to reach the final consensus.

Results

DASS-21 studies identified

The database search identified 7085 articles. After removing duplicates, 5540 articles were screened based on their titles and abstracts to remove irrelevant articles. Forty articles then remained, of which five were excluded after full-text screening and six additional articles were identified, resulting in 41 articles [3, 5,6,7,8,9,10, 22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55] on the measurement properties of the DASS-21. Seven articles each described two studies that examined different structures of the DASS-21, and each study of measurement properties was considered separately. This systematic review therefore included 48 studies reported in 41 articles (Fig. 1).

General characteristics of the articles

Table 1 presents the characteristics of the included articles. The original English version of the DASS-21 was evaluated in 23 articles, with 13 articles from Australia [6, 7, 10, 26, 29, 33, 38, 41, 47, 49, 50, 52, 53] and eight articles from the USA [8, 22, 23, 27, 30, 42, 46, 51]. The most frequently evaluated non-English versions were Malaysian [24, 28, 34, 36, 44, 55] and Portuguese [9, 32, 37, 45]. The most common study population was a healthy general population (n = 18). Data were collected in non-clinic settings (n = 20), clinic settings (n = 15), or both clinic and non-clinic settings (n = 6).

Table 1 Characteristics of the included articles

Synthesized evidence

The overall ratings of the evidence for each measurement property of the DASS-21 and the quality of evidence for this scale are described below and presented in Table 2. Note that none of the included articles reported on measurement error, and so this was excluded.

Table 2 Summary of findings

Content validity

The most frequently evaluated component of content validity was comprehensibility, as evaluated by patients [22, 24, 25, 35, 37, 43, 45]. Two studies asked professionals about the comprehensiveness of the DASS-21 [39, 43], while none of the studies asked either patients or professionals about its relevance. There was sufficient high-quality evidence for the comprehensiveness of the DASS-21, sufficient moderate-quality evidence for its comprehensibility, and sufficient but very-low-quality evidence for its relevance. Overall, there was sufficient moderate-quality evidence for the content validity of the DASS-21 [2, 22, 24, 25, 35, 37, 39, 43, 45].

Structural validity

In total, 45 studies from 37 articles assessed the structural validity of the DASS-21 and found several types of factor structure: the three-factor structure of the original DASS-21 study [3], a bifactor structure, and a one-factor structure. Other factor structures, such as second-order three-factor [22], two-factor [45], and four-factor [50] structures, were each demonstrated in a single study.

A three-factor structure of the DASS-21 was reported in 29 studies, of which 20 (68.9%) had at least an “adequate” COSMIN methodological quality rating. Ratings lower than “adequate” were due to small samples [22], methodological flaws (orthogonal rotation [31, 37, 52, 55] or an unclear estimation method [44]), reporting the structural validity of a modified DASS-21 (18 items [34], or 12 or 9 items [44]), or item loadings that differed from the original DASS-21 [35].

The structural validity results were qualitatively summarized for studies that supported a three-factor structure with the same seven items for each subscale and had a COSMIN rating of at least “adequate” quality [3, 5, 7,8,9,10, 23, 25, 27, 29, 38, 40, 41, 47, 48]. Among the studies supporting a three-factor structure with at least “adequate” quality, those with different item loadings [40, 49, 51, 53] or modified structures [7] were excluded from the qualitative summary. Three-factor structural validity was evaluated using EFA (n = 1), confirmatory factor analysis (CFA) (n = 13), or the Rasch model (n = 1). Twelve of the 15 studies (80%) received a “sufficient” rating, exceeding the 75% criterion [11], so the overall summarized result was rated as sufficient (+); however, the quality of evidence was rated as moderate because of inconsistencies in the result ratings.
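For illustration, a correlated three-factor CFA of the kind summarized here can be specified as follows; the lavaan package, the data frame dass_items, and the item names i1–i21 are assumptions for this sketch (the included studies used various software), and the item-to-subscale assignment is shown for illustration only:

```r
library(lavaan)

# Correlated three-factor model: Depression, Anxiety, and Stress, seven items each
model_3f <- '
  Depression =~ i3 + i5 + i10 + i13 + i16 + i17 + i21
  Anxiety    =~ i2 + i4 + i7 + i9 + i15 + i19 + i20
  Stress     =~ i1 + i6 + i8 + i11 + i12 + i14 + i18
'

# dass_items: a hypothetical data frame with the 21 item responses
fit_3f <- cfa(model_3f, data = dass_items, std.lv = TRUE)
fitMeasures(fit_3f, c("cfi", "tli", "rmsea", "srmr"))
```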

Eight studies evaluated bifactor structural validity [5, 30, 32, 39, 41,42,43, 54]. All of these studies had very good methodological quality, and the summarized result for the bifactor structure was rated as sufficient with high-quality evidence.

Three studies (described in two articles) found a one-factor structure [31, 46]. Two of these studies were of low methodological quality (rated inadequate and doubtful), so the results were not summarized and no grade was given to the associated evidence.

Internal consistency

The internal consistency of the DASS-21 was well supported. In the studies involving a three-factor structure, the subscale values of Cronbach’s alpha, uncorrelated errors (rho) [27], and the person separation index [7] were overall > 0.70, except for the DASS-21 Anxiety subscale in some studies [10, 27, 29]. Under the bifactor structure, Cronbach’s alpha [5, 32, 39, 41, 43, 54] and coefficient omega [30, 42] for the DASS-21 subscales and the total scale were all > 0.70.

The Cronbach’s alpha values for the three-factor structure from studies with at least adequate methodological quality [3, 5, 8,9,10, 22, 23, 25, 29, 40, 43, 48] were qualitatively summarized. Two studies [7, 27] were excluded from this summary because they reported uncorrelated errors (rho) or the person separation index (PSI) as the statistical value. Cronbach’s alpha values for the bifactor structure were likewise qualitatively summarized [5, 32, 39, 41, 43, 54], and two studies [30, 42] were summarized separately because they reported coefficient omega values.

The qualitatively summarized coefficient alpha values for the three-factor DASS-21 Depression, Anxiety, and Stress structure were 0.83–0.94, 0.66–0.87, and 0.79–0.91, respectively, and the overall rating was sufficient with moderate-quality evidence. Under the bifactor structure, Cronbach’s alpha values were 0.90–0.95 (total scale), 0.82–0.92 (Depression), 0.74–0.88 (Anxiety), and 0.76–0.90 (Stress); the corresponding qualitatively summarized coefficient omega values (two studies) were 0.89–0.97, 0.86–0.99, 0.82–0.99, and 0.85–0.99, respectively. The overall rating for internal consistency under the bifactor structure was sufficient with high-quality evidence.
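As a minimal sketch of how a subscale’s coefficient alpha is computed from item-level data (hypothetical data; not the authors’ code):

```r
# Cronbach's alpha for an n x k matrix of item scores:
# alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_var <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Example with seven hypothetical Depression items scored 0-3
set.seed(1)
dep_items <- matrix(sample(0:3, 7 * 100, replace = TRUE), ncol = 7)
cronbach_alpha(dep_items)
```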

Cross-cultural validity/measurement invariance

Six studies assessed cross-cultural validity/measurement invariance [10, 23, 34, 38, 48, 54]. Five of these studies assessed cross-cultural validity/measurement invariance based on a three-factor structure, while the sixth study [54] used a bifactor structure. The ratings of the five three-factor studies were inconsistent and no overall explanation was found, so subgroups by gender, race, country (language), and disease status were explored in an attempt to explain the inconsistency. Subgroup analysis by gender [38, 48] yielded inconsistent moderate-quality evidence regarding measurement invariance. The other three subgroups each included only a single study: race [23], country (language) [34], and disease status [10].

Reliability

Reliability was reported in five studies [25, 26, 39, 40, 47]. Only one study [25] evaluated the intraclass correlation coefficient (ICC), while the results of the remaining studies were rated as insufficient. The insufficient results might have been due to a methodological problem involving long time intervals between the first and second administrations of the DASS-21. Three studies [25, 26, 39] were therefore qualitatively summarized after eliminating the two studies with intervals of 3–6 months [40, 47]. Pearson’s correlation coefficients in the two studies reporting them were 0.75–0.78 (Depression), 0.64–0.73 (Anxiety), and 0.64–0.65 (Stress). The overall ratings for the Depression, Anxiety, and Stress subscales were insufficient, with low-quality evidence, because of a serious risk of bias and serious inconsistency.
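The two test-retest statistics reported in these studies can be contrasted in a minimal R sketch (hypothetical data; the psych package used here for the ICC is an assumption, not the tool used in the included studies):

```r
library(psych)

set.seed(42)
t1 <- rnorm(100, mean = 10, sd = 4)        # hypothetical first administration
t2 <- 0.8 * t1 + rnorm(100, sd = 2) + 2    # second administration with a systematic shift

cor(t1, t2)         # Pearson's r: insensitive to the systematic shift between occasions
ICC(cbind(t1, t2))  # ICC table; the two-way random, absolute-agreement ICC reflects the shift
```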

Criterion validity

Criterion validity was reported in three studies [27, 31, 33]. The psychiatrist-administered Structured Clinical Interview for DSM-IV Axis I Disorders (SCID) for depression and anxiety was used as the gold-standard criterion for the DASS-21. The DASS-21 Depression and Anxiety subscales demonstrated areas under the receiver operating characteristic curve (AUCs) of 0.77–0.91 against SCID depression and 0.60–0.83 against SCID anxiety. There was therefore high-quality evidence of sufficient criterion validity for the DASS-21 Depression subscale, and moderate-quality evidence of insufficient criterion validity for the DASS-21 Anxiety subscale.
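A minimal sketch of how such an AUC is obtained against a dichotomous SCID diagnosis (hypothetical data; the pROC package is an assumption, not necessarily the tool used in the included studies):

```r
library(pROC)

set.seed(7)
scid_dx  <- rbinom(200, 1, 0.3)                    # 1 = hypothetical SCID depression diagnosis
dass_dep <- 6 + 5 * scid_dx + rnorm(200, sd = 3)   # hypothetical DASS-21 Depression scores

roc_dep <- roc(response = scid_dx, predictor = dass_dep)
auc(roc_dep)      # area under the ROC curve
ci.auc(roc_dep)   # 95% confidence interval for the AUC
```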

Hypotheses testing for construct validity

Quantitative pooling was applied to the correlations of the DASS-21 Depression subscale with the BDI [3, 22, 25, 27, 39], the HADS Depression subscale [26, 33, 45], and the PANAS Negative Affect subscale [5, 9, 27]; of the DASS-21 Anxiety subscale with the BAI [3, 22, 27, 39], the HADS Anxiety subscale [26, 33, 45], and the PANAS Negative Affect subscale [5, 9, 27]; and of the DASS-21 Stress subscale with the PANAS Negative Affect subscale [5, 9, 27] (Table 3; Supplement 1 contains forest plots). Construct validity was supported by high pooled coefficients for the correlations of the DASS-21 Depression subscale with the BDI (r = 0.73), the HADS Depression subscale (r = 0.69), and the PANAS Negative Affect subscale (r = 0.56). The DASS-21 Anxiety subscale demonstrated high pooled coefficients for the correlations with the BAI (r = 0.75), the HADS Anxiety subscale (r = 0.66), and the PANAS Negative Affect subscale (r = 0.55). The DASS-21 Stress subscale was also strongly correlated with the PANAS Negative Affect subscale (r = 0.66). Based on these findings, the overall construct validity in hypothesis testing was rated as sufficient, with high-quality evidence.

Table 3 Pooled correlation coefficients for construct validity (convergent validity)

Five studies [3, 6, 25, 27, 40] evaluated known-groups validity. All known-groups comparisons included patients with a psychiatric diagnosis. All of the results (five out of five) for the DASS-21 Depression and Anxiety subscales, and 80% of the results (four out of five) for the DASS-21 Stress subscale, were in accordance with the hypotheses supporting known-groups validity. The overall ratings of known-groups validity for the DASS-21 were sufficient, with high-quality evidence.

Responsiveness

Two studies [6, 26] analyzed responsiveness by comparing DASS-21 scores at admission/pre-discharge with those at discharge. Both demonstrated significant changes in the DASS-21 Depression and Stress scores at discharge, and the results were in accordance with the hypotheses for these subscales (rated sufficient with low-quality evidence, because the paired t-test carries a serious risk of bias and is an inappropriate method for evaluating responsiveness). However, the direction of the change in Depression and Stress scores was opposite in the two studies: scores decreased in psychiatric patients [6] but increased in patients with traumatic brain injury [26]. The DASS-21 Anxiety subscale received an inconsistent rating with very-low-quality evidence because of inconsistent results from multiple inadequate studies that had used the paired t-test.

Discussion

This systematic review evaluated 48 studies of the measurement properties of the DASS-21 reported in 41 articles. Content validity refers to whether the content of an instrument appropriately reflects the construct to be measured, and it is the most important measurement property of an instrument [13]. For example, Ailliet et al. [56] noted that the content validity of the Neck Disability Index is poor because it misses important content, and so they advocated developing a new instrument. With regard to content validity, the DASS-21 demonstrated sufficient evidence for relevance, comprehensiveness, and comprehensibility. The quality of evidence was high for comprehensiveness, moderate for comprehensibility, and very low for relevance. The sufficient high-quality evidence for comprehensiveness suggests that the DASS-21 includes the key concepts. Comprehensibility refers to whether the PROM instructions, items, and response options are understood by the population of interest as intended, including the wording of the items and whether the response options match the questions. The lack of qualitative methods for assessing the comprehensibility of the DASS-21 meant that the evidence, although sufficient, was only of moderate quality. Relevance refers to whether the items, response options, and recall period are relevant to the construct, target population, and context of use of interest; these aspects were not evaluated by either experts or patients in any of the content validity studies of the DASS-21. Further studies are therefore strongly recommended to evaluate the content validity of the DASS-21, especially its relevance.

Most debate regarding the psychometric properties of the DASS-21 has revolved around its underlying structure. The DASS-21 was originally demonstrated to have the three factors of its Depression, Anxiety, and Stress subscales; however, alternative structures have been explored because of substantial interfactor correlations ranging from moderate to strong [41]. When interfactor correlations exceed r = 0.4, a bifactor model in which items load on both a general (unidimensional) factor and group factors (potential subscales) may be viable [57]. The existence of a common factor was also supported during the DASS development process [2]: a second-order CFA identified a common factor that accounted for 83%, 75%, and 84% of the variance in the Depression, Anxiety, and Stress subscales, respectively. Consistent with this, the best structure identified in the present systematic review was a bifactor structure, which exhibited sufficient high-quality evidence. That is, the DASS-21 items load on a general factor, named Negative Emotional State (accounting for the common variance among all 21 items), as well as on orthogonal group factors named Depression, Anxiety, and Stress (explaining the item covariance that is independent of the covariance due to the general factor). Osman et al. [30] reported that more of the DASS-21 item variance was explained by the general factor (62%) than by any of the group factors. These findings have the practical implication that both the total and subscale scores should be calculated separately and considered independently, with weightings relative to the total score. The DASS-21 has the merit of providing general information about the negative emotional status of patients as well as about each emotional symptom of depression, anxiety, and stress. Establishing cut-off points would improve the practicality of using the DASS-21.
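The bifactor structure described above can be sketched in lavaan as follows (illustrative only: the item names i1–i21, the data frame dass_items, and the use of lavaan are assumptions, and the item-to-subscale assignment follows the published DASS-21 scoring for illustration):

```r
library(lavaan)

model_bifactor <- '
  # general Negative Emotional State factor loading on all 21 items
  General    =~ i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9 + i10 + i11 +
                i12 + i13 + i14 + i15 + i16 + i17 + i18 + i19 + i20 + i21
  # group factors for the three subscales
  Depression =~ i3 + i5 + i10 + i13 + i16 + i17 + i21
  Anxiety    =~ i2 + i4 + i7 + i9 + i15 + i19 + i20
  Stress     =~ i1 + i6 + i8 + i11 + i12 + i14 + i18
'

# orthogonal = TRUE keeps the general and group factors uncorrelated,
# as in the bifactor model described in the text
fit_bf <- cfa(model_bifactor, data = dass_items, std.lv = TRUE, orthogonal = TRUE)
fitMeasures(fit_bf, c("cfi", "tli", "rmsea", "srmr"))
```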

According to the COSMIN Risk of Bias checklist [11], evidence for structural validity is a prerequisite for evaluating internal consistency and cross-cultural validity/measurement invariance, because these measurement properties concern the relationships between the items constituting an instrument. The present study found that the bifactor structure was optimal for the DASS-21, since it was associated with sufficient high-quality evidence for internal consistency. However, measurement invariance under the bifactor structure could not be assessed because only a single study was available [54]. It is therefore recommended that future studies evaluate the invariance of the bifactor structure across gender or language.

Evidence on the reliability of the DASS-21 was summarized based on studies that tested it using Pearson’s correlation coefficients, because only one study used the ICC. The studies using Pearson correlation analysis produced inconsistent results, even in the subgroup with a test-retest interval of around 2 weeks. According to the COSMIN manual, the methodological quality of a reliability assessment should be rated as doubtful when it is based on the correlation between two measurements without evidence that no systematic change has occurred, or with evidence that a systematic change has occurred. The DASS-21 measures states that fluctuate over time and across situations rather than traits, and so reliability might not be an important property. The authors decided not to downgrade the methodological quality of each DASS-21 study on the basis of the evidence regarding systematic changes between measurements. Downgrading the methodological quality depends on the context of the measurements, and exceptions need to be considered because emotional states are relatively changeable and can show systematic change even in the absence of an apparent cause.

Criterion validity has been defined as “the degree to which the scores of a patient-reported outcome measure are an adequate reflection of a gold standard” [58]. Although the original version of a shortened instrument is usually considered the gold standard for a self-report instrument [59], others have argued that expert clinical opinion can also be used as a gold standard [60]. The psychiatrist-administered SCID for depression and anxiety was considered the gold standard in the present review.

Quantitative pooling was conducted to evaluate hypothesis testing (convergent validity). High heterogeneity remained even with a random-effects model. Because the hypotheses only required correlation coefficients > 0.50 (indicating moderate or stronger correlations), the wide range of coefficient values above this threshold might have contributed to the high heterogeneity.
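For reference, Higgins’ I2 is derived from Cochran’s Q as in the minimal sketch below (the Q and degrees of freedom shown are hypothetical values of the kind reported by the random-effects model):

```r
# I^2 = (Q - df) / Q * 100%, truncated at 0
Q  <- 45.2
df <- 4
I2 <- max(0, (Q - df) / Q) * 100
I2   # percentage of total variability attributable to between-study heterogeneity
```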

Two studies that evaluated the responsiveness of the DASS-21 used paired t-tests as the statistical analysis technique. According to de Vet et al. [59], the paired t-test relates to the statistical significance of changes in scores rather than to their validity, and it is not recommended as a responsiveness parameter. The context of the response also needs to be considered in a qualitative summary of results. For example, the two studies included in the current review measured DASS-21 scores at admission and at discharge; that is, after treatment relative to admission to the hospital. At discharge, patients with psychiatric disorders exhibited improvements in negative emotional status, whereas patients with brain injuries faced the new challenge of returning home with some disability. Researchers need to specify the expected direction of change carefully in order to avoid results being categorized as “inconsistent.”

Psychometrically, the DASS-21 exhibited sufficient high-quality evidence for bifactor structural validity, internal consistency under the bifactor structure, criterion validity (especially for the Depression subscale), and hypothesis testing for construct validity. The synthesized evidence for the psychometric properties of the DASS-21 is comparable to that of well-known measures of emotional symptoms such as the CES-D, CESD-R, HADS, and PHQ-9, which demonstrated strong positive evidence for their psychometric properties when evaluated with the original COSMIN methodology [61, 62]. Because the current review was based on the updated COSMIN methodology, its highest rating (sufficient high-quality evidence) was compared with the highest rating in the previous COSMIN methodology (strong positive evidence). The CES-D demonstrated strong positive evidence for structural validity, internal consistency, and construct validity when applied to patients with diabetes [61]. The HADS demonstrated strong positive evidence for structural validity and internal consistency, and moderate positive evidence for construct validity, among patients with diabetes. There was conflicting evidence for the structural validity of the PHQ-9, which affected the results regarding internal consistency among patients with diabetes. The CESD-R demonstrated strong positive evidence for structural validity and internal consistency, and moderate positive evidence for construct validity, among the general public [62].

The wide applicability of the DASS-21 is one of its strengths. The DASS-21 has been validated in healthy general populations as well as in patient populations (with both psychiatric and chronic diseases), and it has been applied across a wide age range (subjects older than 14 years). The DASS-21 provides helpful information regarding the negative emotional status of subjects. However, unlike the HADS, which has established cut-offs suggesting the presence of clinically meaningful anxiety and/or depression, the DASS-21 does not yet have established cut-offs. Further studies of DASS-21 cut-offs would therefore strengthen its usability as a screening tool. One limitation concerns using the DASS-21 as an outcome measure, because further validation studies of its responsiveness are required. Applying the DASS-21 to people younger than 14 years also requires further validation studies.

This study applied the recently updated COSMIN methodology to perform a systematic review of the DASS-21. Having structural validity as an anchor for evaluating internal consistency and measurement invariance enabled a meaningful evaluation of the structure-related psychometric properties. The updated COSMIN methodology requires review authors to be knowledgeable about the context of PROMs and related valid measurement instruments, and to set a priori hypotheses of different types and magnitudes to be tested. Judgment is also required when assigning results to each measurement property, because some studies report psychometric evaluations under a different property (e.g., criterion validity rather than hypothesis testing).

Conclusions

The DASS-21 exhibited sufficient high-quality evidence for bifactor structural validity, internal consistency under the bifactor structure, criterion validity, and construct validity. Its psychometric quality is comparable to that of other well-known related measures evaluated using the original COSMIN methodology. The psychometric robustness and wide applicability of the DASS-21 suggest that this scale can be used to assess negative emotional states, including depression, anxiety, and stress, in both healthy general populations and patient populations. Establishing cut-off points would improve the practicality of applying the DASS-21, and its use as an outcome measure requires further validation studies regarding responsiveness. The DASS-21 subscales as well as its total score should be scored and interpreted, reflecting the individual emotional symptoms of depression, anxiety, and stress as well as overall negative emotion. Further studies are required into its measurement invariance under a bifactor structure, its reliability, measurement error, and responsiveness. The updated COSMIN manual provides detailed guidelines that facilitate systematic reviews of PROMs.