Introduction

Depression in children and adolescents is common but often unrecognized [1]. Some studies report an increasing prevalence of depression in children and adolescents, reaching 15–20% [2, 3]. Unlike in adults, depression in children and adolescents is characterized by low self-esteem and strong guilt, adversely affects essential daily activities such as learning, and is closely related to disruptive behavior [4, 5]. Studies on childhood and adolescent depression confirm that it is a chronic and recurrent condition. Most episodes remit within a year, but the risk of recurrence is high, and 50–70% of those affected may develop additional episodes within five years [6]. Moreover, because children and adolescents are still immature and have not fully developed their ability to cope, it is difficult for them to overcome depression on their own [4, 5, 7]. In 2015, the World Health Organization announced that suicide was the second most common cause of death in the 15–29-year-old age group, calling for attention to the severity of depression in children and adolescents [8, 9]. Therefore, depression in children and adolescents requires early treatment based on early assessment.

Researchers’ opinions on screening for depression in children and adolescents are inconsistent [10]. In 2016, the U.S. Preventive Services Task Force (USPSTF) reaffirmed its 2009 recommendation for depression screening in primary care settings to ensure an accurate diagnosis, effective treatment, and appropriate follow-up [8]. However, some studies did not support this recommendation [11]. The USPSTF did not specify a screening interval but concluded with moderate certainty that there was a net benefit for children and adolescents aged 12–18 years [7]. In addition, many depression-screening tools have been developed and used, including the Patient Health Questionnaire (PHQ) for adolescents and the Beck Depression Inventory (BDI) [5, 8].

The Center for Epidemiologic Studies Depression Scale (CES-D), developed by Radloff in 1977, is one of the most widely used self-rating scales to screen for major depressive disorder (MDD) in psychiatric epidemiology [12]. The CES-D can be applied to the general population because its reliability and validity were tested in those under 25, 25–64, and over 64 years of age during its development. In 1991, Radloff reported that the CES-D was verified for use in children and adolescents [13, 14]. However, because it is not a diagnostic tool, the scores should be interpreted only as indicating the level of depressive symptoms, and the appropriate cut-off scores have not been verified.

The predictive validity of a screening tool is essential for accurate disease detection. Two systematic reviews (SRs) on the accuracy of depression-screening tools in children and adolescents were published in 2015 and 2016 [14, 15]. Both SRs reviewed the CES-D alongside the PHQ-9 and the BDI for children and adolescents. However, neither included a quantitative analysis such as a meta-analysis. Furthermore, it is difficult to use a different screening tool for each patient's age in a primary care setting, and the CES-D is already the most commonly used depression-screening tool in epidemiological investigations of adults. Therefore, to apply this tool to children and adolescents, a study is needed to quantitatively verify the screening accuracy of the CES-D.

This study is an updated SR on the screening accuracy of the CES-D in children and adolescents. This SR aims to (i) analyze the difference in screening accuracy of each CES-D type for MDD and (ii) compare the predictive validity of the CES-D with that of other screening tools.

Methods

The methodology and reporting in this SR are consistent with the guidelines of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy [16] and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 Statement [17].

Search Strategy and Data Sources

The electronic database searches were conducted twice, on August 8, 2020, and March 15, 2022, in MEDLINE, Embase, CINAHL, and PsycArticles. The same search strategy was used for both searches. The keywords were depression and CES-D. We performed an expanded search for depression using MeSH terms, and the CES-D was searched for using its full name and abbreviations. No age limit was applied to the search. We also scanned the reference lists of eligible articles and the articles that cited them. An example of this search strategy is shown in Supplementary Table 1.

Eligibility Criteria

The inclusion criteria were as follows: (i) Types of studies: Studies (e.g., cohort and cross-sectional studies) that examined the diagnostic accuracy of the CES-D were included; those providing insufficient information to construct a 2 × 2 contingency table were excluded. (ii) Types of participants: Only studies whose participants included junior high and high school students were considered. (iii) Index tests: All CES-D types were included. (iv) Gold standards: The selected studies were limited to those in which participants were diagnosed with MDD through direct interviews with psychologists that followed the Diagnostic and Statistical Manual of Mental Disorders (DSM) and International Classification of Diseases-Tenth Revision (ICD-10) and/or semi-structured interviews, such as the Structured Clinical Interview for DSM Disorders (SCID), the Mini International Neuropsychiatric Interview (MINI), and the Schedule for Affective Disorders and Schizophrenia for School Age Children (K-SADS). (v) Types of outcomes: Studies were selected if the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) values could be derived and, from these, the sensitivity, specificity, positive and negative likelihood ratios, diagnostic odds ratio, and summary receiver operating characteristic (sROC) curves of the CES-D could be compared.

Study Screening and Data Extraction

In the literature selection process, one reviewer first removed duplicate articles, after which two reviewers independently screened the titles and abstracts of all articles for potential eligibility. Disagreements at each stage were resolved by consensus. When multiple eligible articles used the same dataset, we selected the most recently published article.

The following data were extracted from the selected studies: year of publication, authors, location, sample size, sample characteristics, the prevalence of MDD, the gold standard for MDD diagnosis, the CES-D type and cut-off scores, comparators, TP, FP, FN, and TN. All processes were first reviewed by one reviewer and then independently by another reviewer. During data extraction, disagreement between the reviewers was resolved by consensus.
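When a primary study reports only the sample size, prevalence, sensitivity, and specificity, the four cells of the 2 × 2 table can often be back-calculated. The sketch below illustrates that arithmetic; it is not the extraction procedure the reviewers describe, the function name `reconstruct_2x2` is hypothetical, and the example study is invented. Rounding to whole participants is an assumption that should be checked against the counts a study actually reports.

```python
def reconstruct_2x2(n, prevalence, sensitivity, specificity):
    """Back-calculate TP, FP, FN, TN from summary statistics.

    Counts are rounded to whole participants, so the result is an
    approximation of the table the original study observed.
    """
    diseased = round(n * prevalence)       # participants with MDD
    healthy = n - diseased                 # participants without MDD
    tp = round(diseased * sensitivity)     # MDD cases flagged by the scale
    fn = diseased - tp                     # MDD cases missed by the scale
    tn = round(healthy * specificity)      # non-cases correctly cleared
    fp = healthy - tn                      # non-cases wrongly flagged
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Invented example: 500 students, 20% prevalence,
# sensitivity 0.80, specificity 0.75.
table = reconstruct_2x2(500, 0.20, 0.80, 0.75)
print(table)
```

Because of rounding, sensitivity and specificity recomputed from the reconstructed table may differ slightly from the published values.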

Quality Assessment

Two reviewers independently assessed the quality of the selected studies using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) [18]. The risk of bias was assessed across four domains (patient selection, index test, reference standard, and flow and timing), each rated as low, unclear, or high, to derive a total quality score.

Statistical Analysis

Meta-analysis was performed using the MetaDiSc 1.4 and MetaDTA software programs [19–21]. Based on the TP, FP, FN, and TN values in the 2 × 2 contingency table, screening accuracy was evaluated by calculating the pooled sensitivity, specificity, positive and negative likelihood ratios, and diagnostic odds ratios with 95% confidence intervals (CIs). The area under the curve (AUC) of the sROC curve and the Q* value were analyzed to determine the test accuracy. We used a bivariate random-effects model [22]. Heterogeneity among studies was judged using the random-effects (RE) correlation. The AUC values were interpreted as follows: AUC of 0.5 = non-informative test; AUC of 0.5–0.7 = low accuracy; AUC of 0.7–0.9 = moderate accuracy; AUC of 0.9–1 = high accuracy; and AUC of 1 = perfect test [23]. The Q* value represents the point on the sROC curve at which sensitivity and specificity are equal, with a value of 1 indicating an accuracy of 100% [24].
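For a single study, the accuracy measures named above follow directly from the four cells of the 2 × 2 table. The sketch below shows those definitions for one table; it does not reproduce the bivariate pooling performed by MetaDiSc or MetaDTA. The function names are hypothetical, the Wilson score interval is an assumed choice of CI method, the example counts are invented, and all four cells are assumed to be non-zero.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_measures(tp, fp, fn, tn):
    """Per-study accuracy measures; assumes no cell is zero."""
    sens = tp / (tp + fn)            # proportion of MDD cases detected
    spec = tn / (tn + fp)            # proportion of non-cases cleared
    return {
        "sensitivity": sens,
        "specificity": spec,
        "LR+": sens / (1 - spec),    # positive likelihood ratio
        "LR-": (1 - sens) / spec,    # negative likelihood ratio
        "DOR": (tp * tn) / (fp * fn),  # diagnostic odds ratio = LR+ / LR-
    }


# Invented counts for illustration only.
measures = accuracy_measures(tp=80, fp=100, fn=20, tn=300)
sens_low, sens_high = wilson_ci(80, 80 + 20)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
print(f"sensitivity 95% CI: {sens_low:.2f} to {sens_high:.2f}")
```

If a study has an empty cell, a continuity correction (commonly adding 0.5 to every cell) is needed before the likelihood ratios and DOR can be computed.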

Results

Selection Process and Assessing Bias Risk

A total of 1,443 articles were identified through two electronic database searches (the first, 1,239 articles; the second, 203 articles) and other sources (1 article). After 382 duplicates were removed, the eligibility criteria were first applied to the titles and abstracts of the remaining 1,061 articles. We then located and read the full texts in cases where it was difficult to obtain accurate information from the title and abstract alone. Consequently, we excluded 1,047 articles (98.7%), and 14 studies [25–38] met the inclusion criteria. The study selection process is detailed in the PRISMA 2020 flow diagram (Fig. 1).
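The counts in the selection process can be cross-checked with simple arithmetic. The short sketch below uses only the figures stated in the paragraph above; the variable names are illustrative.

```python
# Records identified, as stated in the text.
first_search, second_search, other_sources = 1239, 203, 1
identified = first_search + second_search + other_sources

duplicates = 382
screened = identified - duplicates       # records left after de-duplication

included = 14
excluded = screened - included           # records removed during screening
excluded_pct = round(100 * excluded / screened, 1)

print(identified, screened, excluded, excluded_pct)  # 1443 1061 1047 98.7
```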

Fig. 1 PRISMA 2020 flow diagram of article selection

Of the 14 studies selected, 8 (57.1%) [25, 26, 30–33, 37] were assessed to have a low risk of bias across all domains and items. In the patient selection domain, all studies except three [27, 28, 38] were assessed as having a low risk of bias because they used consecutive or random samples or included all participants [26, 30]. In the index test domain, the risk of bias was low in all the studies. The CES-D is a self-reported measure; therefore, it was assumed that the gold standard for diagnosing depression would not influence its findings. In contrast, knowing the CES-D scores in advance may influence the findings of the gold standard. In the reference standard domain, nine studies [25–28, 30–33, 37] were blinded, and one study [38] administered the gold standard before the CES-D. Therefore, the risk of bias was low in these studies. However, four studies were assessed as having an unclear risk of bias [29, 34–36]. Finally, regarding flow and timing, all studies were assessed as having a low risk of bias (Fig. 2).

Fig. 2 Quality assessment results of the selected studies by QUADAS-2

Characteristics of Selected Studies

A total of 7,843 participants in 14 studies were included in tests of the predictive validity of the CES-D for children and adolescents. The ages of the participants ranged from 10 to 19 years, and the mean ages were 14–16 years. The selected studies were published in 10 countries. Three studies were conducted in the United States [30, 36, 38], another three in Colombia [32, 33, 37], and two in Germany [26, 28]. Brazil, China, France, the Netherlands, Rwanda, and Switzerland had one study each. Most studies included more than 100 participants, and two [37, 38] included more than 1,000. The 20-item version of the CES-D was used in 11 studies, whereas two studies used the 15-item version [26, 28], and the short 3- and 10-item versions of the CES-D were used in one study [33]. Four studies [29, 30, 34, 38] compared the diagnostic accuracy of the 20-item CES-D with that of other depression-screening tools. The tools used were as follows: two studies used the BDI [29, 38], one used the Carroll Rating Scale (CRS) [29], one used the Edinburgh Postnatal Depression Scale (EPDS) [30], and one used the Major Depression Inventory (MDI) [34]. The gold standards for diagnosing MDD were direct interviews following the DSM and ICD-10 or semi-structured interviews such as K-SADS, the Diagnostisches Interview bei psychischen Störungen im Kindes- und Jugendalter (Kinder-DIPS), the Mini International Neuropsychiatric Interview for Children and Adolescents (MINI-KID), and the Depression Intensity Scale Circles (DISC). The cut-off scores of the CES-D long version ranged from 14 to 30 and varied without a typical pattern (Table 1).

Table 1 Characteristics of selected studies

Predictive Validity of the CES-D in Children and Adolescents

Based on the 14 selected studies, the predictive validity of the CES-D for screening MDD was compared according to the CES-D type and with other depression-screening tools (Table 2).

Table 2 Summary results of meta-analysis

Predictive validity by the CES-D type

The predictive validity of the long version of the CES-D was assessed for 7,038 children and adolescents in 11 studies (Fig. 3). The prevalence was 23.0%. The sensitivity and specificity ranged from 0.73 to 0.91 and 0.51 to 0.86, respectively. In the meta-analysis, the pooled sensitivity was 0.81 (95% CI, 0.77 to 0.85), pooled specificity was 0.72 (95% CI, 0.68 to 0.76), and RE correlation was 1.000. The sROC AUC was 0.83 (SE = 0.03), and the Q* value was 0.76 (SE = 0.03).

Fig. 3 Predictive accuracy of the type of the CES-D

The predictive validity of the short version of the CES-D was assessed for 805 children and adolescents in three studies [26, 28, 33]. The prevalence was 16.0%. Sensitivity ranged from 0.76 to 0.86, and specificity ranged from 0.59 to 0.84. In the meta-analysis, the pooled sensitivity was 0.80 (95% CI, 0.73 to 0.86), pooled specificity was 0.74 (95% CI, 0.65 to 0.81), and RE correlation was 1.000. The sROC AUC was 0.86 (SE = 0.04), and the Q* value was 0.79 (SE = 0.04).
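The AUC bands given in the Statistical Analysis section can be applied mechanically to the two pooled estimates. The sketch below does so; the function name `interpret_auc` is hypothetical, and because the quoted bands share their endpoints, assigning a boundary value such as 0.7 or 0.9 to the higher band is an assumption.

```python
def interpret_auc(auc: float) -> str:
    """Map an sROC AUC to the accuracy bands quoted in the Methods."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("expected an AUC between 0.5 and 1.0")
    if auc == 1.0:
        return "perfect test"
    if auc >= 0.9:   # assumption: 0.9 itself counts as highly accurate
        return "highly accurate"
    if auc >= 0.7:   # assumption: 0.7 itself counts as moderately accurate
        return "moderately accurate"
    if auc > 0.5:
        return "low accuracy"
    return "non-informative test"


# Pooled sROC AUC values reported for the two CES-D types.
for label, auc in [("long version", 0.83), ("short version", 0.86)]:
    print(f"{label}: AUC {auc:.2f} -> {interpret_auc(auc)}")
```

Both pooled values fall in the 0.7–0.9 band, that is, moderate accuracy.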

Predictive validity of the CES-D compared to other tools

The predictive validity of the CES-D was compared with that of other screening tools for 2,448 children and adolescents in four studies [29, 30, 34, 38] (Fig. 4). The prevalence was 23.0%. For the CES-D, the sensitivity and specificity ranged from 0.73 to 0.91 and 0.51 to 0.76, respectively. In the meta-analysis, the pooled sensitivity was 0.84 (95% CI, 0.72 to 0.91), pooled specificity was 0.72 (95% CI, 0.63 to 0.79), and RE correlation was 0.031. The sROC AUC was 0.83 (SE = 0.03), and the Q* value was 0.76 (SE = 0.02).

Fig. 4 Predictive accuracy of the CES-D versus other tools

For the other tools, the sensitivity and specificity ranged from 0.72 to 0.91 and 0.59 to 0.81, respectively. In the meta-analysis, the pooled sensitivity was 0.83 (95% CI, 0.74 to 0.89), pooled specificity was 0.74 (95% CI, 0.68 to 0.79), and RE correlation was -0.082. The sROC AUC was 0.83 (SE = 0.03), and the Q* value was 0.76 (SE = 0.02).

Discussion

Children and adolescents are at high risk of crises such as suicide due to depression compared with other age groups, and one of the best preventive interventions for them is to screen for MDD quickly and easily using reliable tools, thereby enabling early treatment [39]. This study systematically synthesized and analyzed 14 studies that reported on the diagnostic accuracy of the CES-D, a tool widely used to screen for depression in children and adolescents. This review indicates that the CES-D is a valuable tool for predicting depression in children and adolescents.

In this study, the literature search was conducted twice using the same search terms. A total of 203 articles were identified in the second search, but no new articles met the eligibility criteria. The 14 selected studies were published from 1991 to 2018, with 9 of them concentrated between 2008 and 2013. Owing to the coronavirus disease (COVID-19) pandemic, children and adolescents worldwide have been unable to go to school for a second consecutive year and are studying online without interacting with their peers. Although depression in children and adolescents has not emerged as a severe social issue, it has increased in recent years [40]. Therefore, the demand for screening tests for depression in children and adolescents is relatively high.

Of the 14 studies selected for this review, 11 evaluated the CES-D long version, and 3 evaluated the short version. The sensitivity of the CES-D in individual studies was in the range of 0.73–0.91, whereas the specificity varied more widely, from 0.51 to 0.86. The meta-analysis showed that the pooled sensitivity and specificity of the CES-D by type were similar: long version (0.81, 0.72) and short version (0.80, 0.74). The sROC AUC was also similar for the long (0.83) and short (0.86) versions. For the short version of the CES-D, two evaluations of the 15-item scale and one each of the 10-item and 3-item scales were analyzed. The sensitivity of the 15-item scale was in the range 0.84–0.86, but its specificity differed between studies, ranging from 0.59 to 0.84. The sensitivity of the 10-item (0.78) and 3-item (0.76) scales was somewhat lower than that of the 15-item CES-D. However, there was insufficient evidence to draw any conclusions, as there were only three studies. In the case of the Geriatric Depression Scale used for older adults, the short version tends to be used more actively because of its convenience [41]. In contrast, the advantages of the short version do not seem to be significant for children and adolescents because there is no time constraint for them to complete the questionnaire. It is important to note that the 15-item CES-D used in the two studies was a tool developed by Pietsch [26]. Even when short versions have the same number of items, the items themselves may differ and must therefore be confirmed [42].

In four studies, the CES-D and other depression-screening tools were compared for their predictive validity. Although the number of studies was insufficient, the CES-D performed similarly to the other depression-screening tools regarding pooled sensitivity (0.84 vs. 0.83) and pooled specificity (0.72 vs. 0.74). The sROC AUC and Q* values were also the same at 0.83 and 0.76, respectively. Among the other depression tools, the BDI was used in two studies and the rest in only one study each, so a comparative advantage could not be established. In the studies that used the BDI, the sensitivity and specificity of the CES-D were 0.73–0.86 and 0.75–0.76, respectively, similar to those of the relatively widely used BDI at 0.72–0.84 and 0.74–0.81, respectively. This is in line with the results of existing SRs [14, 15]. Stockings et al. did not report meta-analysis results for the CES-D alone, but the pooled sensitivity of the four tools they reviewed (BDI, CES-D, etc.) was 0.80, and the specificity was 0.78 [14]. Therefore, the CES-D is similar to the BDI proposed by the USPSTF regarding screening accuracy and can be considered for use in children and adolescents [8].

This review had some limitations. None of the 14 selected studies showed a high risk of bias, but all were observational rather than experimental. The RE correlations were all positive, except for that of the other screening tools, and there was heterogeneity among the studies. The specificity of the CES-D was lower than its sensitivity. However, because the CES-D is a screening tool, a higher sensitivity may be more valuable than a higher specificity. In Radloff’s version of the CES-D, the recommended cut-off score for distinguishing the general population from psychiatric patients was 16; however, the sensitivity and specificity were not presented in that study [12]. The cut-off scores used in the studies included in this review showed no consistent pattern, with two studies [25, 27] using a cut-off score of 30 or higher, five studies [31, 32, 34, 35, 38] a score of 21–24, three studies [30, 36, 37] a score of 16, and one study [29] a score of 14. Therefore, this review did not include a subgroup analysis according to cut-off scores.
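The practical consequence of a specificity lower than sensitivity can be illustrated with predictive values. The sketch below applies Bayes' theorem to the pooled long-version estimates (sensitivity 0.81, specificity 0.72) at the 23.0% prevalence of the pooled sample. The resulting predictive values are illustrative arithmetic, not results reported in this review, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)   # P(MDD | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(no MDD | negative screen)
    return ppv, npv


# Pooled long-version estimates and prevalence taken from the Results.
ppv, npv = predictive_values(0.81, 0.72, 0.23)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.46, NPV = 0.93
```

Under these inputs, a negative screen is informative, whereas fewer than half of positive screens would correspond to MDD, which is consistent with using the CES-D for screening followed by a diagnostic interview rather than for diagnosis. Predictive values shift with prevalence, so they would differ in populations where MDD is less common.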

Summary

The prevention of depression in children and adolescents during physical and mental development is a significant public health issue. This study aimed to explore the predictive validity of each CES-D type and the screening utility of the CES-D in comparison with other tools. Fourteen studies met the eligibility criteria for inclusion. The predictive validity of the CES-D long version was measured in 11 studies and that of the short version in 3 studies. In the meta-analysis, the pooled sensitivity (0.81 and 0.80) and pooled specificity (0.72 and 0.74) of the CES-D by type were similar. The sROC AUC was 0.83 for the long version and 0.86 for the short version. Four studies compared the CES-D to other tools. The pooled sensitivity (0.84 and 0.83) and pooled specificity (0.72 and 0.74) were similar, and the sROC AUC was 0.83 for both. The CES-D has shown consistent moderate predictive validity in identifying MDD in children and adolescents. Like other depression-screening tools such as the BDI, the CES-D can be used to screen for depression in children and adolescents. Depression during childhood and adolescence is strongly associated with recurrent depression in adulthood. These findings may help in the early detection and rapid recognition of depression in children and adolescents and help them maintain a healthy life through appropriate treatment.