Introduction

The Centre for Epidemiological Studies Depression Scale (CES-D) is one of the most commonly employed scales used to identify depression in large population studies involving older people [1]. It is free to use and has been validated in an older cohort against structured diagnostic interviews [2, 3].

However, concerns that the 20-item version (CES-D-20) is too time-consuming and onerous, especially when embedded within large surveys involving older people, have led to the development of short forms of the scale. While the 10-item version (CES-D-10) has been studied widely [4, 5, 6], the 8-item version (CES-D-8) is increasingly employed in large international studies, including the European Social Survey and the Health and Retirement Study [7, 8]. Previous work has confirmed the reliability of the CES-D-8 relative to the CES-D-20 in terms of internal consistency and factor structure, as well as correlations with disability and perceived stress, but no appropriate cut-off value to define depression cases via the CES-D-8 has yet been identified [9].

The aim of this study, therefore, is to validate the shortened CES-D-8 scale against the original CES-D-20 in a large sample of community-dwelling older people, and to identify an appropriate cut-off value to define cases with clinically significant depressive symptoms.

Methods

Study design

We analysed data from Waves 1 and 3 of The Irish Longitudinal Study on Ageing (TILDA), a study of a nationally representative sample of community-dwelling adults aged 50 years and over [10]. Participants were included if they underwent a complete assessment at Wave 1, including the CES-D, as well as collection of other biological and social data.

CES-D

The CES-D-20 consists of 20 items, each scored on a Likert scale from 0 to 3, yielding a total score of 0 to 60 [11]. A score of 16 or more on the CES-D-20 is used to define clinically significant depressive symptoms [1].

The CES-D-8 consists of 8 items taken from the original CES-D-20. These items are shown in “Appendix A”. The range of the CES-D-8 is therefore 0–24. The CES-D-8 was repeated at Wave 3 (4-year follow-up) to assess test-retest reliability.
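The scoring rule described above can be sketched in a few lines. This is a minimal illustration only: the function name `score_ces_d_8` is hypothetical, and the sketch assumes the eight item responses have already been coded 0 to 3 so that higher values indicate more symptoms.

```python
from typing import Sequence

N_ITEMS = 8          # number of items in the short form
MAX_ITEM_SCORE = 3   # each item is scored from 0 to 3


def score_ces_d_8(responses: Sequence[int]) -> int:
    """Sum eight item scores (each 0-3) into a total between 0 and 24.

    Assumption: responses are already coded so that a higher value
    means more depressive symptoms on every item.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(responses)}")
    for value in responses:
        if not 0 <= value <= MAX_ITEM_SCORE:
            raise ValueError(f"item score out of range: {value}")
    return sum(responses)
```

Validating the length and range before summing guards against silently accepting incomplete questionnaires.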

Other measures

Relevant biological and sociological data were also collected by self-report at Wave 1, including marital status, third-level educational attainment, cardiovascular disease (myocardial infarction, angina, cardiac arrhythmia, or cardiac failure), chronic pain, functional impairment, diabetes and prior stroke.

Statistical analysis

Correlation between the two scales was assessed using Spearman’s rank correlation coefficient. This was used in preference to Pearson’s correlation as the CES-D scales are ordinal and scores are not normally distributed. Spearman’s correlation was also used to assess test-retest reliability at 4-year follow-up.
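For readers unfamiliar with the statistic, Spearman’s ρ is Pearson’s correlation computed on ranks, with tied values sharing the mean of their ranks. The following is a standard-library sketch of that definition, not the software used in the study; the function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))
```

Because questionnaire totals take few distinct values, ties are common, which is why the tie-aware ranking matters here.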

Cronbach’s α was used to measure internal consistency of the scale. Factor analysis with varimax rotation was used to demonstrate the latent structure of the scale.
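As an illustration of the internal-consistency measure, Cronbach’s α for k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below implements that formula with the standard library; the name `cronbach_alpha` and the row-per-respondent layout are assumptions made for the example.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    Each element of `rows` holds one respondent's item scores.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent must have the same number of items")
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the total score across respondents.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When all items move together perfectly, α equals 1; when items are uncorrelated, it falls to 0.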

Due to the clustered nature of household sampling in TILDA, a cross-validation approach was used whereby analyses were repeated on smaller samples containing one participant from each cluster, and results were compared. For example, when analysing all clusters with a minimum of five households, we initially selected clusters that comprised at a minimum six unique respondent households. Next, we drew a random sample of five participants from each such cluster. Because each sampled participant came from a unique household, the sample did not include any cohabiting respondents, which reduced bias from within-household correlation of data. We then took the first sampled individual from each cluster (cluster N = here) as an observation, and re-ran the analysis across the subset of n = (here) individuals. We repeated this process four further times, allowing each remaining sampled individual to serve as a data point for the cluster they were sampled from. In this way, the analysis provided multiple estimates of Cronbach’s α, Spearman’s ρ and the factor structure, allowing the reliability of the data to be verified across multiple samples from separate clusters.
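The sub-sampling scheme can be sketched as follows. This is one possible reading of the procedure, not the study’s code: the tuple layout, the function name `cluster_subsamples` and the eligibility rule (a cluster needs at least as many households as draws) are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (cluster_id, household_id, participant_id)
Participant = Tuple[str, str, str]


def cluster_subsamples(
    participants: Sequence[Participant],
    draws_per_cluster: int,
    seed: int = 0,
) -> List[List[str]]:
    """Build `draws_per_cluster` sub-samples with one participant per cluster.

    Within each eligible cluster, participants are drawn from distinct
    households, so no sub-sample contains cohabiting respondents.
    """
    rng = random.Random(seed)
    # Group participant ids by cluster, then by household.
    by_cluster: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for cluster_id, household_id, participant_id in participants:
        by_cluster[cluster_id][household_id].append(participant_id)

    subsamples: List[List[str]] = [[] for _ in range(draws_per_cluster)]
    for cluster_id in sorted(by_cluster):
        households = by_cluster[cluster_id]
        if len(households) < draws_per_cluster:
            continue  # assumed eligibility rule: too few households, skip cluster
        chosen = rng.sample(sorted(households), draws_per_cluster)
        for i, household_id in enumerate(chosen):
            # The i-th draw from this cluster goes into the i-th sub-sample.
            subsamples[i].append(rng.choice(households[household_id]))
    return subsamples
```

With `draws_per_cluster=5`, each of the five returned lists corresponds to one repetition of the analysis, and statistics such as Cronbach’s α can then be computed on each list separately and compared.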

For the full sample analysis, the appropriate cut-off score for the CES-D-8 was determined by comparing scores ranging from 7 to 12 against the CES-D-20 cut-off score of 16. Sensitivity, specificity, agreement and the area under the receiver operating characteristic (ROC) curve were estimated for each of these potential cut-off values. Agreement was measured as percent agreement and by Cohen’s κ statistic.
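The comparison of candidate cut-offs can be illustrated with a short sketch. It assumes that a participant is classified as a case when their score is greater than or equal to the cut-off on each scale; the function name `evaluate_cutoff` and the returned dictionary keys are illustrative, and the code is not the study’s analysis script.

```python
from typing import Dict, Sequence


def evaluate_cutoff(
    short_scores: Sequence[int],
    long_scores: Sequence[int],
    short_cutoff: int,
    long_cutoff: int = 16,
) -> Dict[str, float]:
    """Compare short-form case status against the long-form reference."""
    tp = fp = fn = tn = 0
    for short, long_ in zip(short_scores, long_scores):
        predicted = short >= short_cutoff   # assumed: case if score >= cut-off
        reference = long_ >= long_cutoff
        if predicted and reference:
            tp += 1
        elif predicted and not reference:
            fp += 1
        elif not predicted and reference:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # percent agreement, as a proportion
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
    }
```

Calling the function once per candidate value, for example `for c in range(7, 13)`, yields one row of performance figures per cut-off. The sketch does not guard against samples with no reference cases or no reference non-cases, where sensitivity or specificity is undefined.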

To confirm that depression defined by the CES-D-8 had associations with variables of interest similar to those of depression defined by the CES-D-20, binary logistic regression models, reporting odds ratios, were fitted for each definition and compared. This analysis was completed on the full sample.
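As a simplified illustration of what an odds ratio expresses, consider a single binary predictor. In that special case the odds ratio from a logistic regression equals the cross-product ratio of the 2 × 2 table, which can be computed directly. The sketch below shows only this unadjusted case with a Woolf (log-scale) 95% confidence interval; it does not reproduce the study’s models, and the function and argument names are hypothetical.

```python
from math import exp, log, sqrt
from typing import Tuple


def odds_ratio(
    exposed_cases: int,
    exposed_noncases: int,
    unexposed_cases: int,
    unexposed_noncases: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Woolf confidence interval.

    Returns (estimate, lower bound, upper bound). All four cell counts
    must be positive for the estimate and interval to be defined.
    """
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se)
    upper = exp(log(estimate) + z * se)
    return estimate, lower, upper
```

Running the function twice, once with case status defined by each scale, gives two odds ratios for the same predictor that can be compared side by side.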

Results

A total of 8033 participants were included in the study. Of these, 10% (774/8033) scored ≥ 16 on the CES-D-20 and were therefore defined as having clinically significant depressive symptoms.

The baseline characteristics of the study sample are shown in Table 1.

Table 1 Baseline characteristics of study population

Agreement between scales

The Spearman correlation coefficient between the CES-D-20 and the CES-D-8 was 0.8980 (p value < 0.001), indicating a high degree of correlation between the scales. The CES-D-20 and CES-D-8 also both showed excellent internal consistency, with Cronbach’s α of 0.8757 and 0.8127, respectively.

The random samples in our cross-validation analysis showed a range of values for Cronbach’s α, all of which were close to or above 0.8, indicating high consistency. Similarly, the values obtained for Spearman’s ρ were all close to 0.9. See “Appendix B”.

Factor analysis

Factor analysis revealed the internal structure of the CES-D-8. Two factors with eigenvalues ≥ 1 were identified, and they accounted for 58% of the total variance of the data. The six ‘negative affect’ items of the CES-D-8 loaded on to Factor 1, while the remaining two ‘positive affect’ items loaded on to Factor 2. See “Appendix C”. Factor analysis remained consistent when analysed iteratively using participants selected as a random sub-sample from each cluster (see “Methods”).

Cut-off scores

Table 2 shows the performance of the CES-D-8 using different cut-off scores compared to the CES-D-20 (with a cut-off score of 16) for identification of clinically relevant depressive symptoms. At a cut-off score of 9/24, the sensitivity and specificity of the CES-D-8 were 98% and 83%, respectively. The Cohen’s κ for a cut-off score of 9 was 0.7855, suggestive of strong agreement, and the ROC area was adequate at 0.88.

Table 2 Performance of different cut-off scores of CES-D-8 against CES-D-20 with cut-off score 16

Table 3 shows binary logistic regression models comparing the relationship of selected variables of interest with depression as defined by the CES-D-20 (at a cut-off score of 16) and the CES-D-8 (at a cut-off score of 9). Results are broadly similar across scales, with the CES-D-8 demonstrating similar relationships with social and biological factors such as sex, educational attainment and chronic medical conditions.

Table 3 Logistic regression, comparing odds ratios for predictor variables for depression by CES-D-20 and CES-D-8

Test-retest reliability

Three quarters (6013/8033) of the study sample had the CES-D-8 repeated at 4-year follow-up. The Spearman’s correlation coefficient for the CES-D-8 performed at baseline and at 4 years was 0.4239 (p value < 0.001), suggesting moderate correlation.

Discussion

This study demonstrates that the CES-D-8 correlates well with the CES-D-20 when administered to a cohort of community-dwelling older adults.

At a cut-off score of 9, the CES-D-8 accurately identifies clinically significant depressive symptoms in this cohort when validated against the CES-D-20, with a sensitivity and specificity of 98% and 83%, respectively. Logistic regression models offer further validation, with similar odds ratios demonstrated for variables of interest, such as older age or chronic medical conditions, when comparing the CES-D-8 and CES-D-20 as dependent variables.

The CES-D-8 has previously been shown to have measurement equivalence across different ages, genders and countries [8, 12], and specifically in older people [13]. However, while cut-off scores of 16 and 10 have been established for the CES-D-20 and CES-D-10 [4], respectively, indicating a threshold for clinically significant depressive symptoms, this is the first study to identify an appropriate cut-off score to define cases of clinically significant depressive symptoms using the CES-D-8.

Validated shortened scales allow more rapid assessment and reduce respondent burden in large surveys. While it is often preferable to analyse continuous data from scales such as these, using a defined cut-off score to dichotomize cases into either a depressed or non-depressed group can also be useful as it mirrors decision-making in clinical practice, and allows us to estimate incidence and prevalence figures.

We also show that the internal structure of the CES-D-8 is similar to that of the CES-D-10, with two factors explaining almost 60% of the variance in the data [4]. This is consistent with previous comparisons of factor structure between the CES-D-8 and CES-D-20 [9].

There are some limitations to this study which must be noted. While the CES-D-20 has previously been validated against a structured psychiatric interview, we did not compare the CES-D-8 to this gold standard, and a further study validating the CES-D-8 against clinically diagnosed depression would be welcome.

In conclusion, this study demonstrates that, when compared to the 20-item CES-D, the 8-item CES-D is a valid and reliable measure of depressive symptoms in community-dwelling older adults, and that a cut-off score of 9 can be used to identify those with clinically significant symptoms.