Introduction

Caring for a child recently diagnosed with a chronic health condition can be the source of tremendous stress for parents, particularly mothers, who most often assume the primary caregiving role in families (Gillespie and Primavera 2000). As a result, mothers may have an elevated risk for psychological distress, including depression, in response to their child’s diagnosis (Dolgin et al. 2007; Ferro et al. 2011b). Understanding maternal depression has important clinical implications: not only are symptoms of depression unhealthy for mothers, but research has also shown that depression in mothers has a significant negative effect on children’s health (Beardslee et al. 1998; Kim-Cohen et al. 2005; Pilowsky et al. 2006; Goodman et al. 2011). In pediatric epilepsy, maternal depression has been linked to internalizing and externalizing behavior problems in children, particularly symptoms of anxiety and depression, and to poor health-related quality of life (Ferro and Speechley 2009).

In epidemiological and clinical research, symptoms of depression are routinely measured using screening instruments that aim to capture various related constructs underlying the multidimensionality of depression. One of the most commonly used screening instruments is the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff 1977), which has received over 20,000 citations since its publication. The CES-D is a 20-item self-report instrument that measures depressive symptoms across four domains: depressed affect, positive affect, somatic activity, and interpersonal relations. This four-factor structure of the CES-D has been replicated in several different study populations (Hertzog et al. 1990; Knight et al. 1997; Williams et al. 2007; Edwards et al. 2010; Morin et al. 2011). Furthermore, a recent meta-analysis of studies confirmed the four-factor structure as robust, despite slight variations in the items comprising each factor and in the magnitude of factor loadings (Shafer 2006). While the factor structure of the CES-D has been well established, concerns have been expressed regarding the validity and reliability of this instrument in different groups and over time. Despite evidence of response heterogeneity within and between populations, the majority of studies that have examined differences in depression scores using the CES-D, either between groups or over time, have taken measurement invariance for granted (Muthén 1989). This is problematic because violation of the assumption of invariance may result in measuring depression using indicators that are not conceptualized similarly by respondents over time. Thus, comparisons over time can be biased.

Measurement invariance concerns the extent to which the psychometric properties of the observed indicators are generalizable across groups or over time (Vandenberg and Lance 2000; Brown 2006); it tells researchers that the same construct is being measured in the same way in different groups (e.g., males and females) or over time (i.e., repeated measures on the same individuals). Demonstrating measurement invariance makes it possible to interpret differences between groups or over time as real and meaningful. The test for these differences is called structural invariance, and it involves comparing model fit before and after constraining the unobserved factor means to be equal. Without the evaluation of invariance over time, researchers cannot be certain whether observed changes over time reflect true change or result from changes in the interpretation of items or the structure of the construct (Brown 2006).

Findings from previous cross-sectional studies examining measurement invariance between subgroups of women have been mixed. Recently, Van Lieshout et al. (2011) reported partial measurement invariance of the CES-D comparing immigrant and non-immigrant women in the post-partum period. In contrast, Williams et al. (2007) confirmed the four-factor structure of the CES-D, but found that the measure was non-invariant among younger and older African American women. Finally, Knight et al. (1997) replicated the original factor structure of the CES-D, but did not assess measurement invariance.

We are aware of only two studies that have examined the longitudinal invariance of the CES-D; one sampled children and the other included mostly adult male cardiac patients (Motl et al. 2005; Contrada et al. 2006). In the study by Motl et al. (2005), support was found for the original four-factor structure of the CES-D. In addition, full longitudinal measurement invariance was observed using 2-year follow-up data of children during their transition between grades seven and eight. In contrast, Contrada et al. (2006) did not replicate the original four-factor structure of the CES-D, but instead demonstrated longitudinal invariance of a three-factor solution with modifications among a predominantly male sample of patients undergoing cardiac surgery. However, it is unknown if these findings can be replicated in a sample of adult women. In particular, no studies have examined the factor structure or measurement invariance of depression scales in mothers of children with a chronic illness. Assessment of longitudinal invariance of the CES-D is important to ensure that changes in symptoms of depression are true changes and not simply an artifact of changes in the interpretation of scale items over time. This is particularly salient given that previous evidence has shown that depressive symptoms in mothers have a negative impact on the health-related quality of life of children with epilepsy (Ferro et al. 2011a). Thus, given the paucity of research surrounding longitudinal invariance of the CES-D, the objectives of this research were to (1) confirm the factor structure and (2) assess the longitudinal measurement invariance of the CES-D over a 24-month period in a sample of adult women.

Methods

Sample

Data for this study came from the Health-related Quality of Life of Children with Epilepsy Study (HERQULES), a prospective cohort study designed to examine the determinants of health-related quality of life in children with epilepsy during the first 24 months post-diagnosis. Families were recruited from pediatric neurology practices and consisted of a sample of English-speaking families with a child diagnosed with epilepsy between 4 and 12 years of age. Participants were recruited over a 36-month period by pediatric neurologists (n = 52) and provided written consent prior to being enrolled in the study. Primary caregivers were contacted by telephone to determine participation status and were mailed self-administered questionnaires after diagnosis (baseline), and at 6, 12, and 24 months. For this analysis, data at baseline, 12, and 24 months were utilized due to the equal spacing of measurement occasions. Responses from mothers who completed at least one measurement occasion were retained, resulting in a sample size of N = 347 at baseline. A total of N = 258 (76 %) completed all measurement occasions. Approval for HERQULES was obtained from all relevant research ethics boards across the country.

Measurement

The CES-D was included in HERQULES. The CES-D is a 20-item questionnaire designed to assess depressive symptoms in the general adult population over the past week (Radloff 1977). Its items assess depressed affect (7 items), positive affect (4 items), somatic activity (7 items), and interpersonal relations (2 items). A four-point adjectival response scale (0–3) is used with anchor points for frequency of symptoms experienced ranging from “rarely or none of the time (less than 1 day)” to “most or all of the time (5–7 days).” The total score ranges from 0 to 60, with a higher score indicating greater impairment. Individuals who score <8 are considered to have clinically irrelevant symptoms, whereas individuals in the 8–15 range are defined as having subclinical depressive symptoms, and those scoring ≥16 are typically identified as having clinically relevant levels of depressive symptoms (Radloff 1977; Cohen et al. 2010).
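The scoring rule described above can be illustrated with a minimal Python sketch. It is not the scoring code used in HERQULES; the function names are hypothetical, and the sketch assumes that responses to the positively worded items have already been reverse-scored so that higher values always indicate more symptoms.

```python
from typing import Sequence


def cesd_total(items: Sequence[int]) -> int:
    """Sum 20 item responses, each scored 0-3.

    Assumption: positively worded items are already reverse-scored.
    """
    if len(items) != 20:
        raise ValueError("the CES-D has 20 items")
    if any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(items)


def cesd_category(total: int) -> str:
    """Map a total score (0-60) to the three ranges given in the text."""
    if not 0 <= total <= 60:
        raise ValueError("total score must lie between 0 and 60")
    if total < 8:
        return "clinically irrelevant"
    if total < 16:
        return "subclinical"
    return "clinically relevant"
```

For example, a respondent who answers 1 on every item has a total of 20 and falls in the clinically relevant range.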

Procedure and statistical analysis

Confirmatory factor analysis was used to examine longitudinal measurement invariance of the CES-D. In this approach, a hierarchical set of procedures specifying increasingly stringent equality constraints was employed to assess longitudinal invariance (Millsap and Yun-Tein 2004). Tests of longitudinal measurement invariance were as follows: (1) configural invariance imposes no equality constraints on parameters and is used as the basis for more constrained models to be tested (Byrne 2012); (2) metric invariance examines the extent to which the magnitudes of the factor loadings (λᵢ) for particular items are the same over time (Vandenberg and Lance 2000; Brown 2006) and is a prerequisite for making valid comparisons (Bollen 1989); (3) scalar invariance tests for evidence that item thresholds (τᵢ) are invariant over time (Brown 2006); and (4) strict invariance is performed to determine whether the residuals (θᵢ) of the regression equations for each indicator are equivalent over time (where subscript i is the item) (Steinmetz et al. 2009).
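As an illustration of what each constraint means, the hierarchy can be written for a generic ordered-categorical factor model. This is a sketch in standard notation rather than the exact Mplus parameterization used here; the occasion index t (1 = baseline, 2 = 12 months, 3 = 24 months), the latent response y*, the category index c, and the factor index k(i) are notational assumptions added for this illustration.

```latex
% Latent response for item i at occasion t, loading on its factor k(i)
y^{*}_{it} = \lambda_{it}\,\eta_{k(i),t} + \varepsilon_{it},
\qquad \operatorname{Var}(\varepsilon_{it}) = \theta_{it}

% Observed response category c = 0,\dots,3 is determined by thresholds,
% with \tau_{it,0} = -\infty and \tau_{it,4} = +\infty
y_{it} = c \iff \tau_{it,c} < y^{*}_{it} \le \tau_{it,c+1}

% Configural: same pattern of loadings at each occasion, no equality constraints
% Metric:
\lambda_{i1} = \lambda_{i2} = \lambda_{i3}
% Scalar (in addition to metric), for c = 1, 2, 3:
\tau_{i1,c} = \tau_{i2,c} = \tau_{i3,c}
% Strict (in addition to scalar):
\theta_{i1} = \theta_{i2} = \theta_{i3}
```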

Measurement invariance over time was considered to be present when, after imposing a constraint, there was no appreciable worsening of model fit. If this condition was met, invariance testing proceeded to the application of the next equality constraint. If there was significant worsening of fit, modification indices were reviewed and constraints on non-invariant parameters were removed, allowing those parameters to vary freely. This respecified model was then tested against the less constrained model to determine if model fit significantly improved. This strategy, known as partial invariance, requires that some, but not all, parameters be constrained over time (Byrne et al. 1989).

Determination of model fit was based on four goodness-of-fit indices. These were the Comparative Fit Index (CFI), Tucker-Lewis Index (TLI), Weighted Root Mean Square Residual (WRMR), and Root Mean Square Error of Approximation (RMSEA) and 90 % confidence interval (MacCallum et al. 1996; Cheung and Rensvold 2002; Singh 2009). Because the χ² test is sensitive to sample size (Singh 2009), it was examined, but not considered in decisions regarding model fit. Adequate model fit was defined using the following cutoffs: CFI > 0.90, TLI > 0.90, WRMR < 0.90, and RMSEA < 0.06 (Yu 2002; Singh 2009; Byrne 2012). If at least three of the four fit indices met the specified cutoff, model fit was deemed adequate (Van Lieshout et al. 2011; Ferro and Boyle 2012).
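The decision rule above (adequate fit when at least three of the four indices meet their cutoffs) can be expressed as a short Python sketch. The function name and signature are hypothetical and serve only to make the rule concrete; the cutoffs are those stated in the text.

```python
def fit_is_adequate(cfi: float, tli: float, wrmr: float, rmsea: float) -> bool:
    """Return True when at least three of the four indices meet their cutoffs."""
    checks = [
        cfi > 0.90,    # Comparative Fit Index
        tli > 0.90,    # Tucker-Lewis Index
        wrmr < 0.90,   # Weighted Root Mean Square Residual
        rmsea < 0.06,  # Root Mean Square Error of Approximation
    ]
    return sum(checks) >= 3
```

Applied to the configural model reported in the Results (CFI = 0.965, TLI = 0.963, WRMR = 1.050, RMSEA = 0.029), three of the four checks pass, so the rule classifies the fit as adequate even though WRMR misses its cutoff.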

Due to the ordered categorical nature of responses for the CES-D, the confirmatory factor model was estimated with a weighted least squares estimator using a diagonal weight matrix to generate robust parameter estimates (Muthén and Muthén 2010). The χ² difference test (using the DIFFTEST option in Mplus) was used to determine the presence of invariance at each level of the analysis (Muthén and Muthén 2010). However, given that Δχ² is influenced by sample size (Chen et al. 2005), the following changes in fit indices were also used to determine invariance at each stage of testing (Chen 2007): ΔCFI ≥ −0.010 and ΔRMSEA ≤ 0.015. In the case of mixed results, closeness to threshold scores and consistency across fit statistics were used to determine invariance.
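The change-in-fit criteria can be sketched as a simple check. The sketch reads the criteria as follows, which is an interpretive assumption: invariance is supported when CFI decreases by no more than 0.010 and RMSEA increases by no more than 0.015 relative to the less constrained model. The function name is hypothetical, and the rule used for mixed results is not modeled.

```python
def invariance_supported(delta_cfi: float, delta_rmsea: float) -> bool:
    """Check change-in-fit criteria for one step of invariance testing.

    delta_cfi and delta_rmsea are (constrained model) minus
    (less constrained model). Assumed reading of the criteria:
    CFI may drop by at most 0.010 and RMSEA may rise by at most 0.015.
    """
    return delta_cfi >= -0.010 and delta_rmsea <= 0.015
```

For instance, the changes reported in the Results for the metric step (ΔCFI = −0.001, ΔRMSEA = 0.001) satisfy both conditions under this reading.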

Full information maximum likelihood was employed to include participants with missing data for the dependent variable, depressive symptoms. Complete data were available for n = 219 individuals, and the pattern of missingness was as follows: missing on one variable, n = 59; two variables, n = 75; three variables, n = 1; all variables, n = 21. Thus, the total sample used for invariance testing was N = 326. All statistical analyses associated with the longitudinal invariance testing were performed with Mplus 6 (Muthén & Muthén, USA). Descriptive statistics of the sample were obtained using SAS 9.2 (SAS Inc., USA).

Results

The mean age of women in this study was 37.6 (6.1) years at the time their children were diagnosed with epilepsy. As shown in Table 1, 80 % of mothers were married, 62 % had graduated from college or university, 66 % were employed either full- or part-time, and 37 % had annual household incomes ≥$80,000. The mean CES-D score at baseline, 12, and 24 months was 20.7 (7.4), 19.1 (6.5), and 18.9 (7.2), respectively. A total of 81 women were lost during follow-up. Compared to those who completed all measurement occasions, non-completers were more likely to be younger (p = 0.0002), to be unmarried (p = 0.0040), to have lower education (p = 0.0122), and to have lower household income (p = 0.0191).

Table 1 Sociodemographic characteristics of the sample at baseline

Prior to testing measurement invariance of the original factor structure of the CES-D, alternative models were examined to determine which provided the best fit to the data. The alternate models tested were based on previously reported CES-D models (Perreira et al. 2005; Shafer 2006) and included: (1) a three-factor model with depressed and positive affect domains combined, (2) a three-factor model with depressed affect and somatic activity domains combined, (3) a two-factor model with depressed affect and somatic activity combined and positive affect and interpersonal relations combined, and (4) a one-factor model. Results from fitting these alternate models are shown in Table 2 and suggested that the original four-factor structure of the CES-D demonstrated the best fit, and thus, tests for longitudinal invariance continued with this model. Parameter estimates for the four-factor model are shown in Fig. 1.

Table 2 Comparison of alternate CES-D factor structures
Fig. 1 Standardized parameter estimates for the four-factor CES-D model. All parameter estimates are statistically significant at p < 0.001

The first step in assessing longitudinal measurement invariance was to establish a well-fitting baseline model (configural invariance) in which no constraints are placed on the model. Invariance was modeled to test whether an equal number of factors and the pattern of loadings in this longitudinal sample represented those observed in the original construction of the scale. This configural model demonstrated adequate fit, with three of the four indices meeting their cutoffs, χ²(1,644) = 2,213.74; CFI = 0.965; TLI = 0.963; WRMR = 1.050; RMSEA = 0.029 [0.026, 0.033], and thus was used as the basis for testing more constrained models. A summary of model fit indices associated with the longitudinal invariance testing of measurement and structure of the CES-D is shown in Table 3.

Table 3 Model fit summary of measurement and structural invariance testing

In the first stage of testing, which focuses on longitudinal measurement invariance, equality constraints were set on the factor loadings to test whether the magnitudes of factor loadings were equal over time (metric invariance). In comparison to the configural model, there was no substantial worsening of model fit, ∆CFI = −0.001; ∆RMSEA = 0.001, indicating metric invariance was observed. Next, thresholds were constrained to be equal over time (scalar invariance). Results suggested the model was invariant over time, ∆CFI = −0.003; ∆RMSEA = 0.001. In the final step of measurement invariance testing, residual variances were constrained to be equal over time (strict invariance). Compared to the scalar model, there was no substantial difference in model fit, ∆CFI = −0.004; ∆RMSEA = 0.002. Thus, the CES-D demonstrated full longitudinal measurement invariance in this sample of mothers. Parameter estimates and the covariance matrix of the final invariant model are shown in Tables 4 and 5, respectively. Internal consistency estimates (using Cronbach’s α) were good at each measurement occasion, ranging from 0.76 to 0.81.
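For readers unfamiliar with the internal consistency estimate mentioned above, the standard formula for Cronbach’s α can be sketched in a few lines of Python. This is an illustration of the textbook formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), not the SAS procedure used in this study; the function name and the data layout (one row per respondent, one column per item) are assumptions.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data.

    rows: one sequence of item scores per respondent (no missing values).
    Uses sample variances for both the items and the total scores.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

As a check on the formula, two items that give identical scores for every respondent yield α = 1.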

Table 4 Parameter estimates and standard errors for the final invariant model
Table 5 Covariance matrix (φ) for the final invariant model

Discussion

To our knowledge, this is the first study to examine the factor structure and test measurement invariance of the CES-D using a longitudinal framework in a large sample of women. Results from the current study support the original model proposed by Radloff (1977); the CES-D can be explained by four latent factors: depressed affect, positive affect, somatic activity, and interpersonal relations. This finding is congruent with a recent meta-analysis which demonstrated that the four-factor model of the CES-D is the most commonly reported structure of the measure (Shafer 2006). These findings also align with the results from Knight et al. (1997), who reported an adequate fit for the four-factor model in a large sample of women 36–67 years of age.

The extent to which the CES-D exhibited longitudinal invariance across three measurement occasions taken at 1-year intervals was also examined in this prospective cohort study of Canadian mothers. Results provided support for full measurement invariance, an important prerequisite for longitudinal comparison of depressive symptoms (Steinmetz et al. 2009). The presence of metric invariance suggested that the 20 observed indicators were perceived similarly over time, and scalar invariance provided evidence that the magnitude of any biases in responses to an item was equal over time. Strict invariance was also demonstrated, which suggested that indicators had the same accuracy and were reliable over time.

While full measurement invariance was established in the current study, researchers have argued that full invariance is not required for substantive analyses if at least a subset of parameters is invariant (Byrne et al. 1989). Indeed, previous research has demonstrated that two metric and scalar invariant parameters (i.e., factor loadings and intercepts) are sufficient for meaningful comparisons between groups or over time (Baumgartner and Steenkamp 1998). Consequently, these results provide evidence to suggest that the latent structure of the CES-D exhibited stationarity and stability (Tisak and Meredith 1990; Pitts et al. 1996). In other words, the same underlying construct is being measured over time (stationarity) and the rank order of individuals on the construct remains constant over time (stability).

Given that longitudinal invariance was established, results have practical implications for clinicians and researchers. Since the CES-D demonstrated no difference in structure, scaling, or reliability, changes in mean levels of depressive symptoms can be attributed to real differences in depressive symptoms over time, as opposed to differences arising as an artifact associated with changes in the interpretation of items as individuals mature. This finding is particularly relevant for investigators interested in maternal–child health as depression is understood to be the leading cause of disease burden among women in their reproductive years (Lopez et al. 2006). In addition, it is a fundamental construct underlying the mental health of mothers and how they adapt and cope with their children’s illness (Ferro et al. 2011b; Ferro and Speechley 2012) and is a salient risk factor predicting health outcomes in children (Ferro and Speechley 2009; Ferro et al. 2011a). For example, we have previously shown that mothers of children with epilepsy are not homogeneous, but instead consist of distinct heterogeneous subgroups with unique trajectories of depressive symptoms (Ferro et al. 2011b). Furthermore, we have documented that maternal depression has a negative impact on the health-related quality of life of children with epilepsy, such that children of mothers with depression have lower health-related quality of life scores and less favorable trajectories during the first 24 months after their diagnosis (Ferro et al. 2011a). Importantly, since the sociodemographic characteristics of the study sample were similar to those of Canadian women in the general population (Canada 2006), results are likely externally valid. Thus, when using the CES-D in other populations of adult women, researchers should be confident that changes in CES-D scores over time are reflective of true changes in depressive symptomatology and not an artifact of changes in the interpretation of items in the scale.

Conclusions

The present study provided evidence to support the use of the original, four-factor model of the CES-D in a large sample of adult women. Additionally, results from the measurement invariance testing allow for meaningful comparisons of mean CES-D factor scores over time. Research should continue examining longitudinal invariance of the CES-D in other mother–child populations so that if invariance is established, clinicians and researchers can be confident in using this measure to document depressive symptoms. Future work building upon the results of this study and incorporating tests of invariance between subgroups (e.g., women with vs. without children) over time would make an important contribution to the understanding of depression among mothers.