Significance

Assessing perinatal depression in women living with HIV (WLHIV) is important due to the risk for adverse maternal and neonatal outcomes associated with depression. In this study, we present a psychometric evaluation of the Zulu and English versions of the Edinburgh Postnatal Depression Scale (EPDS). We show that all items, except for item 4, performed well psychometrically in measuring ante- and post-natal depression in WLHIV. Both versions showed comparable psychometric performance ante- and post-natally. This study contributes to improved measurement of depressive symptoms among vulnerable women in a resource constrained setting. The early and accurate detection of depressive symptoms ante- and postnatally among perinatal WLHIV can facilitate increased treatment which may in turn help prevent the negative maternal and neonatal outcomes associated with depression.

Introduction

The lifetime prevalence of major depressive disorder (MDD), according to population-based surveys, ranges from 1.0% (Czech Republic) to 16.9% (United States) (Kessler & Bromet, 2013); women’s risk of major depression is about twice that of men. Perinatal depression (depression during the ante- or postnatal period) has a similarly high prevalence. As with MDD, the prevalence of ante- and postnatal depression varies by country. Estimates of antenatal depression range from 7 to 15% in high-income countries and from 19 to 25% in low- and middle-income countries (Gelaye et al., 2016). The prevalence of postnatal depression is estimated to be about 10% in high-income countries and about 20% in low- and middle-income countries (Gelaye et al., 2016). Postnatal depression may last for months or years after delivery in some women (Goodman, 2004), and depression is more likely to recur among women following symptoms consistent with postnatal depression (Meltzer-Brody & Stuebe, 2014). Perinatal depression has been associated with impaired infant physical, cognitive, and emotional development (Gelaye et al., 2016) and with an elevated risk for suicidality in the mother (Orsolini et al., 2016). However, a high number of cases of perinatal depression may go undiagnosed and untreated (Biaggi et al., 2016; Evagorou et al., 2016).

Given the prevalence of perinatal depression, its associated risks, and its potential impact on infants, the accurate measurement and detection of perinatal depression are important. Several measures of depression have been used with perinatal women, with varying degrees of psychometric soundness. Among the limited research on this topic in Africa (Tsai et al., 2013), the most commonly used and most psychometrically sound instrument is the Edinburgh Postnatal Depression Scale (EPDS) (Cox et al., 1987), which was originally developed to assess postnatal depression but has also been used to assess antenatal depression (Gibson et al., 2009).

South Africa has particularly high rates of perinatal depression, with antenatal depression rates ranging from 21 to 47% and postnatal depression rates ranging from 17 to 50% (Pellowski et al., 2019). South Africa also has high rates of HIV (Avert, 2020). Women living with HIV (WLHIV) pose a particular challenge to the accurate assessment of perinatal depression. Individuals living with HIV already report high rates of depression (Bernard et al., 2017), and even in the absence of a depressive disorder, people living with HIV often experience somatic symptoms that are similar to depressive symptoms (Kalichman et al., 2000). Similarly, some symptoms of pregnancy may overlap with somatic depressive symptoms (Yonkers et al., 2009). Thus, it is especially important to use depression screening tools, such as the EPDS, that minimize assessment of somatic symptoms. Moreover, as perinatal depression measurement tools used in South Africa often require translation to other languages (e.g., Zulu, Sotho, Xhosa), faulty translations may add an extra barrier to accuracy.

Items in translated measures must convey the same meaning as in the original language (Fuggle et al., 2002), and sociocultural differences and differing education levels may present difficulties in measuring constructs cross-culturally (Aydin et al., 2004). Overall, previous research suggests that isiXhosa, English, and Afrikaans translations of the EPDS have been successfully validated and found to be psychometrically sound (de Bruin et al., 2004; van der Westhuizen et al., 2018). However, as Kubota and colleagues (2018) explain, the factor structure of the EPDS is not clear and seems to vary by study and sometimes by whether women were assessed antenatally or postnatally. One-factor, two-factor, and three-factor (depression, anxiety, and anhedonia) structures have been described. Cultural differences in symptom profiles could also contribute to this variation, as individuals in different cultures often experience depression, including postnatal depression (Evagorou et al., 2016), in different ways (Juhasz et al., 2012).

With the goal of providing a robust examination of the psychometric properties of the EPDS in a population at greater risk for depression, the current study examined the factor structures and reliability of the English and Zulu versions of the EPDS among pre- and postnatal WLHIV in South Africa. In addition, this study compared the functioning of the items between the English and Zulu versions of the EPDS using factor analytic and item response theory (IRT) approaches. Based on previous research, it was hypothesized that the EPDS in both the English and Zulu versions would show three factors representing depression, anxiety, and anhedonia, providing evidence of construct validity in both languages. With regard to comparative analyses, based on past research, it was hypothesized that items 6 and 10 would function differently between languages.

Methods

Ethical Approval

Before beginning this study, the researchers obtained ethical approval from the University of Miami School of Medicine, the Human Sciences Research Council Institutional Review Board, and the Mpumalanga Provincial Government. All study procedures have therefore been performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. Participants provided written informed consent prior to study procedures.

Participants and Procedures

Participants were N = 1399 pregnant women in South Africa participating in a randomized controlled trial (RCT) of an intervention designed to prevent mother-to-child transmission (PMTCT) of HIV. The complete protocol for the RCT has been previously published (Jones et al., 2014). After providing informed consent, all of the women completed the EPDS (Cox et al., 1987) during antenatal assessments in their preferred language; women had the option to choose English, Zulu, or Sotho, all official languages of South Africa. The women completed the EPDS and other study instruments using Questionnaire Development System’s (QDS) Audio Computer-Assisted Self-Interview (ACASI) software. ACASI allows participants to read the questions presented while also hearing the questions or statements read aloud over headphones to compensate for varying degrees of literacy. Antenatally, a total of N = 1399 pregnant women between 6 and 30 weeks of pregnancy completed the EPDS. Given the low number of women completing the EPDS in Sotho (n = 220), the final sample in this study included n = 1179 women who completed the EPDS in Zulu (n = 709) or English (n = 470). Women completed additional assessments 12 months after birth; at the 12-month follow-up, n = 957 women completed the EPDS. Women who completed the EPDS in Sotho at follow-up (n = 91) were again excluded because their number was insufficient for analysis. A total of n = 866 women were analyzed at the 12-month follow-up; these included n = 494 who completed the EPDS in Zulu and n = 372 who completed the EPDS in English.

Measure

The EPDS (Cox et al., 1987) was used to assess antenatal and postnatal depression in this study. The EPDS is a 10-item instrument that asks participants to rate how often they have experienced different symptoms associated with postnatal depression in the past 7 days. All of the items are scored on a scale of 0 through 3, although each item uses different labels for its response options. Total scores can range from 0 through 30. A score of 10 can indicate possible depression in the United States; however, the cutoff was raised to 12 in South Africa (Rochat et al., 2013). In addition, one item (item 10) assesses suicidal ideation. Supplemental Table 1 lists all scale items as well as their respective response options in Zulu and English.
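The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example responses are hypothetical, the item scores are assumed to be already coded 0 through 3 in the keyed direction, and the default cutoff of 13 is taken from the value applied in the Results section rather than from any scoring software used in the study.

```python
from typing import Sequence


def epds_total(item_scores: Sequence[int]) -> int:
    """Sum the ten EPDS item scores (each assumed to be coded 0-3 already)."""
    if len(item_scores) != 10:
        raise ValueError("the EPDS has 10 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored 0 through 3")
    return sum(item_scores)


def screens_positive(total: int, cutoff: int = 13) -> bool:
    """Return True when the total score is at or above the chosen cutoff."""
    return total >= cutoff


# Hypothetical responses for one participant (not study data).
responses = [1, 2, 0, 3, 1, 2, 1, 1, 2, 0]
total = epds_total(responses)
print(total, screens_positive(total))
```

Keeping the cutoff as a parameter makes it easy to compare classifications under the different thresholds reported in the literature.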

Table 1 Two-factor models of Zulu and English versions of the EPDS antenatally

Analytic Plan

Frequencies, means, and standard deviations were first used to describe item responses. Item discrimination indices were also calculated; each index is the corrected item-total correlation, that is, the correlation between an item and the sum of the remaining items on the scale.
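As an illustration of the discrimination index, the following Python sketch computes corrected item-total correlations from a respondents-by-items matrix. It is a minimal example with hypothetical function names and toy data, not the code used in the study.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(data: Sequence[Sequence[int]]) -> List[float]:
    """Correlate each item with the sum of the other items.

    `data` has one row per respondent and one column per item.
    """
    n_items = len(data[0])
    indices = []
    for j in range(n_items):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]  # total score without item j
        indices.append(pearson(item, rest))
    return indices


# Toy data: 5 respondents, 3 items (hypothetical, not study data).
toy = [[0, 1, 0], [1, 1, 2], [2, 2, 1], [3, 2, 3], [1, 0, 2]]
print([round(r, 3) for r in corrected_item_total(toy)])
```

Removing the item from the total before correlating avoids inflating the index, because an item always correlates with a sum that contains itself.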

The factor structure of the EPDS antenatally (6 to 30 weeks of pregnancy) and at 12 months postnatal was assessed in Zulu and English using a series of exploratory factor analyses (EFAs) with geomin rotation. One- to three-factor models were estimated using a diagonally weighted least squares estimator (WLSMV), which makes no distributional assumptions. Model fit was considered adequate with a Root Mean Square Error of Approximation (RMSEA) less than .05, a Comparative Fit Index (CFI) greater than .95 (Hu & Bentler, 1999), a Tucker Lewis Index (TLI) greater than .90 (Bentler & Bonett, 1980), and a standardized root mean squared residual (SRMR) less than .08 (Hooper et al., 2008). Lower RMSEA and SRMR values and higher TLI and CFI values indicate better model fit. Exploratory factor analyses were conducted in Mplus v8.4.
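The fit criteria listed above can be expressed as a simple check. The Python sketch below is illustrative only: the class and function names are hypothetical, and the example values are invented rather than taken from the supplemental tables.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FitIndices:
    rmsea: float
    cfi: float
    tli: float
    srmr: float


def meets_cutoffs(fit: FitIndices) -> Dict[str, bool]:
    """Compare each index with the cutoff stated in the analytic plan."""
    return {
        "RMSEA < .05": fit.rmsea < 0.05,
        "CFI > .95": fit.cfi > 0.95,
        "TLI > .90": fit.tli > 0.90,
        "SRMR < .08": fit.srmr < 0.08,
    }


# Invented values for illustration (not results from this study).
example = FitIndices(rmsea=0.041, cfi=0.972, tli=0.948, srmr=0.035)
checks = meets_cutoffs(example)
print(checks, all(checks.values()))
```

Reporting each criterion separately, rather than a single pass or fail flag, shows which index is responsible when a model falls short.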

The equivalence of EPDS items by language was tested using the lordif software package in R (Choi et al., 2011). Lordif uses ordinal logistic regression to evaluate differential item functioning (DIF). Specifically, DIF refers to whether women completing the English version of the EPDS endorse item response options at the same rate as women completing the Zulu version of the EPDS at the same score level. When the probability of responding differs by language despite the same total score, two different types of DIF can be detected. Uniform DIF indicates a consistent difference in performance between groups across all score levels, whereas non-uniform DIF refers to a difference between groups that varies across score levels. These models were tested at the α = .01 level. Item characteristic curves were then plotted for all items exhibiting DIF. Item characteristic curves show the probability of endorsing an item response option (0 to 3) at standard deviations of the underlying latent variable, depressive symptoms. In large samples, −2 log-likelihood ratio tests may overestimate DIF. As such, empirical thresholds to detect DIF were computed using Monte Carlo simulations in DIF-free samples (α = .01; 1000 replications). Then, the highest calculated empirical threshold from the Monte Carlo simulations was used to re-examine uniform DIF and non-uniform DIF (Choi et al., 2011).
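The logic of the nested-model comparison can be sketched as follows. This Python example does not reproduce lordif; it assumes that the log-likelihoods of three nested ordinal logistic models have already been obtained for one item (model 1: trait estimate only; model 2: adds language; model 3: adds the trait-by-language interaction), together with the log-likelihood of an intercept-only model. All function names and numeric inputs are hypothetical.

```python
from math import erfc, exp, sqrt
from typing import Dict, Tuple


def lr_test_df1(ll_reduced: float, ll_full: float) -> Tuple[float, float]:
    """Likelihood ratio chi-square with 1 df and its p-value."""
    chi2 = max(-2.0 * (ll_reduced - ll_full), 0.0)
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p_value


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke pseudo R-squared from log-likelihoods and sample size."""
    cox_snell = 1.0 - exp(2.0 * (ll_null - ll_model) / n)
    upper_bound = 1.0 - exp(2.0 * ll_null / n)
    return cox_snell / upper_bound


def classify_dif(ll_null: float, ll1: float, ll2: float, ll3: float,
                 n: int, alpha: float = 0.01) -> Dict[str, object]:
    """Flag uniform (model 1 vs 2) and non-uniform (model 2 vs 3) DIF."""
    _, p12 = lr_test_df1(ll1, ll2)
    _, p23 = lr_test_df1(ll2, ll3)
    r2_1, r2_2, r2_3 = (nagelkerke_r2(ll_null, ll, n) for ll in (ll1, ll2, ll3))
    return {
        "uniform": p12 < alpha,
        "non_uniform": p23 < alpha,
        "delta_r2_12": r2_2 - r2_1,  # effect size for uniform DIF
        "delta_r2_23": r2_3 - r2_2,  # effect size for non-uniform DIF
    }


# Hypothetical log-likelihoods for a single item (not study results).
result = classify_dif(ll_null=-1500.0, ll1=-1300.0, ll2=-1295.0,
                      ll3=-1294.5, n=1179)
print(result)
```

In this invented example the language term improves fit enough to flag uniform DIF at α = .01, while the interaction term does not. The ΔR² values are the quantities that would then be compared with the Monte Carlo thresholds.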

Finally, the reliability of the EPDS was assessed antenatally and at 12 months postnatal. Cronbach’s α was used to assess internal consistency. Conventions suggest that α values greater than .70 indicate adequate reliability.
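For reference, Cronbach’s α for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The Python sketch below implements this formula on a respondents-by-items matrix; the function name and toy data are hypothetical and are not study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(data: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for rows = respondents, columns = items."""
    k = len(data[0])
    item_variances = [variance([row[j] for row in data]) for j in range(k)]
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: 5 respondents, 3 items (hypothetical).
toy_scores = [[0, 1, 0], [1, 1, 2], [2, 2, 1], [3, 2, 3], [1, 0, 2]]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3), alpha > 0.70)
```

When all items rank respondents identically, the total-score variance is large relative to the item variances and α approaches 1.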

Results

Sociodemographic Characteristics of Women and Prevalence of Ante- and Postnatal Depression

The mean age of the women (n = 1179) was 28.44 years (SD = 5.85), and the majority reported their race to be Black African (97%); the remainder (3%) identified as “other” or “colored”. All women had partners as part of the original study design; 79% were not married, and 21% were married. Slightly more than half of the women (58%) had completed at least 10 years of school, and the average monthly income was 1049.72 South African Rand (USD$57.30). A cutoff of ≥ 13 has been used to define depression in previous research among South Africans (Rochat et al., 2013). Antenatally, 42% of women scored at or above this cutoff on the English version, and 47% scored at or above this cutoff on the Zulu version. Postnatally, 33% were classified as having probable depression using the English version, compared with 40% using the Zulu version.

Item Discrimination of EPDS Items Ante- and Postnatally

Item discrimination indices are presented in Supplemental Table 2. Antenatally, items 1, 2, and 4 in the English version showed low discrimination values; in the Zulu version, only item 1 had a low discrimination value. Postnatally, items, 2, and 4 in the English version showed low discrimination values; in the Zulu version, only item 1 showed a low discrimination index.

Exploratory Factor Analyses for Antenatal EPDS

A series of factor analyses was performed for the Zulu and English versions. Model fit indices for one- to three-factor models are shown in Supplemental Table 3. A three-factor model for the English version of the EPDS did not converge. In addition, although the three-factor model showed the best fit for the Zulu version, the third factor included only one item (item 4), which also loaded significantly on another factor and so could not be clearly assigned to either. Item 4 in the two-factor models showed low loadings in both the Zulu (.205 and .165 on factors 1 and 2) and English (.324 and .226) versions. As such, the two-factor models (shown in Table 1) were retained for subsequent analyses. The first factor was identified as anhedonia and the second factor as depression; in previous research, the factors have been identified as anhedonia, anxiety, and depression (Kubota et al., 2018). The second set of EFAs showed that the two-factor model without item 4 had a similarly adequate fit for both languages (Supplemental Table 4).

Differential Item Functioning of the EPDS by Language Antenatally

Ordinal logistic regression models tested whether participants completing the EPDS in different languages endorsed items at a different rate. Six items (1, 2, 4, 5, 6, and 9) were flagged for DIF at α = .01 (Table 2). Five of these items (1, 2, 4, 6, and 9) showed uniform DIF, whereas item 5 showed non-uniform DIF. Item true score functions were plotted for items flagged for DIF and are shown in Supplemental Fig. 1. Items 1, 2, and 9 of the Zulu version appeared to overestimate EPDS scores at low and high levels of depressive symptoms. Items 4 and 6, conversely, underestimated depressive symptoms at multiple score levels in the Zulu version. Item 5 of the Zulu version underestimated scores at low levels of depressive symptoms but overestimated scores at high levels of depressive symptoms. Monte Carlo simulations were then performed to determine the highest empirical cutoff for detecting DIF in R² values. The highest Monte Carlo threshold was Nagelkerke ΔR²₁₂ = .0067 for uniform DIF; only item 2 was above this cutoff. For non-uniform DIF, the empirical threshold was Nagelkerke ΔR² = .0058; no items were above this cutoff.

Table 2 Differential item functioning of the EPDS by language antenatally

Exploratory Factor Analyses Postnatally

A series of factor analyses was conducted for the Zulu and English EPDS versions that were administered postnatally. Model fit indices for one-factor to three-factor models of the EPDS are presented in Supplemental Table 5; the three-factor solutions in both languages showed the best fit. However, the factors were composed of different items in the two languages (Table 3). In the Zulu version, items in Factor 2 did not have significant factor loadings. In the English version, although item 4 did not load onto a separate factor, this item showed low, non-significant factor loadings on all factors. As such, item 4 was removed from subsequent analyses, and a second set of EFAs was conducted. The results of the second set of EFAs are presented in Table 4. EPDS items completed postnatally in Zulu loaded onto two factors, which were interpreted as anhedonia and depressive symptoms. Based on past research, the three factors of the English version were interpreted as anhedonia, cognitive depressive symptoms, and behavioral depressive symptoms.

Table 3 Exploratory factor analyses of Zulu and English versions of the EPDS postnatally
Table 4 Exploratory factor analyses of Zulu and English versions of the EPDS postnatally without item 4

Differential Item Functioning of the EPDS by Language Postnatally

Tests of DIF were performed using ordinal logistic regression models to determine differences in the probability of endorsement by language. Four items (3, 4, 5, and 7) were flagged for DIF at α = .01, as shown in Table 5. Item 3 exhibited only non-uniform DIF. Three items (4, 5, and 7) exhibited both uniform and non-uniform DIF. Item true score functions were plotted for DIF items and are shown in Supplemental Fig. 2. Items 3 and 7 of the Zulu version overestimated scores at low levels of depressive symptoms but underestimated scores at high levels of depressive symptoms. Items 4 and 5, in contrast, underestimated depressive symptoms at low levels of EPDS scores but overestimated scores at high levels. The highest Monte Carlo threshold for postnatal EPDS scores was Nagelkerke ΔR²₁₂ = .0083 for uniform DIF; no items were above this cutoff. For non-uniform DIF, the empirical threshold was Nagelkerke ΔR² = .0091; item 4 was above this cutoff.

Table 5 Differential item functioning of the EPDS by language postnatally

Reliability

Finally, reliability estimates were calculated for Zulu and English versions of the EPDS ante- and postnatally. Antenatally, the EPDS had adequate reliability in Zulu (α = .78) and English (α = .78). Postnatally, the EPDS had adequate reliability in Zulu (α = .82) and English (α = .77).

Discussion

This study examined the factor structures of the English and Zulu versions of the EPDS administered pre- and postnatally among WLHIV in South Africa. Using factor analytic and item response theory (IRT) approaches, the English and Zulu versions of the EPDS were compared. Although a few items performed poorly, particularly item 4, the construct validity of the English and Zulu versions of the pre- and postnatally administered EPDS was supported; the reliability of the scale was also supported. Contrary to expectations, items 6 and 10 performed well in both versions. Previous studies have examined the psychometric properties of a Zulu version of the EPDS (Rochat, 2011; Rochat et al., 2013), but this is the first study to examine its factor structure and establish measurement equivalence with an English version.

In the multiple psychometric assessments of the EPDS performed in this study, item 4 was consistently shown to be low performing relative to other items in the scale. Specifically, item 4 had low discriminating power, performing worse in the English version. In the series of factor analyses conducted ante- and postnatally, item 4 had to be excluded from the factor structure of the EPDS due to low factor loadings, which indicates a low amount of variance was explained by the item in the construct of depression. These findings suggest that this item did not contribute to the total score significantly, supporting its removal. Item 4 was also shown to function differently in the English and Zulu versions of the EPDS, underestimating symptoms of depression in Zulu. Given the consistent low performance of item 4 in both languages, the inclusion of this item in EPDS scales among WLHIV in South Africa may provide inaccurate estimates of the overall burden of depression. The low performance of this item may have been related to the phrasing of this item, “I have been anxious or worried for no good reason”. Among WLHIV in a resource constrained setting with very reasonable worries about the transmission of HIV to their infants, the term “for no good reason” may have introduced confusion, thereby making a small contribution to inaccurate measurement of depressive symptoms in this setting. Because item 4 is included in briefer Zulu versions of the EPDS, including the EPDS-3 (Rochat et al., 2013), the poor psychometric performance of this item may be a barrier for women entering treatment for depression in a setting where mental health resources are already scarce (Docrat et al., 2019). The removal of item 4 may be warranted when assessing women in comparable settings. Though these findings strongly support the low performance of Item 4 and its removal, the replication of these findings is warranted in more diverse South African samples.

The validity and reliability of both versions of the EPDS were supported in this study by adequate internal reliability and factor structures that were consistent with previous research (Kubota et al., 2018). Specifically, the factor structure of the Zulu version was consistent across the peripartum period, but this was not the case for the English version. Different factor structures have been known to emerge depending on whether the EPDS is completed ante- or postnatally (Kubota et al., 2018). The differences in factors and items loading onto factors may thus be explained by differences in the perinatal period in which women were assessed. Kubota et al. (2018) assessed women one month postnatally, whereas women in this study were assessed 12 months postnatally; it is possible that the factor structures differed because of time since birth. The differences in factor structures between the two versions in the postnatal period indicate a lack of dimensional measurement invariance, suggesting a need to modify these scales to achieve equivalence.

Using EPDS cutoffs developed in South Africa from previous literature (Rochat et al., 2013), the prevalence rates of both ante- and postnatal depression were high, ranging from 33% (postnatal English version) to 47% (antenatal Zulu version). Prevalence estimates for antenatal depression range from 7 to 15% in high-income countries and from 19 to 25% in low- and middle-income countries (Gelaye et al., 2016). Postnatally, these rates are about 10% in high-income countries and about 20% in low- and middle-income countries (Gelaye et al., 2016). Particularly high prevalence rates of perinatal depression have been reported in South Africa, where rates of antenatal depression have ranged from 21 to 47% and postnatal depression rates from 17.1 to 50.3% (Pellowski et al., 2019). However, studies have used different cutoffs (Tsai et al., 2013), which complicates the comparison of these prevalence estimates to previous research. The prevalence rates reported in this study were within the range of previously reported estimates, which supports the validity of the scale in this context.

Limitations

This psychometric evaluation of the EPDS has limitations. First, structured clinical interviews, which are considered the gold standard in the process of validating self-report measures, were not conducted to validate the EPDS. Second, although a number of women completed the EPDS in Sotho ante- and postnatally, the factor structure of the Sotho version could not be analyzed due to a small sample size. Third, given that all women included in the study were women with HIV, future research should focus on assessing more diverse South African samples, as the psychometric performance of the EPDS may vary between women with and without HIV.

Conclusions

Brief screening measures such as the EPDS can help identify the burden of depression among perinatal WLHIV across a range of settings, but strong psychometric properties are needed to ensure that this scale does so accurately, especially when adapted and translated. This study therefore contributes to improved measurement of depressive symptoms among vulnerable women in a resource constrained setting. The early and accurate detection of depressive symptoms ante- and postnatally among perinatal WLHIV can facilitate increased treatment, which may in turn help prevent the negative maternal and neonatal outcomes associated with depression.