Introduction

The measurement of depression across cultures has been much discussed over the last 20 years [e.g. 1–6], and questions have been raised about the validity of simply translating depression tools developed in one culture for use in another. A critical question is: are standard depression assessment tools measuring the same thing in different cultures? The Edinburgh Postnatal Depression Scale (EPDS) [7] is a self-report measure widely used in research to assess probable maternal depression following childbirth. It has been used in a range of English speaking countries as well as in translation in non-English speaking ones. The EPDS has performed satisfactorily in most validation studies, in English and in translation: the scale has proved easy to administer and acceptable to women, and rates of depression in the range of 10–20% have been consistently found [8–16].

This paper seeks to extend what is known about the performance of the EPDS, by analysing data from two Australian studies with largely representative samples, comprising women from widely divergent language and cultural backgrounds who completed the EPDS in English or in translation.

Methods

The studies

Two studies provide the data for this paper. The Survey of Recent Mothers (SRM) was a population-based postal survey of all women who gave birth in a 2 week period in 1993 in the state of Victoria, Australia (n = 1366) [17, 18]. The Mothers in a New Country (MINC) study was a companion interview study of immigrant women from Vietnam, Turkey and the Philippines who gave birth in Melbourne, Victoria between 1994 and 1996 (n = 318), with interviews conducted by bicultural interviewers in the language of women’s choice [19, 20]. In both studies, women completed the EPDS – in English or in translation – providing a measure of probable clinical depression. On comparison with routinely collected data on all births in Victoria, both studies had largely representative obstetric samples in terms of parity, method of birth and infant birthweight: in SRM compared with all women giving birth, and in MINC compared with Vietnamese, Turkish and Filipino women giving birth, during the respective study periods [17, 19]. The MINC sample was also representative of all Vietnamese, Turkish and Filipino women giving birth for maternal age and marital status. As might be expected with a postal survey however, women under 25, single women and women born overseas of non-English speaking backgrounds were under-represented in SRM [17].

The samples

All participants with complete responses for the EPDS in both SRM (n = 1310) and MINC (n = 313) were included, forming five samples for comparison of the performance of the EPDS:

  (i) Women born in Australia or in another English speaking country who completed the EPDS in English as part of SRM: SRM–ESB, n = 1168;

  (ii) Women born in non-English speaking countries who also completed the EPDS in English in the same survey: SRM–NESB, n = 142.

Women participating in MINC who completed the EPDS mostly in translation:

  (iii) Women born in Vietnam: Vietnamese, n = 103;

  (iv) Women born in Turkey: Turkish, n = 104; and

  (v) Women born in the Philippines: Filipino, n = 106.

For each of the analyses, the MINC sample was also considered as a whole (n = 313) for comparison with the two SRM samples.

The scale

The EPDS is a ten-item self-report measure specifically designed for use in the postpartum period. The use of most standardised depression measures has proved problematic at this time because of the presence of a range of somatic items (e.g. changes in sleep patterns, appetite, fatigue/lethargy) which many mothers will experience after the birth of a baby and which cloud the identification of women who are depressed. A frequent outcome of the use of such measures is an inflated proportion of women assessed as depressed [21]. In developing the EPDS, Cox describes the deliberate exclusion or adaptation of items that might be inappropriately endorsed by women, and a focus instead on the feeling states associated with depression [22]. Thus the item which relates to sleep on the EPDS states specifically ‘I have felt so miserable that I have had trouble sleeping,’ thereby avoiding the difficulty that the usual trouble women have in sleeping at this time is due to the baby waking.

In the MINC study, the EPDS was translated into Vietnamese, Turkish and Tagalog using forward and back translations, a process of committee review, consultation with ethnic psychiatrists and piloting with monolingual and bilingual recent mothers in each community [23]. In response to each item on the EPDS, women score from 0 to 3, with total scores ranging from 0 to 30.
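As a minimal sketch of the scoring just described, the total can be obtained by summing ten item scores that have each already been coded from 0 to 3. The function name and the example responses are hypothetical and are not taken from either study; any item-level recoding required by the published scale is assumed to have been done beforehand.

```python
def epds_total(item_scores):
    """Sum ten EPDS item scores, each assumed to be already coded 0-3."""
    if len(item_scores) != 10:
        raise ValueError("the EPDS has ten items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored from 0 to 3")
    return sum(item_scores)


# Hypothetical responses for one woman (illustration only, not study data).
example_responses = [1, 0, 2, 1, 0, 1, 0, 0, 1, 0]
example_total = epds_total(example_responses)
```

The lowest and highest possible totals under this coding are 0 and 30, matching the range stated above.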

The research questions addressed

The performance of the EPDS in culturally diverse samples and in different languages (English, Vietnamese, Turkish and Tagalog) was examined to determine how similarly women from a range of backgrounds responded to and interpreted items on the scale. Six research questions were addressed in comparing the samples:

  1. How similar are the score distributions on the EPDS?

  2. Are there significant differences in the proportion of women scoring on each item?

  3. How similar are the items on which women score most frequently?

  4. How good is the internal reliability of the scale?

  5. Is the underlying structure of the EPDS (as revealed by factor analysis) comparable?

  6. Is the degree of similarity of factor patterns a function of the method used or of the data themselves?

Analysis

The performance of the EPDS across the samples was compared using a range of strategies. First the distribution of scores in the samples was examined (means, medians and standard deviations). The proportion of women reporting some level of symptomatology (defined as a score >0 on any of the ten items) was calculated and compared for all items of the EPDS in each sample using Chi-square as a test of differences in proportions between the samples.
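The comparison of proportions can be illustrated with a Pearson Chi-square test on a 2 × 2 table (scored >0 versus scored 0, in two samples). This is a sketch only: it assumes no continuity correction, which may differ from the variant applied by the software used in the study, and the counts in the example are invented for illustration. For one degree of freedom the P value equals erfc(√(χ²/2)), so the standard library is sufficient.

```python
import math


def chi_square_2x2(a_pos, a_n, b_pos, b_n):
    """Pearson chi-square (1 df, no continuity correction) for two proportions.

    a_pos / a_n: women scoring >0 on an item in sample A, and sample A size.
    b_pos / b_n: the same for sample B.
    Assumes every expected cell count is greater than zero.
    """
    observed = [[a_pos, a_n - a_pos], [b_pos, b_n - b_pos]]
    row_totals = [a_n, b_n]
    grand_total = a_n + b_n
    col_totals = [a_pos + b_pos, grand_total - a_pos - b_pos]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 40 of 100 women versus 60 of 100 women scoring >0.
chi2, p_value = chi_square_2x2(40, 100, 60, 100)
```

With these invented counts the difference would be judged significant at the P < 0.01 level used in the item analyses.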

Second, the internal consistency of scale performance in each sample was assessed, considering both item-total and inter-item Spearman’s correlations. Cronbach’s alpha co-efficients were calculated as summary measures of the internal reliability of the EPDS for all samples.
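Cronbach’s alpha for a k-item scale is k/(k − 1) × (1 − Σ item variances / variance of total scores). The sketch below applies that formula to a small invented data set with three items; the function name and the data are assumptions for illustration and do not reproduce the study calculations.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows is a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total score across respondents.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented scores for four respondents on a three-item scale.
toy_rows = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
]
toy_alpha = cronbach_alpha(toy_rows)
```

Alpha rises towards 1 as the items covary more strongly relative to their individual variances.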

Third, an unrestricted exploratory factor analysis was conducted, using principal components analysis with varimax rotation, to assess and compare the factor structure of the scale in each sample. Given the potential for correlations between factors (quite likely given the common correlation between depression and anxiety) an oblique rotation was also undertaken and the factor patterns compared. This approach also served to explore research question 6 above: if the factor solutions remain stable with different rotation methods, the factor patterns can more readily be assumed to represent patterns in the data themselves, rather than being artefacts of the particular method adopted [24]. Comparison of the underlying factor structure for each sample was undertaken using Scree plots to examine the variance contributions of the factors derived in each sample. This initial factor analysis was followed by a decision to extract one, two and three factors in turn for each sample, in order to compare the final factor loadings across the samples for each solution derived, and to compare the findings with those of previous studies.

All analyses were undertaken using SPSS 9.0 [25] and Epi Info [26].

Results

Distribution of EPDS scores

The means and standard deviations, medians and the range of scores obtained on the EPDS in all samples are shown in Table 1.

Table 1 EPDS score distribution statistics in all samples

Item analyses

In Table 2 the proportion of women in each sample scoring >0 on each item of the EPDS is shown. Of a total of 60 comparisons made within the MINC sample and between the MINC sample combined and the two SRM samples (Vietnamese:Turkish; Vietnamese:Filipino; Turkish:Filipino; MINC:SRM–NESB; MINC:SRM–ESB; SRM–NESB:SRM–ESB for each item), there were statistically significant differences in proportions on 15 (P < 0.01).

Table 2 Percentage of women reporting some level of symptomatology on each item of the EPDS (i.e. scoring >0) across all samples

Table 3 places the items for each sample in descending order of frequency by which women scored >0 and demonstrates considerable similarity between the samples.

Table 3 Items on the EPDS in descending order by the proportion of women scoring >0 on each item in each sample

All items in all samples were significantly correlated with total scores (Table 4), with one exception, Item 10 in the Vietnamese sample (where all but two women scored zero). Inter-item correlations for each sample were also calculated (data not shown): in the combined MINC and SRM samples, all items were significantly correlated with all other items on the scale, a sign of good construct validity across the samples.

Table 4 EPDS item/total score correlations (Spearman’s rho) across the samples

Finally, the EPDS demonstrated very good internal reliability in all samples, with Cronbach’s alpha co-efficients ≥ 0.80 (range 0.80–0.87).

Exploratory factor analysis

Figure 1 shows the scree plots produced with principal components analysis for all six samples. The similarity in the plots is striking, with almost identical patterns of variance distribution (represented by the eigenvalues) among factors in all samples.

Fig. 1 Exploratory factor analysis – scree plots for all samples

The scree plots were also used to determine the number of factors to be extracted, using a combination of two approaches: the ‘eigenvalues > 1’ rule (usually attributed to Kaiser) and Cattell’s scree test, that is, examination of the scree plots to determine the point at which the slope approaches zero (the ‘elbow’ in the plot) [27]. In all of the samples, one factor clearly accounted for a large proportion of the variance, followed by eigenvalues for the second and third factors ‘hovering’ around one (with three just under one), and the elbows in each of the plots also occurred at about these points. Various possibilities could therefore be entertained: three-, two- or one-factor solutions.
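The ‘eigenvalues > 1’ rule and the share of variance attributable to the leading factors can be sketched as follows. In a principal components analysis of a correlation matrix the eigenvalues sum to the number of items, so for a ten-item scale each eigenvalue divided by ten is that component’s share of the total variance. The eigenvalues below are invented to resemble the pattern described (one large value, then values near one); they are not the study results, and the function names are hypothetical.

```python
def retained_by_eigenvalue_rule(eigenvalues, threshold=1.0):
    """Count components whose eigenvalue exceeds the threshold."""
    return sum(1 for value in eigenvalues if value > threshold)


def variance_share(eigenvalues, n_factors):
    """Proportion of total variance explained by the n largest eigenvalues."""
    largest = sorted(eigenvalues, reverse=True)[:n_factors]
    return sum(largest) / sum(eigenvalues)


# Invented eigenvalues for a ten-item scale (they sum to 10).
toy_eigenvalues = [4.2, 1.1, 0.95, 0.8, 0.7, 0.6, 0.55, 0.45, 0.35, 0.3]

n_retained = retained_by_eigenvalue_rule(toy_eigenvalues)
share_first_three = variance_share(toy_eigenvalues, 3)
```

In this invented case the rule alone would retain two components, while a third sits just under one, which is the kind of borderline situation where the scree plot and a comparison of alternative solutions become useful.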

Table 5 displays, for each sample, the eigenvalues and the contribution made to the total variance by the first three factors identified. In each sample, these three factors explain >60% of the variance, exceeding the 50% recommended for a meaningful factor solution [28].

Table 5 Exploratory factor analysis: comparison of eigenvalues and amounts of variance explained in each sample for three factors extracted

A description of how EPDS items loaded on the three factors for each sample is given in Table 6. For clarity, only loadings >0.3 (those generally agreed to be considered meaningful [28]) are shown. The pattern of items loading on the three factors appears least consistent between the three groups in the MINC sample, although in all three samples items 1 and 2 load together, as do items 3, 4 and 5, and items 9 and 10 load together for the Turkish and Filipino samples, but not the Vietnamese. The pattern of loadings for the Turkish and Filipino samples is actually very similar, with only items 6 and 7 ‘switching’ between factor 1 and 2. The Vietnamese sample on the other hand, has a rather different range of items loading on each factor compared with the other two samples. It is possible however, that these differences are simply due to random variation given the smallish sample sizes for each country of birth group.

Table 6 Item loadings on factors 1–3 in each sample

Comparing the total MINC sample with the two SRM samples, there is much greater consistency in the loading patterns for each factor across the samples. All three have items 3, 4 and 5 loading on factor 1, items 1 and 2 load together on either factor 2 or 3, and items 9 and 10 load together, also on factor 2 or 3 in each sample. It needs to be remembered that the order of the factors depends on their eigenvalues and that for factors 2 and 3 these are mostly quite similar, so that ‘switches’ in item loadings are less significant between these two factors in the different samples.

Undertaking an oblique rotation of the factors to allow for potential correlation between factors [29] made only small differences to the pattern of factor loadings in any of the samples, with similar final solutions obtained to those obtained by orthogonal rotation. These data are therefore not presented, but they indicate that the underlying factor patterns are quite stable and likely to be a product of actual patterns in the data, rather than of the method of rotation.

One and two-factor solutions

When the initial factor analysis of the 13-item version of the EPDS was done by Cox, three items were removed to provide what was assumed to be a 10-item uni-dimensional measure of depression [22]. Subsequent factor analytic studies of the EPDS have resulted in a range of one, two and three-factor solutions being extracted (see Table 7).

Table 7 Published factor analyses of the EPDS

To compare the findings in our samples with this previous work, and because of the somewhat subjective nature of decisions about how many factors to extract, the effect of extracting one and then two factors for each sample was also examined (data not shown).

In our samples, the one-factor solution – representing a uni-dimensional model of depression – showed remarkable consistency in terms of the strength of item loadings (all >0.4) in all the samples, with only two exceptions: item 10 in the Vietnamese sample and item 2 in the Filipino sample, which both loaded lower than 0.3. Given that only two Vietnamese women scored on item 10 and that item 2 had proved problematic in the Filipino translation [23], these discrepancies are perhaps not surprising.

Similarly, a two-factor solution also demonstrated remarkable consistency across the samples, with items 3–9 mostly loading on factor 1 ‘depression’; and items 1, 2 and 10 ‘despair/self-harm’ mostly loading on factor 2.

Discussion

Neither the Mothers in a New Country study nor the Survey of Recent Mothers was designed to contribute to the evidence about the validity of the EPDS in comparison to psychiatric interview in the way of traditional validation studies, but the findings presented in this paper can make a contribution of another kind. The strengths of the two studies for this investigation are their largely representative population samples and the significant involvement of women from a variety of backgrounds. By examining how the EPDS performs in the different populations represented in the two studies, it has been possible to investigate differences and similarities in the responses to the scale for women from very different language and cultural backgrounds, including when the EPDS is completed in English and in translation. This is the first time such an analysis has been reported.

It needs to be said that the strength of the approach taken here is best seen when similarities are found, despite the differences in women’s backgrounds. Given a great deal of consistency, it is reasonable to assume that the EPDS is completed and understood in broadly similar ways by women regardless of their backgrounds. We cannot necessarily assume, however, that what the scale measures completely captures the construct of depression for each culture. There may be domains of symptom expression missed, even if the ones covered by the scale appear cross-culturally stable and relevant.

Where differences are uncovered between samples, these are less readily explained. Any number of competing possibilities exist, including: a less than adequate scale translation (despite the care taken); cultural differences in symptom expression or preparedness to disclose feelings in a research interview or postal questionnaire; other differences in the nature of participating samples; differences in scale administration (postal in SRM, completed scale collected by the interviewer in MINC); and so on. It seems important to emphasise this point because of the temptation to interpret differences in findings in purely cultural terms, when other explanations might be equally plausible – a criticism that can reasonably be applied to much research into cultural differences [30–32].

What the analyses of the EPDS data presented in this paper reveal is a remarkable degree of similarity and consistency in women’s responses to the scale across very different samples. Similar items were most commonly and least commonly ‘endorsed’ by women, patterns of inter-item and item-total correlations were broadly similar and the scale’s internal reliability was more than adequate in all of the samples. The factor analyses from MINC and SRM reported here identified broadly similar underlying components to the scale across these culturally and linguistically divergent samples, and these patterns remained stable when one, two or three factors were extracted, and when a different rotation method allowing for factors to be correlated was applied.

There are nine previously published reports of factor analyses undertaken with the EPDS (Table 7). Cox et al. reported a factor analysis of the 13-item predecessor to the EPDS, [22] and the first factor analysis of the 10-item EPDS was reported by Pop et al. in 1992 [34] using a Dutch version of the scale in a large follow up study on the prevalence of depression and thyroid dysfunction after childbirth. Astbury et al. [35] reported the first factor analysis of the English 10-item EPDS in a large Australian population sample (n = 771). A factor analysis was undertaken with a Norwegian version of the EPDS in a study of women attending postnatal visits (n = 411, Berle et al. [36]). Five factor analytic studies of French versions of the EPDS have now also been published. Guedeney and Fermanian conducted a study of 87 women in Paris (half were judged probably depressed by maternal and child health nurses and half were randomly selected) [37]. More recently, factor analyses of the EPDS have been reported from Quebec (Des Rivieres-Pigeon et al.) [38] involving a selected sample of 224 women recruited from a perinatal health promotion program for women of low socio-economic status; there are two studies by Teissedre and Chabrol (n = 772 [39] and n = 299 [40]); and Adouard et al. [41] report a factor analysis from a small study (n = 60) using the EPDS in pregnancy.

Some methodological issues regarding factor analysis require comment. As Brislin points out, the first requirement for any correlational study is an adequate and representative sample in order to avoid problems of measurement error [29]. The Astbury et al. Australian study is the only previous study reporting a factor analysis of the EPDS that has utilised an unselected and largely representative population sample [35]. Second, there is little agreement about how to calculate appropriate sample sizes for factor analytic studies. Brislin’s ‘rough guide’ to sample size is to square the number of variables and add 50, or as a bare minimum to have a sample size of at least 10 times the number of variables to be analysed [29]. For the 10-item EPDS this would mean a sample size of at least 100–150. Small samples may result in incorrect estimation of both the number of factors and the structure of the factors and the lack of theoretical basis for any of the rules of thumb about sample size has been noted [24]. Whatever guide is followed, at least three of the published studies shown in Table 7 [22, 37, 41] are likely to have been too small for reliable assessment of the factor structure of the EPDS.
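The two rules of thumb quoted above reduce to simple arithmetic, sketched here for the ten-item EPDS. The function name is hypothetical; the sample sizes are those given in the Methods section, and the comparison is illustrative only, since the rules themselves lack a theoretical basis as noted above.

```python
def rough_sample_sizes(n_variables):
    """Return (bare minimum, rough guide) sample sizes for a factor analysis.

    bare minimum: 10 times the number of variables.
    rough guide:  the number of variables squared, plus 50.
    """
    bare_minimum = 10 * n_variables
    rough_guide = n_variables ** 2 + 50
    return bare_minimum, rough_guide


bare_minimum, rough_guide = rough_sample_sizes(10)

# Sample sizes reported in the Methods section of this paper.
sample_sizes = {
    "SRM-ESB": 1168,
    "SRM-NESB": 142,
    "Vietnamese": 103,
    "Turkish": 104,
    "Filipino": 106,
    "MINC (combined)": 313,
}

meets_bare_minimum = [name for name, n in sample_sizes.items() if n >= bare_minimum]
meets_rough_guide = [name for name, n in sample_sizes.items() if n >= rough_guide]
```

For ten items the two rules give 100 and 150 respectively, which is the range quoted in the text.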

Despite these methodological issues, and the fact that English and translated versions of the EPDS have been used, that time-points at which the EPDS was administered have been widely divergent and that two studies utilised a different method of factor rotation, the factor patterns found in these previous studies of the 10-item EPDS can be seen as quite similar. In most of the studies the exploratory factor analyses produced one principal component or factor that accounted for around 40% of the variance. After rotation, the items commonly loading highly on this factor were items 7, 8, 9 and 10, labelled ‘depression’ or ‘depressive symptoms/feelings’, and those loading together on either the second or third factors, accounting for less than 5–18% of the variance each, were items 3, 4 and 5, often called ‘anxiety’. Items 1 and 2 (sometimes labelled ‘anhedonia’) and item 6 loaded somewhat less consistently in the different studies. The differences seen in the number of factors extracted, the exact factor structures and the subsequent naming of factors are just as likely to result from some of the methodological issues described above, and possibly from the differing (and somewhat subjective) decisions by study investigators about the number of factors to extract, as they are to result from real differences in scale performance in the different studies. Our own findings indicated that one, two or three factor solutions all showed considerable consistency across the studied samples.

Conclusion

This analysis of the performance of the EPDS in five samples of women from culturally and linguistically diverse backgrounds in Australia provides little evidence to suggest the existence of major differences in the way Australian-born and immigrant women have responded to the scale. These findings lend further support to the use of appropriately translated and carefully piloted versions of the EPDS in cross-cultural research on depression following childbirth.