Introduction

The Edinburgh Postnatal Depression Scale (EPDS; or EDS outside of the postnatal period: Cox et al. 1987; Murray and Cox 1990; Cox et al. 1996) is now probably the most widely used self-report measure to screen or assess for possible depression or depressive symptoms in perinatal populations. The measure was developed in part from the Hospital Anxiety and Depression Scale (Zigmond and Snaith 1983), in response to problems noted by the authors with other depression self-report scales in use at the time (e.g. the State of Anxiety and Depression—Bedford and Foulds 1978; the Beck Depression Inventory—Beck et al. 1961; the General Health Questionnaire—Goldberg 1972). These scales had not been developed specifically for the perinatal period, and included somatic items which could be misleading as indicators of postnatal depression (e.g. sleep difficulties; increased fatigue), and had other items that were probably inappropriate for women during the postnatal period (e.g. enjoyment of books, radio or television). The rate of false positives obtained on these other questionnaires was thus a clear obstacle for the detection of women with postnatal depressive symptoms, and the EPDS was therefore developed and shown to have good psychometric properties (Cox et al. 1987; Cox and Holden 1994, 2003).

Since its publication in 1987, it has gained increasing usage in both English-speaking countries (e.g. UK, New Zealand, Australia, North America) and non-English speaking countries (e.g. France: Guedeney and Fermanian 1998; Italy: Benvenuti et al. 1999; Taiwan: Huang and Mathers 2008). It is used within routine clinical practice in many public health services in Australia to screen for possible depression, both antenatally and postnatally (e.g. Buist et al. 2008; Matthey et al. 2004), and is used to identify women who are likely to be having some form of emotional difficulty. Where a woman scores “high” on this measure, which is often determined by using the validated postnatal cut-off score for English-speaking women for major or minor depression (10 or more), she is then referred for a more in-depth assessment to ascertain the precise nature of her difficulties (beyondblue 2011).

In addition, the EPDS is used within research studies to report on rates of women with high scores, which is often equated to rates of women with probable depression (e.g. Bowen et al. 2012; Evans et al. 2001); to evaluate clinical services (e.g. Harvey and Pun 2007); and to evaluate the impact of prevention or treatment programs (e.g. Howell et al. 2012).

More recently, the scale has also started to be used to screen for possible depression, or anxiety or distress, in men. It has been validated for men in the postnatal period in five studies, in Europe, Australia and Southeast Asia (Edmondson et al. 2010; Lai et al. 2010; Massoudi et al. 2013; Matthey et al. 2001; Tran et al. 2012). In addition, a recent Italian study has provided further data for men, including those outside of the perinatal period (Loscalzo et al. 2015). Many studies are thus using the EPDS to report on emotional disorders antenatally or postnatally in men (e.g. Escriba-Aguir and Artazcoz 2011; Gawlik et al. 2014; Ramchandani et al. 2005).

In addition, the scale has been looked at for its ability to detect women (or men) with high levels of anxiety, and various authors have suggested cut-off scores on the three items (3, 4 and 5) which usually make up the anxiety subscale in English-speaking women (Matthey 2008; Matthey et al. 2013a; Phillips et al. 2009; Swalm et al. 2010).

In all of these uses, it is often stressed that the measure is not a diagnostic instrument, as per the authors’ original instructions, and that the presence or absence of a depressive disorder can only be determined by then administering a diagnostic interview. It has clearly been an extremely valuable tool for both researchers and clinicians, and has aided the cause of highlighting the need to consider women’s (and to a certain extent men’s) emotional health during the perinatal period.

The purpose of this paper is to critically consider some problems and limitations with the scale, some of which are unique to this measure. We feel that this is timely given the increasing emphasis of this scale in routine screening programmes and perinatal guidelines (e.g. beyondblue 2011; Committee on Obstetric Practice 2015; Venkatesh et al. 2016); the reliance on this measure to report on prevalence, risk factors and treatment outcomes; and the recent use of the scale as a “gold standard” against which to judge the utility of other measures (e.g. Darwin et al. 2016). We would like to emphasise, however, that we appreciate the benefits that this scale has given to the clinical, research and client communities over many years and merely wish to consider more carefully some of its possible limitations, some of which were originally acknowledged by the scale’s authors (Cox and Holden 2003). As Alderdice et al. (2013) recently stated, “(regarding measures used in perinatal mental health) it is important that (they) are open to on-going criticism and analysis….” (p. 435).

Possible limitations of the EPDS

Ambiguous questions

It is possibly a truism to say that almost any set of questions will be misinterpreted by some people, and as language changes over time, this becomes more likely. That the EPDS questions are sometimes misinterpreted by women would not therefore be unique to this instrument. Armstrong and Small (2010) reported that 10% of their sample (or possibly 15%, depending upon the interpretation of the figures in their paper) of 147 postpartum women said that they had difficulty in understanding the measure. This rate is likely to underestimate the real rate, given that a woman who misinterprets a question (e.g. item 10; see below) does not necessarily find it unclear or confusing: she may simply interpret it differently from the clinician who administers the scale.

In English-speaking countries, some investigators have reported confusion with the following questions:

  a)

    Item 6: “Things have been getting on top of me”.

Godderis et al. (2009; Canada) reported on a small qualitative study, in which this question was found to be confusing and could even be seen as sexual. They suggested that the word “overwhelmed” might be a less confusing term. Allison et al. (2011) and Mayberry et al. (2007), both from the USA, also reported difficulties with this question. The latter authors changed the wording to “Things have been getting too much for me” in an attempt to reduce the ambiguity of the wording.

  b)

    Item 7: “I have been so unhappy that I have had difficulty sleeping”.

This wording was considered too restrictive by Godderis et al. (2009), as the difficulty with sleeping may not be due to unhappiness, but due to worry or anxiety. Thus, for example, worrying excessively about harm befalling her infant—a common thought in the early postnatal period (e.g. Abramowitz et al. 2010)—may not result in a woman endorsing this question unless such a worry has finally affected her level of (un)happiness. This question can also be misinterpreted in a different way. Midwives screening women for psychosocial health during the antenatal period in one of our health services say that often, on exploring a positive response to this question, women say that their difficulty sleeping is not due to their mood (“unhappiness”), but due to physical reasons (i.e. being pregnant, needing to go to the toilet, etc). In addition, midwives have also reported that on occasions women interpret the wording as “Yes, I have been so unhappy because I have been having trouble sleeping” (and this trouble is again due to the normal difficulties associated with being pregnant). Of note is that a similar type of confusion appears to have been experienced in a postpartum study by Lawrie et al. (1998) with English-speaking South Africans, as they added the phrase “not due to the baby” to clarify that the difficulty in sleeping is not due to the baby waking up or being unsettled.

  c)

    Item 10: “The thought of harming myself has occurred to me”.

A recent study by Kim et al. (2015) showed that 13% of their sample of 574 women who had endorsed this item either misunderstood it or misinterpreted it. This supports the anecdotal evidence from discussions with midwifery staff in one of our health services, as well as our experience when talking with women in our studies, that this question can be misinterpreted by women. When questioned, pregnant or postpartum women who endorse this question often say that they have had thoughts that they may accidentally harm themselves: for example, thoughts of falling down stairs, of bumping into things or of having an accident when they are with their baby. It is interesting to note that some investigators using the EPDS in other languages have clarified this “self-harm” question to guard against such possible misinterpretations (e.g. Fisher et al. 2007, with Vietnamese women, use “Have you had thoughts that you do not want to live anymore?”; Montazeri et al. 2007, with Iranian women, use the word “suicide”).

  d)

    Lawrie et al. (1998) also reported making slight changes to the wording of some of the questions and their response options when the scale was used with English-speaking South Africans to increase their comprehensibility. Thus, the response option in item 2 of “rather less than” was changed to “a little less”; in item 6, “cope” was changed to “manage”; and the wording of item 4 was changed from “anxious” to “worried”.

While within a clinical setting misinterpretations regarding questions may be picked up on further enquiry, this may not always be the case depending upon how the EPDS is used. And within many research studies, and all online scales, such misinterpretations will almost never be detected, given that these often do not include a discussion with the participant as to their individual responses on this scale.

Exclusion of distress due to question qualifiers

The EPDS specifically excludes the detection of some women—or men—who will have high levels of anxiety or distress. Examining the three items that usually load on the anxiety factor (items 3, 4 and 5) shows:

  a)

    It excludes worries that to the participant are reasonable, yet distressing—see items 4 and 5 (“I have been anxious or worried for no good reason”; “I have felt scared or panicky for no very good reason”). Thus, women who have high levels of worries about such things as sudden infant death syndrome, or the possibility of a hereditary family illness, may not endorse these items. Similarly, a man who feels very stressed due to juggling both work and home commitments, and who sees this stress or worry as being reasonable, may not give a positive response to these items. Godderis et al. (2009) reported that some women in their study felt that they always had a good reason for their worries.

  b)

    It excludes self-blame that the respondent considers justifiable in item 3 (“I have blamed myself unnecessarily when things went wrong”). This issue was indeed found to be problematic in the study by Godderis et al. (2009), who found that “a number of women (in their study) felt unsure how to judge whether they blamed themselves unnecessarily” (p. 20, original italics), especially if they were first-time mothers and feeling uncertain what they should expect of themselves. Some women also stated that feeling this (self-blame) was not in fact symptomatic of low mood but was just reflective of their personality.

  c)

    In addition, it excludes significant levels of worrying that prevents respondents from sleeping. Item 7 only enquires about difficulty sleeping due to unhappiness, not due to worry (“I have been so unhappy that I have had difficulty sleeping”). So a woman—or man—who is worried, for example, about sudden infant death syndrome, and thus cannot sleep, may not endorse this item.

It is difficult to imagine that clinical services—or researchers—would want to exclude detecting high levels of distress simply because the respondent considers the reasons for his/her feelings are “reasonable”, “warranted” or not due to feeling very unhappy. This, however, is what could happen with some clients or study participants completing the EPDS, and unfortunately, unless considerable time is spent probing “negative” responses, these false negatives will always go undetected.

Scoring problems

While the EPDS is claimed to be easy to administer, which presumably includes being easy to score (e.g. Brealey et al. 2010; Cox et al. 1987; Glavin et al. 2010; Leigh and Milgrom 2007), a recent study clearly shows this is not the case, at least for clinical teams (Matthey et al. 2013b). In this study, almost 500 completed EPDS forms across four perinatal clinical services were audited for scoring accuracy. Between 13.4 and 28.9% of these forms were scored incorrectly by the clinicians. Either the total score was added up incorrectly (3.3–9.6%) or at least one question on the EPDS was incorrectly scored (11.3–19.3%). It appears that the somewhat complicated scoring system of the EPDS, whereby three questions are scored in one direction and seven in the other (and these questions are interspersed with each other), may be a contributing factor to some of these scoring errors. Of concern is that one of the clinical services in this study was responsible for training other services within the health network on the administration, scoring and interpretation of the EPDS. The fact that the clinicians in this service also made frequent scoring errors suggests that no amount of training would improve this scoring problem.
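The mixed-direction scoring rule described above can be sketched in code. This is a minimal illustration only and is not taken from the audited study: it assumes, following the published scoring instructions, that items 1, 2 and 4 are the three items scored 0 to 3 in the order the response options are printed, and that the remaining seven items are scored 3 to 0. The function name and the input format (the position of the ticked option for each item) are hypothetical.

```python
# Assumption: items 1, 2 and 4 are scored 0-3 in printed order;
# the other seven items are scored in the opposite direction (3-0).
FORWARD_ITEMS = {1, 2, 4}


def score_epds(option_indices):
    """Return (item_scores, total) for one completed form.

    option_indices[i] is the position of the ticked response option for
    item i + 1, where 0 is the first printed option and 3 is the last.
    """
    if len(option_indices) != 10:
        raise ValueError("the EPDS has exactly 10 items")
    item_scores = []
    for item, idx in enumerate(option_indices, start=1):
        if idx not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: option index must be 0, 1, 2 or 3")
        # Forward items keep the index; all other items are reversed.
        item_scores.append(idx if item in FORWARD_ITEMS else 3 - idx)
    return item_scores, sum(item_scores)
```

A respondent who ticks the first printed option for every item therefore does not score 0: under the stated assumption she scores 0 on three items and 3 on the other seven, which is the kind of detail that is easy to get wrong when adding up by hand.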

A high rate of false positives

A screening test is optimal if it correctly identifies people with a condition (sensitivity) and those without the condition (specificity). In addition, it needs to be reasonably accurate when classifying people as being likely to have the condition (positive predictive value (ppv)) and as not having the condition (negative predictive value (npv)), to ensure that clinical services are not overburdened with unnecessary referrals, which would have a detrimental impact on providing services to those in need.

Unfortunately, the EPDS is consistently found to have a rather low ppv: postnatally for women, it is only around 50–60% (Matthey et al. 2006; Milgrom et al. 2011), and antenatally, it is often of the same, or lower, magnitude (Kozinszky and Dudas 2015). For men postnatally it is only around 20–30% (Edmondson et al. 2010; Massoudi et al. 2013; Matthey et al. 2001). These low ppv values mean that services need to be aware that for women, around half (postnatally) or two thirds (antenatally) of those scoring high do not, in fact, have the diagnosed disorder the cut-off score was validated against, and for men this proportion is even higher, at around 70–80%. As Kozinszky and Dudas (2015) conclude, “the EPDS will yield a substantial proportion of false positives, which is costly to service providers… also (it) will miss a considerable number of cases (similar to the majority of other screening tools)” (p. 102).
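The relationship between these four indices can be sketched as follows. This is a generic illustration of the standard definitions, not a re-analysis of any cited study: the sensitivity, specificity and prevalence values in the usage loop are hypothetical and were chosen only to show that, for a fixed sensitivity and specificity, the ppv falls as the condition becomes less common in the screened group.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (ppv, npv) from sensitivity, specificity and prevalence.

    All three inputs are proportions between 0 and 1. Extreme inputs that
    produce no screen-positives or no screen-negatives are not handled.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical values for illustration only (not EPDS estimates).
for prevalence in (0.05, 0.10, 0.15):
    ppv, npv = predictive_values(0.80, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  ppv={ppv:.2f}  npv={npv:.2f}")
```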

Incorrect cut-off scores

Many research or clinical studies have been found to use incorrect EPDS cut-off scores (Matthey et al. 2006). This error unfortunately continues (e.g. Banker and LaCoursiere 2014; Meltzer-Brody et al. 2013; Paul et al. 2013; Perinatology.com 2016). If researchers misread the original validation study by Cox et al. (1987) and state that they are using a cut-off score of 12 or more—rather than the correct one of 13 or more—this small difference can have major ramifications when reporting rates of “possible depression”, and further analyses such as those exploring risk factors. Matthey et al. (2006) showed that such a small difference would have resulted in an increase in the prevalence of high scorers in their studies by about 33%.
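The sensitivity of reported rates to a one-point change in cut-off can be illustrated with a short sketch. The scores below are invented purely for illustration and are not data from any study; they were chosen so that the effect is easy to see. The function names are hypothetical.

```python
def count_at_or_above(scores, cutoff):
    """Number of respondents scoring 'cutoff or more'."""
    return sum(1 for s in scores if s >= cutoff)


def relative_increase(scores, correct_cutoff=13, mistaken_cutoff=12):
    """Relative rise in the number of high scorers when the mistaken
    (lower) cut-off is applied instead of the correct one."""
    n_correct = count_at_or_above(scores, correct_cutoff)
    n_mistaken = count_at_or_above(scores, mistaken_cutoff)
    if n_correct == 0:
        raise ValueError("no respondents score at or above the correct cut-off")
    return (n_mistaken - n_correct) / n_correct


# Invented total scores for 17 hypothetical respondents.
example_scores = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16, 18, 20]
print(f"relative increase: {relative_increase(example_scores):.0%}")
```

In this invented sample, only two respondents score exactly 12, yet moving them across the threshold raises the count of high scorers from six to eight. How large the effect is in practice depends entirely on how many respondents sit at the boundary score.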

Multiple cut-off scores: gender, culture, timing and diagnosis

A recent paper by Kozinszky and Dudas (2015) highlights that there are at least eight different cut-off scores that have been validated for screening for major depression in women across different cultures and at different trimesters in pregnancy, and this number would likely increase once more validation studies, which they recommend, are done in more cultures. For example, validated antenatal cut-off scores for major depression include the following: 6 or more (Malawi: Stewart et al. 2013), 9 or more (Hungarian: Toreki et al. 2013), 10 or more (Chinese: Wang et al. 2009), 11 or more (Dutch: Bergink et al. 2011), 12 or more (Nigerian: Adewuya et al. 2006), 13 or more (Taiwanese: Su et al. 2007) and 15 or more (English: Murray and Cox 1990). In addition, three studies have demonstrated that different trimesters have different validated cut-off scores (Bergink et al. 2011; Bunevicius et al. 2009; Su et al. 2007). Kozinszky and Dudas thus concluded that “it is not advisable to use universal cut-off scores (on the EPDS), as there can be cultural differences...” (p. 101).

For every woman screened in pregnancy, therefore, consideration must be given to her culture, and which trimester she is in, when deciding if she has scored “high” on the scale or not (and thus whether or not to refer for further assessment, or whether to include her in the rate of those possibly “depressed”). Postnatally, the issue is the same, with Kozinszky and Dudas (2015) showing how studies conducted in nine countries have found the optimum postnatal cut-off score to be different from the antenatal one. The importance of using the correct cut-off score for women depending upon their culture has also recently been stressed by Norhayati et al. (2015).

While comparatively little work has been done with men, here too, there is great variation in the reported optimal cut-off scores. For depression (major or minor), they include scores of 5 or more (Tran et al. 2012: Vietnam), 9 or more (Massoudi et al. 2013: Sweden), 10 or more (Matthey et al. 2001: Australia), 11 or more (Edmondson et al. 2010: England; Lai et al. 2010: Hong Kong), 12 or more (Massoudi et al. 2013: Sweden) and 13 or more (Loscalzo et al. 2015: Italy). For depression or anxiety, they include 5 or more (Matthey et al. 2001: Australia) and 9 or more (Edmondson et al. 2010: England). Occasionally, researchers studying perinatal depression in men report that they have simply used the same validated cut-off score that applies to women (even though this is not the validated score for men) in order to allow for a comparison of rates between the two genders (e.g. Ramchandani et al. 2005). This confuses the field even more and seems to negate the empirical evidence that different genders have different validated cut-off scores (Matthey et al. 2001), possibly because they express their emotions differently or demonstrate the same emotions through different symptoms (Brownhill et al. 2005; Melrose 2010).

Within a screening context, therefore, it is extremely unlikely that the appropriate cut-off score for a client will be applied—the permutations, based upon gender, timing (antenatal: three trimesters; postnatal), culture (multiple) and diagnosis (major or minor depression, anxiety, or a combination of these) are just too many to make it practical. Antenatally, there could be up to 20 or 30 different “validated” cut-off scores that a service would need to apply to women, and a similar situation would apply postnatally. Certainly within Australia routine psychosocial screening of women does not take into account these permutations (e.g. beyondblue 2011). Instead, the pragmatic approach is often taken to use one cut-off score for all women, regardless of their cultural background or perinatal stage. Clearly such an approach is contrary to the notion that optimal cut-off scores need to be determined to ensure the scale is used appropriately for screening purposes. We would suggest that this pragmatic approach is contrary to the practice of “evidence-based (medicine)” and would better fit the term “convenience-based healthcare”, which thus knowingly results in the misclassification of women with and without significant levels of distress.
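To make the scale of this problem concrete, the sketch below shows what a context-specific cut-off lookup might look like. It is a hypothetical illustration rather than a recommended implementation: the table contains only a handful of the cut-off scores quoted in this paper, it ignores the trimester differences noted earlier, and the names `Context`, `CUTOFFS` and `is_high_score` are invented for this example.

```python
from typing import NamedTuple, Optional


class Context(NamedTuple):
    gender: str
    period: str       # "antenatal" or "postnatal"
    population: str   # culture or country label as used in the text
    diagnosis: str


# Partial table built only from cut-offs quoted in this paper
# ("N or more"). A complete table would need many more entries.
CUTOFFS = {
    Context("female", "antenatal", "Malawi", "major depression"): 6,
    Context("female", "antenatal", "Hungarian", "major depression"): 9,
    Context("female", "antenatal", "Chinese", "major depression"): 10,
    Context("female", "antenatal", "Dutch", "major depression"): 11,
    Context("female", "antenatal", "Nigerian", "major depression"): 12,
    Context("female", "antenatal", "Taiwanese", "major depression"): 13,
    Context("female", "antenatal", "English", "major depression"): 15,
    Context("male", "postnatal", "Vietnam", "major or minor depression"): 5,
    Context("male", "postnatal", "Australia", "major or minor depression"): 10,
    Context("male", "postnatal", "England", "major or minor depression"): 11,
    Context("male", "postnatal", "Hong Kong", "major or minor depression"): 11,
}


def is_high_score(total: int, context: Context) -> Optional[bool]:
    """Return True/False if a cut-off is listed for this context,
    or None if no cut-off is available in the table."""
    cutoff = CUTOFFS.get(context)
    if cutoff is None:
        return None  # deliberately no fallback to a universal cut-off
    return total >= cutoff
```

With this table, a total score of 11 would be classed as "high" for a woman in the Dutch antenatal context but not in the English one, and any combination missing from the table returns `None`, which is the situation a service faces for most of the possible permutations.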

Validated against a questionable gold standard

The EPDS has been validated against the “gold standard” of Diagnostic and Statistical Manual of Mental Disorders (DSM; American Psychiatric Association 1995, 2013) diagnoses of major or minor depression by many investigators (e.g. Boyce et al. 1993; Cox et al. 1987; Matthey et al. 2001). But are DSM criteria valid for the EPDS, and also for the perinatal population, not just for women, but also for men, and across all cultures?

  a)

    As stated earlier, part of the strength of the EPDS is that it does not include physical symptoms that could easily be due to the postpartum (or pregnancy) period. The one question that does enquire about a physical symptom (sleep) is qualified to ensure that it is due to the woman’s mood (item 7: “I have been so unhappy that I have had difficulty sleeping”). This removal of “somatic” questions is often given as its strength over other self-report depression measures, such as the Beck Depression Inventory (Beck et al. 1961), which includes questions such as “I get tired more easily than I used to”, which is likely to apply to all pregnant women and new mothers.

But what about the DSM symptoms for depression? These include the physical symptoms of alteration in weight or appetite, difficulty with sleep, psychomotor agitation or retardation and fatigue or loss of energy, as well as the cognitive symptom of difficulty with concentration that could be due to depressed mood or lack of sleep.

Thus, we are saying that a strength of the EPDS is that it does not include physical symptoms that could be due to the usual concomitants of pregnancy or postpartum, yet we then validate it against a gold standard that does include such physical symptoms. This would seem to be a somewhat illogical and poorly thought-out methodology, one of which the first author has also been “guilty” in his previous validation research.

Indeed, the endorsement of these DSM physical symptoms has been found to be attributable in many pregnant women, not to their mood, but due to the normal concomitants of being pregnant (Matthey and Ross-Hamid 2011). This was found to reduce the rate of major depression in that study by 75% (from 6.8 to 1.7%) and would indicate that validating the EPDS for the optimal cut-off score in fact needs to be done only for those women who consider their symptoms are due to worry or concern, thereby matching the EPDS with mood-related symptoms of DSM.

  b)

    Of consideration also is that there has been an argument in the literature that men and women may exhibit depression via different symptoms (e.g. Wilhelm 2009). Thus, men are thought to be more susceptible to symptoms of risk taking behaviour, or moods of anger or irritation when depressed (Martin et al. 2013). Yet the DSM uses the same set of symptoms for women and men. And the EPDS, by its nature, is the same for these two genders and does not include those symptoms thought to occur in men, so may be less suited to detecting possible mood difficulties in them.

  c)

    We have a similar concern with the use of the same DSM diagnostic criteria and symptoms for all cultures, despite the known variation in symptom presentation for depressed mood across people from different ethnic backgrounds, as well as questions over the suitability of applying western diagnostic criteria (DSM) to people from non-western backgrounds (e.g. Halbreich 2007; Halbreich et al. 2007).

Screening: just for “possible depression”?

The EPDS in theory only screens for possible depression and not for other mood difficulties (Cox and Holden 2003). Yet within a screening context there are many researchers who are now arguing that we should also be screening for anxiety (e.g. Jomeen 2004; Miller et al. 2006). In a screening context, it would be expected that health professionals wish to detect women who are experiencing any negative emotion at a clinical level, not just those with possible depression; they would also want to detect anxiety, stress or difficulty coping.

While there is some evidence that the EPDS may detect some women with anxiety, given that it has three items that load on an anxiety subscale (EDS-3A: Matthey et al. 2013a; Phillips et al. 2009), it may not be the best measure to have this wider screening capability. As previously explained, the anxiety questions are actually designed to exclude many respondents with high anxiety, due to their qualifiers of only including anxiety if it is “for no (very) good reason”, or if the individual has only blamed herself “unnecessarily”. This in part may be a reason why the EDS-3A was found to miss between 50 and 74% of women who scored high on other pregnancy-specific anxiety measures (Matthey et al. 2013c).

If, however, the EDS-3A is used to screen for possible anxiety, as is recommended by some (e.g. Swalm et al. 2010; Phillips et al. 2009), then the same cut-off score issue discussed for the full EPDS exists for this subscale. That is, what are the valid optimal cut-off scores for the different cultures and genders, both antenatally and postnatally? The numbers of permutations that will then exist with the appropriate cut-off scores for the total score and the anxiety subscale (and possibly the depression subscale) become even greater. And the complexity increases even more if different cultures have different items loading on the “anxiety” or depression subscales (e.g. French women; Adouard et al. 2005).

Transient vs enduring distress

Work by several researchers has shown that services should be cautious not to overpathologise women who have an initial high score on the EPDS. Around half of women scoring high (antenatally or postnatally) have been found to have only transient distress when re-tested just a few weeks later (Wickberg and Hwang 1996; Ballestrem et al. 2005; Matthey and Ross-Hamid 2012; Matthey 2016). This finding should be factored into research studies reporting rates of high scorers as a quasi-index of the rate of possible depression, and it is encouraging to note that in some Australian services recommendations are now being made to re-test high-scoring women on the EPDS following an initial screening (NSW Department of Health 2009).

Online screening and assessment

There are many online sites that offer the EPDS scale as part of an assessment of a woman’s (or possibly a man’s) mood, either for clinicians or for members of the public (e.g. beyondblue 2016; Black Dog Institute 2016; Perinatology.com 2016). Unfortunately, all of the above weaknesses in the EPDS are thus also evident in these online versions, except for that of inaccurate scoring. Of note is that most sites that we have seen fail to incorporate, or even discuss, any of the evidence for different cut-off scores being required for women from different cultures or for different perinatal times. Also of note is that there is no opportunity to clarify with the respondent her/his answers, nor to ensure that the automated descriptions such calculators provide (e.g. “the likelihood of depression is high”) do not get misinterpreted.

Strategies to overcome identified limitations

Scoring problems

To try and improve scoring accuracy of paper-administered EPDS forms within a clinical setting, the first author constructed an acetate scoring template for the clinicians. This, however, was quickly discarded by them and thus was an ineffective strategy to overcome this problem.

The use of automated scoring procedures (e.g. through online administration of the scale, or the use of tablet-like devices in clinical settings) would naturally eliminate scoring errors, though it may not be acceptable to either clinicians or consumers with respect to rapport building. It is important that cut-off classifications used in such online reports are correct, though unfortunately, mistakes are being made in some EPDS online versions (e.g. Perinatology.com 2016) in the same way that they are made in reports using paper versions of the scale (see Matthey et al. 2006). In addition, the use of such equipment is not likely to happen for many clinical services, nor for many research studies, for a long time yet. Another strategy to improve the accuracy of clinicians’ scoring could be to print the score next to each response option for every question. However, we (SM and a colleague) have had the experience (unpublished) of surveying clients of a perinatal health service about their views on such a format, and some said that they would not respond honestly to the questions if they could see their score mounting up. This could be a problem with some online EPDS calculators, where it can be easy for a respondent to know their score as they endorse various response options (e.g. Perinatology.com 2016). We believe that research would thus need to be done on the impact of including the scores for each response option, for both print and online versions of the scale, before services adopt such a strategy.

Question ambiguity/distress qualifiers

Problems in the first area may be reduced if trained clinicians have the opportunity to skilfully discuss a woman’s responses to the EDS. Unfortunately, within busy clinical settings, time is usually limited, and thus this may not be possible, and clearly is not possible with online mood assessments. The only solution that we can think of to the distress qualifier issue would be to alter the item wordings, and doing so would then require complete re-validation of the new scale.

Transient vs enduring distress problem

Repeat testing of “screen positive” women would reduce the risk of an incorrect interpretation being made from a single high score. As noted in the relevant section, the NSW Department of Health is recommending this. In addition, Matthey (2016) has discussed how asking screen positive women how they think they may be feeling in a few weeks’ time may help in deciding if a woman is likely to have ongoing distress.

Multiple cut-off scores and screening just for depression

For these problem areas, we believe that a measure needs to include screening for a variety of negative emotions, without being a checklist of specific symptoms, and should not have a “continuous scoring format”. As discussed below, the first author has developed a measure taking into account all of these identified difficulties.

Alternative screening measures

Given the above concerns with the EPDS, a search of the literature was undertaken for alternative screening measures. In nearly all cases, however, such measures focus on just one mood difficulty (e.g. just depression or just anxiety: Austin et al. 2010; Brodey et al. 2016; Martini et al. 2010; Segre et al. 2006; Somerville et al. 2015; Spitzer et al. 1999, 2006; Whooley et al. 1997) and thus are not suitable for screening for a wide range of emotional difficulties, unless a second measure is also used. The use of more than one measure, however, may not be practical in many health settings due to time constraints and possible difficulties regarding different scoring procedures and threshold criteria for each instrument. The same issues exist for scales that do measure a variety of different emotions (e.g. Lovibond and Lovibond 1995). In addition, those using continuous scores (e.g. Brodey et al. 2016; Lovibond and Lovibond 1995; Somerville et al. 2015) may also suffer from needing different cut-off scores for women from different cultures and at different perinatal times.

Only one short measure was found that did encompass a variety of moods and that did not have a continuous scoring format, though its wording did not confine the mood state to how the woman was currently feeling (Goodman and Tyer-Viola 2010). Thus, the first author has developed a generic distress measure (Matthey Generic Mood Question (“MGMQ”): Matthey et al. 2013c), designed to overcome many of the difficulties with the EPDS and other measures. It consists of two core questions: the first asks whether the respondent has felt stressed, anxious or unhappy, or found it difficult to cope, over the past 2 weeks, and the second then asks how much these feelings have bothered him/her. Two additional questions ask about the reasons for such feelings and whether the respondent would like to talk to a health professional. Apart from the initial study (Matthey et al. 2013c), which showed that this measure performed meaningfully better than various general or pregnancy-specific anxiety measures, as well as DSM diagnoses, ongoing studies are indicating that it also performs significantly better than the established depression screening scales of the EPDS (Cox et al. 1987) and PHQ-2/Whooley (Spitzer et al. 1999; Whooley et al. 1997).

Conclusions

The EPDS has been used extensively within perinatal screening contexts for women and is increasingly being used for men. Users should thus be aware of the following possible limitations with this measure:

i) It has ambiguous questions.

ii) It excludes some respondents with high levels of distress due to question qualifiers.

iii) The instrument is difficult to score, with up to a third of EDS forms being found to have scoring errors.

iv) It has questionable clinical utility in that around half of “high-scoring” women on the EPDS do not have the condition it has been validated for.

v) There is frequent use of incorrect cut-off scores due to confusion by scale users.

vi) There is a plethora of validated cut-off scores across genders, timing, culture and diagnoses; it is unlikely that all of these can be successfully incorporated within a clinical screening context.

vii) The gold standard (DSM diagnoses) against which the scale has been validated uses somatic symptoms, yet the strength of the EDS is that it excludes such symptoms, which can be just a normal part of being pregnant or postpartum. In addition, there are questions over the suitability of DSM as a gold standard for validating the scale for men and also for women from non-western cultures.

viii) It does not appear to be particularly good at detecting anxiety in pregnancy (which it was not designed to do, but some investigators have shown it does do to a certain extent), nor does it detect the wider range of depression symptoms that may be exhibited by men.

ix) Around half of high scorers on the scale only have transient distress.

Limitations (i), (ii), (iii) and (viii) are inherent in the scale’s wording, or method of scoring, and cannot be rectified without altering the scale substantially.

Limitations (v), (vi) and (vii) are likely to apply to all mood self-report questionnaires that enquire about specific symptoms, and also that have a continuous scoring format rather than a categorical emotional difficulty format (e.g. “presence of a significant emotional difficulty: yes/no”). In addition, we believe that a more critical analysis of what constitutes an appropriate gold standard for screening instruments is required.

Limitation (iv) may in part be a result of the weakness of the gold standard used for perinatal validation studies, but may also be a specific limitation of the instrument itself.

Limitation (ix) is likely to apply to all mood measures, as there are frequent stressors in the perinatal period that by definition will for many women and men be transient (e.g. antenatally: concern over test results until they are known; morning sickness; adjusting mentally to being pregnant; postnatally: infant sleep or feeding difficulties; adjusting to becoming a parent).

While there may be some strategies to overcome some of these limitations when using the EDS, most are unlikely to be practical or particularly successful within screening settings. It is hoped that a new measure, the MGMQ (Matthey et al. 2013c), designed to overcome these weaknesses, will soon have sufficient empirical evidence to enable clinical services to have a viable alternative screening tool.

We wish to reiterate that the EDS has been an excellent tool that has served the field admirably and has helped in the cause of understanding perinatal mood difficulties in women over the past 30 years. The purpose of this paper, however, is simply to raise some possible limitations that we believe exist in the scale to enable researchers and clinicians to think critically about these issues. As Alderdice et al. (2013) concluded: “we… run the risk of using a measure because it has a high profile rather than necessarily being the best measure of psychological health” (p. 436).