Introduction

Depressed mood during pregnancy is highly prevalent, especially in the context of poverty, intimate partner violence, and other significant life stressors (Fisher et al. 2012; Hung et al. 2012; Tsai 2013). In South Africa, nearly one-half of women in township settings have been diagnosed with major depressive disorder during the antenatal and postnatal periods (Cooper et al. 1999; Dewing et al. 2013; Rochat et al. 2011; Tsai and Tomlinson 2012). The public health impacts of major depressive disorder are compounded by its intergenerational effects: newborn children of mothers with depression have poorer health and socioemotional development (Rahman et al. 2004; Straub et al. 2012; Weissman et al. 2006), and these in turn have adverse implications for their long-term psychosocial, cognitive, and economic outcomes (Heckman 2007; Pearson et al. 2013; Venkataramani 2012). Yet, the importance of major depressive disorder as a significant public health issue is not matched by appropriate resource allocation in many sub-Saharan African countries (Saxena et al. 2007; Tomlinson and Lund 2012).

Health care providers often underdiagnose depressive disorders (Evins et al. 2000; Fergerson et al. 2002; Lyell et al. 2012), and in South Africa, much of postnatal care focuses primarily on the health of the newborn, with little attention paid to maternal well-being after delivery (Beksinska et al. 2006). Because perinatal depression is treatable (Sockol et al. 2011), its underdiagnosis represents a major missed public health opportunity. Moreover, several high-quality randomized controlled trials conducted in resource-limited settings have shown that nonspecialist, lay health workers can effectively deliver psychological treatments for depressed mood among women during the perinatal period (Baker-Henningham et al. 2005; Clarke et al. 2013; Cooper et al. 2009; Rahman et al. 2008). Accurate screening for, and subsequent diagnosis of, perinatal depression will therefore be an essential component of any strategy to direct patients to appropriate treatment at scale. Short and ultrashort screening instruments for perinatal depression, defined as having 5–14 items and <5 items, respectively (Mitchell and Coyne 2007), have been studied and validated in several sub-Saharan African countries (Tsai et al. 2013). Given their brevity, these instruments could potentially be incorporated efficiently into the routine workflow of overburdened health care workers.

Mobile phone-based technologies, referred to collectively as “mHealth,” have been proposed as a potential means of leveraging limited human resources for health (Derenzi et al. 2011; van Heerden et al. 2012). With few exceptions, however, the evidence base supporting their efficacy and effectiveness has lagged behind the overall enthusiasm for their implementation and scale-up (Tomlinson et al. 2013c). To address these gaps in the literature, we analyzed data from two studies independently conducted in Khayelitsha, South Africa, to determine the reliability and criterion-related validity of four short and ultrashort versions of the 10-item Edinburgh Postnatal Depression Scale (EPDS-10) (Cox et al. 1987) in case finding for antenatal depression among high-risk women. In the first study, the EPDS was administered by trained research assistants. In the second, smaller study, community health workers with no training in human subjects research were trained to administer the EPDS, programmed into mobile phones, during the routine course of their community-based outreach. We then compared the findings from the two studies to determine whether the data collected by the community health workers were comparable to those collected by research assistants with formal human subjects training.

Materials and methods

Study population

The two studies were conducted in different areas of Khayelitsha, a socioeconomically deprived informal settlement on the outskirts of Cape Town, South Africa (Nleya and Thompson 2009). Khayelitsha is one of the largest such settlements in South Africa, with a population exceeding one million residents (Brunn and Wilson 2013) who are primarily Xhosa-speaking, black Africans living in informal housing (i.e., shacks) on unserviced land. Unemployment, food insecurity, and subsistence-level poverty rates are extremely high (BeLue et al. 2008; Cooper et al. 1991; Dewing et al. 2013; Muzigaba and Puoane 2011; Pick and Obermeyer 1996; Venkataramani et al. 2010). Khayelitsha leads all subdistricts of Cape Town in age-standardized mortality, with the leading causes being HIV/AIDS, homicide, and tuberculosis (Groenewald et al. 2010).

Study 1

Data were collected from May 13, 2009 to September 29, 2010 in 24 noncontiguous neighborhoods of Khayelitsha and represent the baseline sample of pregnant women older than 18 years of age who were participating in a cluster-randomized trial. The trial was designed to test the effectiveness of an enhanced version of a child health and nutrition program operated by the Philani Child Health and Nutrition Project (le Roux et al. 2013; Rotheram-Borus et al. 2011), a community-based nongovernmental organization that has been based in Khayelitsha for more than three decades (Austin and Mbewu 2009; le Roux and le Roux 1991). As part of Philani’s standard operating procedures, its community health workers conduct routine community-based outreach in 100 neighborhoods, identifying all pregnant women and inviting them to take part in the ongoing child health and nutrition program. For the cluster-randomized trial, investigators identified 40 neighborhoods in which Philani did not already have operations in place and then selected 12 matched pairs based on prespecified neighborhood characteristics, such as the presence of alcohol bars and the predominant type of housing. Local women recruited study participants from within these neighborhoods by going door-to-door to identify pregnant women. All participants provided written informed consent. Ethical approval for study procedures was granted by the Office of the Human Research Protection Program, University of California at Los Angeles; and the Health Research Ethics Committee, Faculty of Health Sciences, Stellenbosch University.

Study 2

From May 1, 2010 through February 18, 2011, we recruited consecutive pregnant women who were older than 18 years of age and able to provide informed consent from seven neighborhoods of Khayelitsha: Enkanini, Greenpoint, Kuyasa, Makhaza, Site B, Site C, and Town Two. These neighborhoods did not overlap geographically with the neighborhoods where study 1 was conducted: the closest two neighborhoods of studies 1 and 2 are located approximately 2 km apart, while the farthest two are approximately 8 km apart. To be eligible, women had to be initiating enrollment in the Philani program. At the time of their enrollment into the Philani program, we invited women to also participate in our perinatal depression screening study. All participants provided written informed consent. Ethical approval for study procedures was granted by the Health Research Ethics Committee, Faculty of Health Sciences, Stellenbosch University; the Committee on Human Research, University of California at San Francisco; and the Office of Human Research Administration, Harvard School of Public Health.

Measures

In both studies, the Xhosa version of the EPDS-10 was administered using survey software programmed into a mobile phone (Tomlinson et al. 2009b; Tomlinson et al. 2013b). Among Xhosa-speaking women, several studies have supported the construct validity, criterion-related validity, and factor structure of the EPDS-10 (De Bruin et al. 2004; Hartley et al. 2011; Lawrie et al. 1998), and it has also been shown to have high sensitivity for detecting postnatal depression in numerous other settings worldwide (Tsai et al. 2013). Similar to Kabir et al. (2008) and Rochat et al. (2013), we defined four short and ultrashort versions of the EPDS-10: an EPDS analogue of the 2-item Patient Health Questionnaire (EPDS-2) (Lowe et al. 2005; Monahan et al. 2009), the 3-item anxiety subscale (EPDS-3), an abbreviated 5-item version of the depressive symptoms subscale (EPDS-5) (Rochat et al. 2013), and the 7-item depressive symptoms subscale (EPDS-7) (Box 1). Following previous studies in this population, we used a cutoff score of EPDS-10 ≥ 13 to define the criterion standard of probable antenatal depression (Honikman et al. 2012; Rochat et al. 2006; Tomlinson et al. 2013a).
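To make the scoring concrete, here is a minimal Python sketch, assuming each item response has already been coded 0–3 (so the EPDS-10 total ranges from 0 to 30) and applying the criterion cutoff of EPDS-10 ≥ 13 described above. The subscale item numbers and all function names are hypothetical placeholders for illustration; the study's actual item definitions are given in Box 1, which is not reproduced here.

```python
from typing import Sequence

N_ITEMS = 10           # the EPDS-10 has ten items
MAX_ITEM_SCORE = 3     # assumption: items pre-coded 0-3, so totals range 0-30
CRITERION_CUTOFF = 13  # EPDS-10 >= 13 defines probable antenatal depression here

# Hypothetical item sets (1-based item numbers), for illustration only;
# check against Box 1 before using.
SUBSCALES = {
    "EPDS-3": (3, 4, 5),               # assumed anxiety items
    "EPDS-7": (1, 2, 6, 7, 8, 9, 10),  # assumed remaining (depressive) items
}


def validate(items: Sequence[int]) -> None:
    """Reject response vectors that are not ten item scores coded 0-3."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(items)}")
    if any(not 0 <= x <= MAX_ITEM_SCORE for x in items):
        raise ValueError("item scores must be coded 0-3")


def epds10_total(items: Sequence[int]) -> int:
    """Sum of all ten item scores."""
    validate(items)
    return sum(items)


def subscale_total(items: Sequence[int], item_numbers: Sequence[int]) -> int:
    """Sum of the selected items (1-based item numbers)."""
    validate(items)
    return sum(items[i - 1] for i in item_numbers)


def probable_depression(items: Sequence[int]) -> bool:
    """Criterion standard used in this study: EPDS-10 total >= 13."""
    return epds10_total(items) >= CRITERION_CUTOFF


# Toy response vector (not study data)
responses = [2, 1, 2, 3, 1, 2, 1, 0, 1, 0]
print(epds10_total(responses))                        # 13
print(probable_depression(responses))                 # True
print(subscale_total(responses, SUBSCALES["EPDS-3"]))  # 6
```

Keeping the item sets in a single mapping means a subscale definition can be corrected in one place without touching the scoring logic.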

In study 1, the EPDS was administered by trained research assistants who were hired independently of Philani. The training curriculum included ethics and human subjects training, mock structured interviews with real-time feedback, and certification based on observed interviews conducted away from the study site (Tomlinson et al. 2013a). In study 2, the Philani community health workers were trained to use the mobile phones for administering the EPDS during the routine course of their community-based outreach and wellness work. They underwent a detailed training curriculum on implementing the Philani intervention package (le Roux et al. 2010; le Roux et al. 2011) but received no other training in data collection or other aspects of human subjects research.

Statistical analysis

To estimate the internal consistency of the EPDS-10 and the short and ultrashort subscales, we calculated Cronbach’s α and its one-sided 95 % confidence interval (Feldt 1965; Kristof 1963). We used Pearson correlation coefficients to estimate the correlations between the short and ultrashort subscales and the full EPDS-10. Using the conventional screening cutoff score of ≥10 for the 3-, 5-, and 7-item subscales (Cox et al. 1987; Kabir et al. 2008) and a cutoff score of ≥3 for the 2-item subscale (Kabir et al. 2008), we compared their operating characteristics with the criterion standard of probable antenatal depression, calculating sensitivity, specificity, and likelihood ratios using standard formulas. We then generated receiver operating characteristic (ROC) curves (Hanley and McNeil 1982), calculated the area under each ROC curve (AUC) using the trapezoidal rule, and compared AUC values using the method proposed by DeLong et al. (1988). All analyses were conducted using Stata (version 12.1; StataCorp, College Station, Tex.).
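The reliability and accuracy calculations named above follow standard formulas. As a minimal sketch using only the Python standard library, the code below illustrates three of them: Cronbach's α, sensitivity and specificity with likelihood ratios, and the trapezoidal AUC of an empirical ROC curve. The function names and toy data are hypothetical; this is not the authors' Stata code, and it omits the Feldt confidence interval for α and the DeLong test for comparing AUCs.

```python
from statistics import variance
from typing import Dict, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
    using sample variances (n-1 denominator) throughout.
    """
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def operating_characteristics(screen_pos: Sequence[bool],
                              criterion_pos: Sequence[bool]) -> Dict[str, float]:
    """Sensitivity, specificity, and likelihood ratios from a 2x2 table."""
    pairs = list(zip(screen_pos, criterion_pos))
    tp = sum(1 for s, c in pairs if s and c)
    fn = sum(1 for s, c in pairs if not s and c)
    tn = sum(1 for s, c in pairs if not s and not c)
    fp = sum(1 for s, c in pairs if s and not c)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_positive": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_negative": (1 - sens) / spec if spec > 0 else float("inf"),
    }


def roc_auc_trapezoid(scores: Sequence[float],
                      criterion_pos: Sequence[bool]) -> float:
    """Empirical ROC curve (one point per distinct cutoff), AUC by trapezoids."""
    n_pos = sum(1 for c in criterion_pos if c)
    n_neg = len(criterion_pos) - n_pos
    # Points are (1 - specificity, sensitivity); start above the highest score.
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, c in zip(scores, criterion_pos) if s >= cutoff and c)
        fp = sum(1 for s, c in zip(scores, criterion_pos) if s >= cutoff and not c)
        points.append((fp / n_neg, tp / n_pos))
    # The lowest cutoff classifies everyone as positive, so the curve ends at (1, 1).
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (not study data): subscale scores and criterion status for 4 women
toy_scores = [1, 2, 2, 3]
toy_criterion = [False, True, False, True]
print(roc_auc_trapezoid(toy_scores, toy_criterion))  # 0.875
```

For an empirical ROC curve, the trapezoidal AUC equals the Mann–Whitney estimate of the probability that a randomly chosen woman meeting the criterion scores higher than one who does not, with ties counted as one-half; the toy example above (3.5 of 4 pairs) illustrates this.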

Results

Study 1 consisted of 1,144 women, while study 2 consisted of 361 women. Participants in both studies had a similar sociodemographic profile (Table 1). The mean age was 26 years, and about one-third of participants were married. About one-half of participants lived in informal housing and did not have access to a flush toilet. Few participants were employed. In study 1, the mean gestational age was 26 weeks (range, 3–40 weeks). Gestational ages were not obtained in study 2.

Table 1 Summary statistics

In the research data collected by trained research assistants (study 1), 475 (42 %) women were assessed as having probable antenatal depression. Pearson coefficients for the correlations between the short and ultrashort subscales and the full EPDS-10 ranged from 0.85 to 0.98. The EPDS-2 showed poor internal consistency, but Cronbach’s α values for the other subscales ranged from 0.72 to 0.83 (Table 2). All of the subscales showed excellent discrimination. The EPDS-7 had the highest AUC (0.99); AUC values for the other short and ultrashort subscales ranged from 0.92 to 0.97 (P values for tests of equality all <0.001) (Fig. 1). At the conventional screening cutoff score of ≥10, the EPDS-7 had a sensitivity of 0.99 and a specificity of 0.85 for detecting probable antenatal depression. The EPDS-7 at ≥10 had the highest sensitivity of the four shortened subscales, while the EPDS-3 at ≥10 had the lowest specificity.
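The likelihood ratios implied by the EPDS-7 figures quoted above can be worked out directly. This arithmetic sketch uses only the rounded sensitivity (0.99) and specificity (0.85) from the text, so the resulting ratios are approximate and are not values reported by the study; the variable names are illustrative.

```python
# Reported study 1 figures, taken from the results text
n_total, n_probable = 1144, 475
sensitivity, specificity = 0.99, 0.85  # EPDS-7 at a cutoff of >= 10 (rounded)

prevalence = n_probable / n_total                # share meeting EPDS-10 >= 13
lr_positive = sensitivity / (1 - specificity)    # LR+ = sens / (1 - spec)
lr_negative = (1 - sensitivity) / specificity    # LR- = (1 - sens) / spec

print(f"prevalence: {prevalence:.0%}")  # prevalence: 42%
print(f"LR+: {lr_positive:.1f}")        # LR+: 6.6
print(f"LR-: {lr_negative:.3f}")        # LR-: 0.012
```

Because sensitivity is rounded to two decimals, the small negative likelihood ratio is especially sensitive to rounding and should be read only as an order of magnitude.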

Table 2 Sensitivity and specificity of antenatal screening for detecting probable antenatal depression in study 1 (N = 1,144)
Fig. 1 Sensitivity and specificity of four short and ultrashort versions of the Edinburgh Postnatal Depression Scale for detecting probable antenatal depression (study 1, N = 1,144)

In the community health worker-administered survey (study 2), 165 (46 %) women were assessed as having probable antenatal depression. The estimated correlations (between the short and ultrashort subscales and the full EPDS-10), internal consistency values, and operating characteristics obtained in study 2 (Table 3) were qualitatively similar to those estimated in study 1. All subscales showed excellent discrimination, with the AUC value highest for the EPDS-7. As in study 1, the EPDS-7 at ≥10 had the highest sensitivity of the four shortened subscales, while the EPDS-3 at ≥10 had the lowest specificity.

Table 3 Sensitivity and specificity of antenatal screening for detecting probable antenatal depression in study 2 (N = 361)

Discussion and conclusion

In two independently conducted cross-sectional studies of 1,505 pregnant, Xhosa-speaking women living in a peri-urban area near Cape Town, South Africa, we found that the prevalence of probable antenatal depression was exceedingly high, matching the rates estimated in other studies conducted throughout South Africa (Tsai and Tomlinson 2012). We also found that four short and ultrashort versions of the EPDS had excellent discrimination for detecting probable antenatal depression. The sociodemographic profiles and the estimates of reliability and validity were qualitatively similar across the two studies, irrespective of whether the data were collected by research assistants or by community health workers. Our findings have important programmatic implications for mHealth interventions and for leveraging existing human resources to improve maternal mental health and child health in resource-limited settings.

Our primary finding, that all four short and ultrashort versions of the EPDS had excellent discrimination, is only partly consistent with previously published studies. In a study conducted among young women in the USA, the EPDS-3 had 0.95 sensitivity and 0.80 specificity for detecting probable postnatal depression, while the EPDS-7 had 0.59 sensitivity and 1.0 specificity (Kabir et al. 2008). Rochat et al. (2013) used a structured clinical interview to determine the reference criterion of a major depressive episode among 109 young Zulu-speaking South African women (Rochat et al. 2011). In their ROC analysis, the 3-, 5-, and 7-item versions of the EPDS performed similarly, with sensitivities ranging from 0.61 to 0.65 and specificities ranging from 0.86 to 0.90. In contrast, we found that all four short and ultrashort versions had excellent discrimination; however, as in Kabir et al. (2008), our reference criterion was not ascertained using a structured clinical interview.

Our study also extends prior work by demonstrating the feasibility of using community health workers, who had no previous research training, to conduct case finding for antenatal depression using short and ultrashort screening instruments programmed into mobile phones. A side-by-side comparison of data from our two independently conducted studies suggests no substantive differences in the estimated operating characteristics of the screening instruments. Whether data collected by lay workers are comparable to data collected by trained research staff is an important, previously unresolved question of public health significance, given the increasing emphasis placed on task shifting (to nonspecialist, lay health workers) in the global mental health agenda (Becker and Kleinman 2013; Kagee et al. 2013; Ngo et al. 2013; Rahman et al. 2013; Tomlinson et al. 2009a). Community health workers are often burdened by heavy workloads, which contribute to high turnover and low job satisfaction (Alamo et al. 2012; Jaskiewicz and Tulenko 2012). If they are to take on the additional tasks of screening and referral, the burden of these duties should be minimized.

Although our analysis suggests that incorporating community-based screening into the overall workflow of a stepped care program (e.g., the program described by Honikman et al. 2012) could be feasible, two substantive limitations suggest that broad application of our findings would be premature. First, as noted above, similar to the study by Kabir et al. (2008), our two studies employed a reference criterion of probable antenatal depression. The high prevalence of probable antenatal depression documented in our study should be examined in future studies using the gold standard reference criterion of major depressive disorder determined by structured clinical interview. If all cases of probable antenatal depression were referred for subsequent diagnosis and treatment, the number of false positives could easily overwhelm the country’s capacity for outpatient mental health care delivery (Kagee et al. 2013; Lund et al. 2012; Petersen and Lund 2011). However, although the conventional EPDS-10 threshold of ≥13 is commonly assumed to result in over-diagnosis, this has not been found to be the case among isiZulu-speaking South African women. In a study conducted among pregnant women in rural KwaZulu-Natal, the prevalence of major depressive disorder as determined by structured clinical interview (47 %) closely matched the prevalence of probable depression as determined using the EPDS-10 (44 %) (Rochat et al. 2011; Rochat et al. 2013).

A second limitation is that our findings apply solely to depression during the antenatal period. On average, participants in study 1 were recruited during the late second trimester (mean gestational age, 26 weeks). Gestational ages were not obtained in study 2, but given the similar sociodemographic characteristics of the two samples, we expect that these participants had a similar gestational profile. Several US and European studies have found that a large proportion of women present with elevated symptoms of depression during pregnancy but experience substantial attenuation of symptoms after delivery (Christensen et al. 2011; Mora et al. 2009; Schmidt et al. 2006; Seto et al. 2005; Sutter-Dallay et al. 2012). If a similar phenomenon occurred in our sample, it could reduce the utility of antenatal screening for postnatal depression. Numerous studies of antenatal (or immediate-postnatal) screening for postnatal depression have been conducted in the USA and Europe, but in general, the instruments employed have had relatively poor predictive power (Austin and Lumley 2003). With the exception of one cohort study from South Africa (Hung et al. 2014), we are unaware of any such studies conducted in resource-limited settings. This is an important gap in the literature, given the frequent losses to follow-up during the first few postnatal weeks and the generally limited attention paid to maternal well-being after delivery (Beksinska et al. 2006).

In summary, we found that short and ultrashort Xhosa versions of the EPDS have excellent discrimination for detecting probable antenatal depression among women in Khayelitsha, South Africa, where the prevalence of antenatal depression is exceedingly high. The estimated operating characteristics based on data collected by community health workers using mobile phones during routine antenatal wellness care were comparable to those based on data collected by trained research assistants. Community-based depression screening in South Africa could potentially improve the emotional well-being of high-risk women and the health of their children, but before it is implemented widely, our findings should be replicated using a more robust reference criterion.