Introduction

Expanding access to antiretroviral therapy (ART) for HIV-infected pregnant and postpartum women living in low- and middle-income countries (LMICs) has improved maternal health and reduced the risk of mother-to-child transmission (MTCT) [1–3]. These improvements rely on optimal adherence to ART both during pregnancy and following delivery. Following successful initiation of treatment, the next challenge is successful implementation of the ART regimen for as long as a woman remains on lifelong ART [4]. Treatment implementation and persistence among pregnant and postpartum women are often suboptimal [5–7], and measurement of ART adherence in LMICs presents a significant challenge.

Long a standard tool for monitoring HIV treatment in high-income countries, viral load (VL) testing is being promulgated by the World Health Organization (WHO) as an important tool for HIV patient management in LMICs [2]. Although VL may be influenced by numerous factors, including drug resistance, poor ART adherence is the main cause of non-suppressed VL globally [8–11], and identification of patients with adherence difficulties is a major goal of VL testing. Viral load is indispensable for determining treatment failure and is potentially a valuable tool to reinforce adherence. However, it is expensive, measurement is often infrequent, and it does not directly capture medication-taking behaviour. Because of this, both clinicians and researchers are interested in the development of measures which can be conducted frequently with limited resources, which adequately capture patient behaviour, and which can potentially identify adherence problems prior to the development of viremia [2, 12–14].

There are trade-offs for virtually every approach to measuring medication adherence [15, 16]. Objective adherence measures, such as electronic drug monitoring (EDM), are often significantly associated with VL, but are generally not feasible in low resource settings [17–20]. EDM is also prone to measurement bias. Although objective, EDM does not directly measure medication taking and may overestimate adherence if pills are pocketed. Pill counts are inexpensive but require time from trained staff and are prone to pill dumping. Pharmacy refill has shown potential but is retrospective and still relies on the assumption that patients ingest the medication they have collected. Self-report is also prone to social desirability, recall bias and question misinterpretation, as well as ceiling effects, where large fractions of a population score at the top of a scale [9, 16]. However, self-reported adherence measures are simple and inexpensive to administer, and are often the only practical method available for routine adherence monitoring in low resource settings [9, 21]. Self-reported adherence has at times been shown to correlate with VL and objective adherence measures, with the added advantage of allowing for the immediate discussion between patient and provider of reasons for adherence problems and potential solutions [21].

There is a multitude of self-reported measures available both in research and in routine care, and there has been ongoing work to improve the validity, reliability and practicality of these measures [9, 16, 17, 21–25]. One concern with self-report is that the adherence questions asked may be understood differently by different people and that varying recall periods and types of questions elicit responses of varying accuracy [22, 23]. Through a rigorous process of cognitive interviewing, Wilson et al. found that many of the phrases commonly used in self-reported adherence tools were not consistently understood across a diverse cohort in the United States (US) [26]. Following interviews and a larger field test, they selected the three best performing and consistently understood items to form a simple three-item adherence screening tool [26]. The tool has been validated in the US and shows potential to be a first-stage adherence screener to prompt discussion of adherence problems and flag individuals requiring more resource-intensive second-stage screening, adherence counselling and intervention as appropriate [27]. These items have not yet been tested outside the US [26, 28].

There is an urgent need for low-cost, sensitive and easy-to-administer adherence screening tools that have been tested in diverse contexts, including among pregnant and postpartum women for whom non-adherence has consequences for both individual health and transmission risk [14, 22, 23, 29, 30]. To fill this gap, we assessed the short self-report adherence measure described above in a cohort of pregnant and postpartum women in South Africa who had persisted on treatment up to the time of assessment. We translated the adherence items into the predominant local language, isiXhosa, and aimed to investigate the performance of the scale as a screening tool to identify suboptimal adherence, using VL as the reference standard. A secondary aim was to assess differences in reported adherence across sociodemographic subgroups, including psychosocial risk groups.

Methods

We conducted a cross-sectional analysis of self-reported adherence and VL using data from a larger multi-phase implementation science study which aims to optimize ART services for maternal and child health (MCH-ART study, ClinicalTrials.gov NCT01933477). The study took place at a large public sector primary care facility in Cape Town, South Africa. The surrounding community is characterised by high levels of poverty and unemployment, and high HIV prevalence (33 % of antenatal clinic attenders were HIV-infected in 2013). ART services have been provided at this facility since 2004 and all care is provided free of charge.

Participants

Between April 2013 and June 2014, consecutive HIV-infected women, 18 years and older, booking for antenatal care (ANC) and eligible to start ART were approached to enrol. ART eligibility was based on CD4 ≤350 cells/µL and clinical staging until July 2013, when universal ART for all HIV-infected pregnant women was introduced [31]. ART initiation and follow-up are routinely provided by nurse-midwives working within the antenatal clinic. As per local prevention of mother-to-child-transmission (PMTCT) guidelines, all ART-eligible women initiate a once daily fixed-dose combination of efavirenz (EFV), emtricitabine (FTC) or lamivudine (3TC), and tenofovir (TDF). Dispensing is monthly for the first 4 months on treatment and every 1–2 months thereafter. Adherence is assessed at routine ART consultations by either self-report of missed doses or pill counts, with adherence support and counselling provided by trained lay counsellors to those who need it.

Of 658 eligible women, 628 were enrolled in the MCH-ART study (three women refused participation and 27 were not successfully enrolled prior to delivery due to advanced gestation at ANC booking). All women provided written informed consent prior to participation. Depending on their gestation at ANC booking, 2, 22 and 76 % of women completed two, three and four study measurement visits, respectively, between ART initiation and 6 weeks post-delivery. All study visits occurred separately from routine ART and ANC services. This study was reviewed and approved by the Human Research Ethics Committee of the University of Cape Town as well as the Institutional Review Board of the Columbia University Medical Centre.

Procedures

Study visits including adherence assessments were scheduled (a) at 2–4 weeks after ART initiation, (b) at 34 weeks gestation, (c) within 7 days of delivery, and (d) at 6 weeks postpartum. They included interviewer-administered questionnaires and venepuncture for VL testing. We translated all interview measures into isiXhosa, the predominant local language, with back translation to confirm accuracy [32]. Questionnaires were administered by trained isiXhosa-speaking interviewers working in private rooms.

Measures

Demographic characteristics, including age, education level, gravidity, timing of HIV diagnosis and prior antiretroviral (ARV) use were collected at enrolment. A composite poverty score was compiled and used to categorise participants into tertiles based on their relative level of disadvantage. The composite score included employment status and a standardised asset index score based on housing type and household access to a flush toilet, piped water inside the home, electricity, and a refrigerator, telephone and television.

Self-reported adherence was measured using a three-item adherence scale, developed through rigorous cognitive interviewing and validated in the US, as previously reported [26–28]. The three items included (1) an assessment of the number of days with missed ART doses in the preceding 30 days; (2) a scale rating of how good a job you did taking your medicines in the preceding 30 days and (3) a scale rating of how often you took your medicines the way you were supposed to in the preceding 30 days (Table 1). The items making up the tool were previously found to be consistently understood by a diverse cohort in the US where the tool had excellent internal consistency (α = 0.86) [26].
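To make the structure of the scale concrete, the following is a minimal sketch of how three such items might be combined into a single 0 to 100 score. It is not the study's scoring code. The function names, the linear rescaling, the 1 to n coding of the rating and frequency items, and the default of six response options are all assumptions made for illustration; the authoritative response options are those given in Table 1.

```python
from statistics import mean


def rescale_days_missed(days_missed, window=30):
    """Map days with missed doses (0..window) to 0-100; 100 means no missed days."""
    if not 0 <= days_missed <= window:
        raise ValueError("days_missed must lie between 0 and the recall window")
    return 100.0 * (window - days_missed) / window


def rescale_ordinal(response, n_options):
    """Map an ordinal response coded 1..n_options (1 = worst) linearly to 0-100."""
    if not 1 <= response <= n_options:
        raise ValueError("response is outside the coded range")
    return 100.0 * (response - 1) / (n_options - 1)


def scale_score(days_missed, rating, frequency, n_options=6):
    """Equal-weight mean of the three rescaled items (assumed scoring rule)."""
    items = [
        rescale_days_missed(days_missed),     # item 1: days with missed doses
        rescale_ordinal(rating, n_options),   # item 2: how good a job
        rescale_ordinal(frequency, n_options) # item 3: how often as prescribed
    ]
    return mean(items)
```

Under these assumptions, a woman reporting no missed days and the top response on both rating items scores 100, and any shortfall on a single item lowers the combined score.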

Table 1 Three-item self-reported adherence scale items

Depression was measured using the Edinburgh postnatal depression scale (EPDS), a 10-item measure of recent depressive symptoms validated for use both in antenatal and postnatal women. We used a threshold value of ≥13 for possible major depression as described in the original scale development [33].

Alcohol use in the 12 months prior to pregnancy was measured using the Alcohol Use Disorders Identification Test (AUDIT), one of the most widely used measures of risky alcohol use. The full 10-item tool was developed by the WHO to identify people with hazardous or harmful patterns of alcohol consumption. In this analysis, we have used the AUDIT-C tool, comprised of the first three items in the AUDIT, to serve as a rapid screening tool for problem drinking. We used the recommended threshold of three or above to identify hazardous drinking [34].
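The two screening thresholds described above can be sketched as simple scoring rules. This is an illustrative sketch only: the function names are hypothetical, and it assumes item responses have already been coded in the conventional way (ten EPDS items scored 0 to 3 each, three AUDIT-C items scored 0 to 4 each). The thresholds of ≥13 and ≥3 are those stated in the text.

```python
def epds_total(item_scores):
    """Sum the ten EPDS items, each assumed to be coded 0-3."""
    if len(item_scores) != 10 or any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("EPDS expects ten items coded 0-3")
    return sum(item_scores)


def possible_depression(item_scores, threshold=13):
    """Flag possible major depression at an EPDS total of 13 or more."""
    return epds_total(item_scores) >= threshold


def audit_c_total(item_scores):
    """Sum the three AUDIT-C items, each assumed to be coded 0-4."""
    if len(item_scores) != 3 or any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("AUDIT-C expects three items coded 0-4")
    return sum(item_scores)


def hazardous_drinking(item_scores, threshold=3):
    """Flag hazardous drinking at an AUDIT-C total of 3 or more."""
    return audit_c_total(item_scores) >= threshold
```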

HIV RNA VL was measured by venepuncture at each study measurement visit. Five mL of venous blood was drawn for testing conducted by the National Health Laboratory Services using the Abbott RealTime HIV-1 assay (Abbott Laboratories, Illinois, USA). Viral load and self-reported adherence measures always took place on the same day. All items in the adherence tool refer to the 30 days prior to the measurement visit, meaning reported adherence problems were likely to be reflected in the VL measure. We included women in this analysis who had at least one study visit including VL and adherence measurement after at least 16 weeks on treatment, and who had persisted on ART from initiation up to the time of assessment. These restrictions were used to ensure that, assuming good treatment implementation, all women could be reasonably expected to have reached viral suppression at the time of assessment [31, 35].

Analyses

Data were analysed in STATA V12.0 (Stata Corporation, College Station, Texas, USA). The distribution of the three individual adherence items was described using medians with interquartile range (IQR). In addition, an aggregate scale was developed, based on a recoding of each item with equal weighting, to create a score ranging from 0 to 100, with the latter representing the best possible self-reported adherence. We assessed internal consistency using Cronbach’s alpha, and determined the association between VL and the adherence scale using logistic regression and Receiver Operating Characteristic (ROC) curve analysis. In the primary analyses, we used a VL cut point of 1000 copies/mL to indicate elevated VL, based on local and international thresholds for treatment failure and regimen change [2, 31]. This measure was based on a single VL, taken at the same time as the self-reported adherence assessment. Additional cut points of 50 and 10,000 copies/mL were used in sensitivity analyses. We compared the area under the curve (AUC) for the three-item scale across sociodemographic and psychosocial categories in order to compare the performance of the scale across different subgroups.
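For readers unfamiliar with the internal consistency statistic, Cronbach's alpha for k items is k/(k − 1) × (1 − Σ item variances / variance of the summed score). The sketch below computes it in Python from rescaled item scores. The analysis reported here was run in Stata, so this is an equivalent calculation offered for illustration, and the function name and input layout (one list of item scores per respondent) are assumptions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(rows) < 2:
        raise ValueError("at least two respondents are required")
    k = len(rows[0])
    if k < 2 or any(len(r) != k for r in rows):
        raise ValueError("every respondent needs the same k >= 2 items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed score across respondents.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("summed scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when the items rise and fall together across respondents and approaches 0 when they vary independently.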

Results

Patients

A total of 628 women were enrolled in the parent study. We included 452 women who had at least one study visit that included adherence and VL measures after a minimum of 16 weeks on ART. The majority of exclusions (n = 169) were as a result of having insufficient time on ART at all available study assessments. An additional 7 women were missing the required measures and were also excluded. When we compared women excluded and included, we found no differences at baseline other than a later gestation at ART initiation among women excluded, as expected with the cut-off of 16 weeks on treatment. All included women had persisted on treatment up to the time of the assessment and they are described in Table 2. The median age was 28 years, 74 % of women had not completed secondary school and 41 % were married or cohabiting. The median pre-ART VL was 10,587 copies/mL (IQR 2603–43,099 copies/mL). 26 % of women reported hazardous drinking prior to pregnancy, and 10 % of women had an EPDS score suggesting possible depression. At the time of the adherence assessment, 33 % (n = 147) of women were pregnant (median gestation 34 weeks) and the remaining 305 women had recently delivered (median time postpartum 1.4 weeks). The median duration of ART use at the time of adherence assessment was 19 weeks (IQR 18–21 weeks), and 92 % of women had VL below 1000 copies/mL.

Table 2 Description of 452 women who started ART during pregnancy and had an adherence assessment and viral load (VL) after at least 16 weeks on ART

Item and Scale Characteristics

Table 3 shows descriptive statistics for the three individual adherence items and the combined scale. The item scores were higher (better adherence) for the item assessing days on which doses were missed (mean 97.1, median 100) compared with the rating and frequency items (means (medians) 78.5 (83.3) and 82.6 (83.3), for items 2 and 3 respectively). The distributions of the individual adherence items and the scale score are displayed in Fig. 1. All histograms were left-skewed with high levels of reported adherence on all three individual items. However, only 12 % of women (n = 55) reported the highest score on all three items and so achieved a perfect score on the combined three-item scale. The overall Cronbach’s alpha was good, at 0.79.

Table 3 Summary of item and scale characteristics
Fig. 1 Histogram showing distribution of individual items and the combined three-item scale score

Table 4 describes the distribution of item responses across sociodemographic and psychosocial strata and by VL above or below 1000 copies/mL. Overall, variations in scale scores within subgroups were of small magnitude, and varied significantly only by education, with higher scores among women who had completed secondary school compared to those who had not (p < 0.001). Duration of ART use did not alter the scale score; however women with longer time on treatment were more likely to have a raised VL using a cut-off of both 50 and 1000 copies/mL (p < 0.001) (Table 5).

Table 4 Distribution of scale responses across participant subgroups and stratified by VL ≥1000 copies/mL
Table 5 Proportion of women virally suppressed (<50 and <1000 copies/mL) by weeks on ART at the time of sampling. (N = 452)

Relationships of Three-Item Scale to Viral Loads

Crude and adjusted associations between the three-item adherence scale and VL ≥50, and ≥1000 copies/mL are presented in Table 6. In bivariate analyses, having a raised VL was consistently associated with lower median scores on the adherence scale using a VL cut-off of both ≥50 (median adherence scores 88.9 and 83.3, p = 0.005), and ≥1000 copies/mL (median adherence scores 88.9 and 81.1, p = 0.001). These associations persisted in multivariable analyses which adjusted for age and education. The AUC for the three-item scale was 0.599, 0.656 and 0.642 using a VL cut-off of ≥50, ≥1000 and ≥10000 copies/mL, respectively (Fig. 2). The AUC using a VL cut-off of 1000 copies/mL remained above 0.6 and did not vary significantly across subgroups of sociodemographic and psychosocial characteristics (Table 4).
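The AUC reported above can be read as the probability that a randomly chosen woman with elevated VL has a lower scale score than a randomly chosen woman without elevated VL, with ties counted as one half. The sketch below computes that quantity directly by comparing all pairs. It illustrates the statistic and is not the Stata procedure used in the analysis; the function name and argument layout are assumptions.

```python
def auc_low_score_flags_elevated(scores, elevated):
    """AUC for a scale on which LOWER scores are expected to indicate elevated VL.

    scores   -- scale scores (0-100), one per woman
    elevated -- booleans, True where VL is at or above the chosen cut point
    """
    cases = [s for s, e in zip(scores, elevated) if e]
    controls = [s for s, e in zip(scores, elevated) if not e]
    if not cases or not controls:
        raise ValueError("both elevated and non-elevated VL groups are required")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score < control_score:
                concordant += 1.0   # pair ranked in the expected direction
            elif case_score == control_score:
                concordant += 0.5   # tied pair counts as half
    return concordant / (len(cases) * len(controls))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which places the reported values of roughly 0.60 to 0.66 in the modest range.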

Table 6 Distribution of three-item adherence scale scores stratified by VL ≥50, and VL ≥1000 copies/mL respectively (N = 452)
Fig. 2 Receiver operating characteristic curves for three-item scale detecting VL ≥50 (a) and VL ≥1000 (b)

The sensitivity and specificity of the adherence scale for predicting VL ≥50 and ≥1000 copies/mL are presented in Table 7. Using a scale score cut-off of <80, the scale had a low sensitivity for detecting those who truly had a VL above 50 or 1000 copies/mL (36 and 45 %, respectively). Using a scale score cut-off of <90, 74 and 76 % of women with VLs ≥50 and ≥1000 copies/mL were correctly identified using the three-item adherence scale, and a cut-off of <100 identified more than 90 % of women with VLs ≥50 and ≥1000 copies/mL. All cut-off scores resulted in very high negative predictive values: women with summary scale scores of at least 80, at least 90, or equal to 100 had a 94, 95 and 98 % chance, respectively, of having a VL <1000 copies/mL.
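The quantities in Table 7 follow from a two-by-two table in which a woman is flagged when her scale score falls below the chosen cut-off and the reference standard is elevated VL. The sketch below shows that calculation for a single cut-off. The function name and the dictionary it returns are illustrative assumptions rather than the study's code.

```python
def screening_metrics(scores, elevated, cutoff):
    """Sensitivity, specificity, PPV and NPV for the rule 'flag if score < cutoff'."""
    tp = fp = fn = tn = 0
    for score, is_elevated in zip(scores, elevated):
        flagged = score < cutoff
        if flagged and is_elevated:
            tp += 1          # flagged, VL elevated
        elif flagged:
            fp += 1          # flagged, VL not elevated
        elif is_elevated:
            fn += 1          # not flagged, VL elevated
        else:
            tn += 1          # not flagged, VL not elevated

    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Raising the cut-off flags more women, which increases sensitivity at the cost of specificity; this is the trade-off visible across the <80, <90 and <100 cut-offs.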

Table 7 Sensitivity, specificity and positive and negative predictive values of three-item adherence scale predicting viral load ≥50 and ≥1000 copies/mL, using a scale cut-off score of <80, <90 and <100

Discussion

This analysis had three main findings. First, the three-item self-reported adherence scale that we tested in pregnant and postpartum women on ART in South Africa had good psychometric characteristics and did not demonstrate the ceiling effects that self-report items often show. Second, self-reports had significant associations with elevated VL in both bivariate and multivariable models. And third, using ROC curves, the scale had only moderate ability to discriminate between patients with elevated and non-elevated VL.

The scale used in our analysis consisted of three simple and easily understood adherence questions developed in English through a process of cognitive interviewing. Reviews have shown that self-reported measures range from single items asking for the number of prescribed doses missed in a specified time period to numerous complex items requiring detailed recall; very few studies report using the same self-reported adherence measure, making it difficult to compare results across studies [9, 21, 23]. An analysis in South Africa evaluating the performance of five commonly used self-reported adherence questions found that all questions were poor predictors of virologic and/or immunologic failure [36]. Similarly, another study in Cape Town using a short adherence scale in HIV-infected adults found no correlation between the scale score and having a detectable VL [37]. Both of these studies used recall periods of 7 days, much shorter than the 30 days used in our analysis. In the current analysis we found that although the effect sizes were relatively small, all individual adherence items, as well as the three-item scale, were significantly correlated with VL. The cognitive interviewing approach used in developing this scale, which resulted in a word choice aimed at minimizing social desirability as well as misinterpretation, may have resulted in improved responses to the adherence questions even in this new setting.

The distribution of each item and the three-item scale (Fig. 1) found in this study was very similar to that reported in the US population of predominantly male adults on ART, though in our cohort a lower proportion obtained maximum scores on all three items and reached a score of 100 on the combined scale [26]. Our data did not show a large ceiling effect, a common problem for self-report adherence measures [9, 16]. While ceiling effects can occur if the population is in fact highly adherent, in many cases they are rather a result of patients overestimating their adherence, typically due to social desirability effects [37–39]. While we did observe a ceiling effect for the missed doses item alone (80 % reporting no missed doses), only 12 % of women reported perfect adherence in the three-item scale. This three-item scale was developed paying particular attention to a word choice that optimizes accurate reporting. This scale may therefore be more sensitive to reported non-adherence than other scales that have shown more prominent ceiling effects, although this cannot be known definitively unless scales are compared head to head in the same populations. Previous studies in pregnant and postpartum women in Latin America and in Kenya using combination adherence scores based on pill counts and self-report have reported optimal adherence in more than 80 % of women [10, 40]. This aligns with our results using missed doses alone, and also with the finding of 92 % of women with VL below 1000 copies/mL at the time of assessment; however, it is much higher than the 12 % that we found using the combined scale score. Although there are few data assessing the performance of self-reported adherence measures in pregnant and postpartum women in low resource settings or in this particular context, the lack of a ceiling effect may be indicative of a more at-risk population than other cohorts previously studied, and perhaps the three-item scale, which has been developed to be sensitive to reported non-adherence, is detecting more subtle difficulties with treatment implementation that have not yet impacted on VL.

We found that the three-item adherence scale scores were consistently and significantly associated with elevated VL; however, the effect sizes were relatively small. This is mirrored in the ROC analyses and suggests that although the three-item scale score is associated with VL, it may not be a very strong predictor of having an elevated VL. For the purposes of a first-stage screening tool, we are looking for a measure with high sensitivity rather than perfect predictive ability, as would be required for a diagnostic tool. The scale achieved an AUC of 0.656 to detect a VL ≥1000 copies/mL, similar to what has been previously reported for different self-report measures as well as for pharmacy refill and VL, though lower than other combined self-report questions and lower than EDM and VL [19, 41–43]. A recent validation of this three-item scale in a US population found it could achieve an AUC above 0.70 with EDM and with further evaluation it shows potential to fill an important gap [27].

Global recommendations are moving towards making VL monitoring the standard of care for ART programs, however in reality there are still likely to be problems with access in low resource settings due to feasibility and cost constraints. With infrequent VL testing and potential delays in feedback of results in many low resource settings, there is still a need for interim adherence assessments, particularly in the time sensitive context of PMTCT [2]. In many settings, adherence self-reports will remain the most feasible option that allows for rapid assessment of adherence risk and immediate feedback and counselling. Our findings suggest that this simple, low-cost adherence screening tool may provide an early warning of poor adherence and prompt second-stage adherence screening or VL testing. With a cut point of <90 or <100, the combined scale was able to detect 76 and 97 % of women with VL above 1000 copies/mL, respectively. This is an important advance for first-stage self-reported adherence screening, with reported sensitivities of other tools ranging from 24 to 57 % [23]. This scale had very high negative predictive values (94–98 % depending on the scale cut-off) meaning that women with above threshold adherence scores had a very small probability of having a raised VL and could potentially be screened out of more resource intensive second-stage adherence screening.

Although this scale shows promising performance in this setting, further research is needed to determine how appropriate it will be in a routine care setting and how it could fit into local routine ART management plans. In this analysis, considering anyone scoring below the maximum score on any item as non-adherent (a combined scale score of <100), 88 % of women reported some adherence difficulty. However, at the time of assessment 92 % of women had a VL <1000 copies/mL and 360 women who had a VL below 1000 copies/mL reported some adherence difficulty. An unexplored benefit of this scale is the opportunity of the health care provider to discuss adherence challenges and solutions immediately after any difficulty with implementing treatment is reported. This finding of suboptimal reported adherence in the absence of raised VL points perhaps to more subtle early adherence difficulties being detected by this scale in a cohort of women recently initiated on ART. These women may be at increased risk of non-adherence and poor treatment outcomes over time and their early reports of adherence difficulty may provide an opportunity for additional adherence counselling, before their adherence behaviour can impact their VL. Further investigation into the longitudinal prognostic value of this scale should be considered, and use of this simple tool in routine care as a flag to prompt further assessment and decide on an appropriate adherence support intervention may warrant consideration [14, 26]. The successful use of this translated scale suggests that it can withstand cross-cultural adaptation and may also be useful in other settings.

Although these data suggest that this three-item scale could successfully be used as a first-stage adherence screening tool, the following limitations must be noted. The scale was administered by trained research interviewers outside of the routine ART care service, reducing the risk of social desirability bias and in the absence of the time constraints normal in a busy routine ART clinic. For these reasons generalizability to routine clinical settings in high volume ART clinics is not known. In this analysis we were not able to compare the three-item scale to other objective adherence measures. We were also unable to compare this tool with adherence measures taken at routine ART follow-up visits as these data were not available. We were able to compare the three-item scale to the single missed doses item alone, a common measure of adherence in routine care; however, an important next step will be to evaluate how this tool could be used in routine care and to compare it to measures currently in use in low resource settings. Pharmacy dispensing records and pill count have also been recommended as potential adherence measurement tools in low resource routine program settings and comparison and combination with these measures will be a focus of future research [19, 43–45]. Optimal cut-off values and an appropriate diagnosis and intervention strategy based on the scale result need to be established for routine clinical care. Although this tool appears to be valid and well-understood across diverse populations, the optimal cut-offs and possible second-stage screening and interventions are likely to differ across population groups so further research in other contexts is recommended. It must also be noted that the women in this study were all newly initiated on ART but had persisted on treatment to the time of assessment. How the scale will perform if used repeatedly over time, and the generalizability of our findings to treatment-experienced cohorts, are not known.

In summary, these findings show that a simple three-item self-reported adherence scale could be used to screen for poor adherence and potentially flag current or pending elevated VL in this HIV-infected pregnant and postpartum population on ART. This is the first reported use of this scale outside of the US and it has performed well in this setting after translation. Adherence monitoring during pregnancy and after delivery in low-resource settings requires more attention in the era of universal ART for all HIV-infected pregnant and breastfeeding women, and with further validation within routine care, this simple scale may add value to maternal adherence monitoring in low-resource settings.