Introduction

Detecting suboptimal adherence to antiretroviral therapy is critical for HIV providers because adherence-improving interventions have the potential to improve viral response, decrease opportunistic infections, prevent the emergence of drug-resistant virus, and improve survival. However, detecting suboptimal adherence in clinical encounters can be challenging. Objective adherence measures, including electronic pill bottle monitors, pill counts, and pharmacy refill records, are considered more accurate than self-report, but are impractical in most clinical settings. Although self-report is vulnerable to numerous biases, associations between self-reported adherence and HIV viral load (VL) have been well demonstrated [1, 2], including among drug users [3, 4]. However, despite robust evidence supporting the use of self-report to measure adherence, and the surfeit of ways to ask patients about their adherence, few studies have examined how different adherence questions compare with one another.

Measuring adherence by self-report presents two particularly vexing challenges. First, medication adherence, like other routinely recommended behaviors (e.g., regular exercise), is frequently over-reported [5, 6]. This leads to a “ceiling effect,” in which the majority of patients report perfect adherence. One of the most widely touted benefits of self-report compared to objective adherence measures is that it allows providers to counsel patients at the time non-adherence is reported. However, this opportunity may be missed if patients routinely overestimate their adherence. A second challenge is that the lack of standardization, and the sheer number of different adherence questions that have been described, limit the ability to interpret findings and compare results across studies [1, 5, 7–9].

Self-report adherence questions generally include three elements: a question stem that asks respondents to perform a specific response task (e.g., report the number of pills missed or rate their ability to take pills), a precise recall period (e.g., past 30 days), and a set of response options (e.g., discrete percentages or levels of ability). During the past decade, adherence questions have evolved. For example, using a single question to assess overall adherence (e.g., the visual analog scale) [1, 10–13] is increasingly favored over composite or multi-item measures (e.g., the Morisky scale [14] or the Adult AIDS Clinical Trials Group instrument [15]). Another trend is that recall periods of 30 days have been shown to produce more accurate adherence estimates than recall periods of 1, 3, or 7 days [11, 12, 16]. In keeping with both these trends, Lu et al. examined several single adherence questions with either Likert-type or numeric responses. One question in particular asked respondents to rate their overall ability to take their medications as prescribed over the past 30 days. The authors found that this qualitative question (hereafter, RATING) was the only one that produced adherence estimates comparable to those derived from concurrently used electronic pill bottle monitors [16]. To our knowledge, this was the first report of an adherence question that produced adherence estimates that were not substantially higher than objective measures, and we know of no subsequent studies comparing both qualitative and numeric 30-day single adherence questions to other adherence questions.

To extend this research, we compared five adherence measures in a sample of HIV-infected drug users on methadone maintenance for opioid dependence. Our goals were: (1) to compare the measures by examining response distributions, ceiling effects, and concordance; (2) to determine the consistency of participants’ responses across the measures; and (3) to examine correlations with VL.

Methods

Setting, Design, and Population

We conducted a sub-study among participants in a randomized trial of directly observed antiretroviral therapy. The parent trial was a 24-week directly observed therapy intervention followed by a 12-month post-intervention period [Support for Treatment Adherence Research through Directly Observed Therapy (STAR*DOT)] [17]. Recruitment, intervention, and follow-up activities were conducted on-site at one of nine methadone maintenance clinics administered by the Division of Substance Abuse (DoSA) of the Albert Einstein College of Medicine and Montefiore Medical Center in the Bronx, New York. The sub-study (hereafter, the study) consisted of a one-time interview administered on-site at the clinic. Interviews lasted between 30 and 60 min and were conducted by trained staff.

STAR*DOT participants (1) were HIV infected; (2) had current prescriptions for combination antiretroviral therapy; (3) were enrolled in a methadone maintenance treatment program; (4) were English speaking; (5) received HIV care at their methadone clinic or a closely affiliated site; and (6) were genotypically sensitive to their prescribed antiretroviral regimen. STAR*DOT participants were eligible for the study if they were actively being followed in the STAR*DOT post-intervention period or had completed the 18-month STAR*DOT study. Active STAR*DOT subjects were recruited by interviewers at scheduled STAR*DOT research visits; subjects who had completed the STAR*DOT trial were contacted by mail or telephone. Study staff determined whether participants were acutely intoxicated or otherwise cognitively impaired, and whether they were able to participate in the informed consent process. The study was approved by the Committee on Clinical Investigations of the Albert Einstein College of Medicine and by the Institutional Review Board of Montefiore Medical Center. All participants gave written informed consent and were reimbursed $20 at the completion of the study interview.

Study Procedures

Prior to beginning the interview, study staff used the following script: “We understand that most people find it hard to always remember to take their medications. For example, some people forget to take their pills with them when they leave the house or go on a trip, and some people skip taking pills to avoid side effects or just feel like they can’t take pills that day. Remember that this information is confidential and will not be given to your provider, your substance abuse counselor, or anyone else in the clinic.”

Other than the self-administered visual analog scale, all questions were interviewer-administered. Questions and responses were read aloud by the interviewer and participants were shown a laminated card with only the response options displayed in large font.

To avoid an order effect, we used two versions of the survey that presented the questions in different orders. We also interspersed adherence questions with other survey instruments to minimize fatigue and to encourage participants to consider each adherence question independently.

Adherence Questions

The recall period was 30 days for all questions except the CPCRA, for which the recall period was 7 days. The five adherence measures are as follows.

Rating (RATING)

Question Stem: Thinking about the past 4 weeks, on average, how would you rate your ability to take all your medications as your doctor prescribed them?

Response Options: Very poor, poor, fair, good, very good, excellent.

Frequency (FREQ)

Question Stem: Thinking about the past 4 weeks, how often did you take all your HIV antiretroviral medications as your doctor prescribed them?

Response Options: None of the time, a little of the time, a good bit of the time, most of the time, all of the time.

Percent (PERCENT)

Question Stem: Thinking about the past 4 weeks, what percent of the time were you able to take all your medications as your doctor prescribed them?

Response Options: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100%.

Visual Analog Scale for Medications Taken (VAS)

Question Stem: Place an “X” on the line at the point showing how much of your HIV antiretroviral medications you have taken in the past 4 weeks. Zero percent means you have taken no antiretroviral medications, 50% means you have taken half your antiretroviral medications, 100% means you have taken every single dose of your antiretroviral medications during the past 4 weeks.

Response Options: Interviewers handed the survey and a pencil to participants who indicated their response by making a mark directly on a horizontal line. The line had hatch marks and numeric labels at 10% intervals and ranged from 0 to 100%.

Community Programs for Clinical Research on AIDS (CPCRA)

The CPCRA consists of a series of questions that were asked separately for each antiretroviral medication.

Question Stem: Thinking about the past 7 days, how many (medication name) pills did you take?

Response Options: All my pills every day, most of my pills, about one-half of my pills, very few of my pills, none of my pills.

Additional Variables

To describe our study population, we extracted sociodemographic characteristics from the baseline visit for the STAR*DOT study, including age, sex, race, ethnicity, housing status, marital status, educational level, and employment status. From clinical records, we extracted HIV VL and urine toxicology data.

We assessed literacy using the reading subtest of the Wide Range Achievement Test-3rd edition (WRAT-3), which is widely used to measure basic academic skills [18]. The reading subtest of the WRAT-3 consists of recognizing and pronouncing isolated words out of context. Average reading scores are defined by mean performance for a particular grade level.

Analyses

Marks on the VAS line between labeled 10% intervals were considered to denote the numerical midway point (i.e., a mark between 80 and 90% was interpreted as 85%). To combine individual CPCRA questions, which are asked for each medication in a participant’s regimen, we converted the responses to numeric values (100, 75, 50, 25, and 0%) and then averaged the rate for each medication in the regimen to compute an overall adherence rate for each participant.
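
To make these scoring rules concrete, the following is a minimal sketch in Python (not the authors’ code; the mapping table, function names, and example regimen are illustrative assumptions) of the VAS midpoint rule and the CPCRA per-regimen averaging:

```python
# Illustrative sketch of the scoring rules described above; not study code.

# CPCRA response options mapped to the numeric values used in the analysis
CPCRA_VALUES = {
    "all my pills every day": 100.0,
    "most of my pills": 75.0,
    "about one-half of my pills": 50.0,
    "very few of my pills": 25.0,
    "none of my pills": 0.0,
}

def vas_score(lower_label: int, upper_label: int) -> float:
    """A VAS mark between two labeled 10% intervals denotes the numeric
    midpoint, e.g., a mark between 80 and 90% is scored as 85%."""
    return (lower_label + upper_label) / 2.0

def cpcra_overall(responses: list[str]) -> float:
    """Average per-medication CPCRA rates into one overall adherence rate."""
    rates = [CPCRA_VALUES[r.lower()] for r in responses]
    return sum(rates) / len(rates)

# Worked example: a three-drug regimen answered "all", "most", "most"
# yields (100 + 75 + 75) / 3 = 83.3%.
print(vas_score(80, 90))  # 85.0
print(round(cpcra_overall(["All my pills every day",
                           "Most of my pills",
                           "Most of my pills"]), 1))  # 83.3
```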

For all adherence questions, we considered the highest response option to represent perfect (100%) adherence. We examined differences in response distribution between the five questions by creating histograms and then used McNemar’s test to compare the proportion of participants at the ceiling (i.e., the highest response level) across questions. We then dichotomized responses as perfect (100%) versus imperfect (<100%) to assess concordance using kappa statistics of agreement.
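
As an illustration of these paired comparisons, the sketch below applies McNemar’s test and a kappa statistic using standard Python libraries; the participant-level indicators are randomly generated placeholders, not study data:

```python
# Hedged sketch of the ceiling and concordance comparisons; toy data only.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
# 1 = at the ceiling (highest response level), 0 = below it, per participant
freq_ceiling = rng.integers(0, 2, size=53)
rating_ceiling = rng.integers(0, 2, size=53)

# McNemar's test operates on the 2x2 table of paired outcomes
table = np.zeros((2, 2), dtype=int)
for a, b in zip(freq_ceiling, rating_ceiling):
    table[a, b] += 1
print(mcnemar(table, exact=True).pvalue)

# Kappa agreement for perfect (1) vs. imperfect (0) classifications
print(cohen_kappa_score(freq_ceiling, rating_ceiling))
```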

We defined individuals as having consistent responses if they endorsed response options at either extreme across all five questions (i.e., all at the highest or all at the lowest), or if all five of their responses fell between the highest and lowest response levels. Participants were considered to have inconsistent responses if they endorsed either the highest or lowest response on at least one question, but middle response levels on one or more other questions.
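
A minimal sketch of this classification rule, assuming each participant’s five responses have been rescaled so that 0 and 100 mark the lowest and highest response levels:

```python
# Illustrative implementation of the consistency rule described above.

def classify(responses: list[float]) -> str:
    """Consistent: all five responses at the highest level, all at the
    lowest, or all strictly between the extremes. Inconsistent: at least
    one extreme response mixed with at least one middle response."""
    highest = [r == 100 for r in responses]
    lowest = [r == 0 for r in responses]
    middle = [0 < r < 100 for r in responses]
    if all(highest) or all(lowest) or all(middle):
        return "consistent"
    return "inconsistent"

print(classify([100, 100, 100, 100, 100]))  # consistent (all at ceiling)
print(classify([75, 50, 75, 80, 60]))       # consistent (all middle)
print(classify([100, 75, 50, 75, 60]))      # inconsistent (mixed)
```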

To explore construct validity, we converted responses for each question to a 100-point scale (e.g., questions with five response levels were converted to 0, 25, 50, 75, and 100%) [19]. We then examined the association between each adherence question and HIV VL on a log10 scale using Spearman correlation coefficients. We used Wolfe’s test to compare correlation coefficients [20].
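
The rescaling and correlation steps might look like the sketch below; the data are placeholders, and Wolfe’s test [20] is omitted because, to our knowledge, it has no implementation in the common Python scientific libraries:

```python
# Hedged sketch of the construct-validity analysis; toy data only.
import numpy as np
from scipy.stats import spearmanr

def rescale(level: int, n_levels: int) -> float:
    """Map a 0-indexed ordinal response onto a 100-point scale,
    e.g., five response levels become 0, 25, 50, 75, and 100."""
    return 100.0 * level / (n_levels - 1)

rng = np.random.default_rng(1)
# RATING has six levels (very poor ... excellent), rescaled to 0-100
rating = np.array([rescale(l, 6) for l in rng.integers(0, 6, size=53)])
viral_load = rng.lognormal(mean=5, sigma=2, size=53)  # copies/mL, toy values

rho, p = spearmanr(rating, np.log10(viral_load))
print(f"Spearman rho = {rho:.3f}, P = {p:.3f}")
```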

Results

The sample of 53 participants was 49% female, 47% Hispanic, and 40% Black, with a mean age of 49 (SD 7) years (Table 1). Fifty-five percent had completed 10th grade; the mean reading ability grade equivalent was 8.0 (SD 3.9). The median duration of HIV infection was 13 years (IQR 9–16) and all participants were antiretroviral treatment-experienced. Participants had been on methadone for a median of 10 years (IQR 5–16).

Table 1 Characteristics of the study sample, n = 53

Response distributions for the five adherence measures differed markedly (Fig. 1a–e). Although responses to all questions were skewed toward high adherence, the fraction at the ceiling differed: 22% for RATING, 38% for FREQ, 24% for PERCENT, 32% for VAS, and 58% for CPCRA. RATING and FREQ were the only questions for which the highest response level was not the one most commonly endorsed. For example, the most commonly endorsed response to the RATING question was “good” (28%), with slightly fewer participants endorsing “very good” (25%) or “excellent” (22%). In response to the FREQ question, the most common response was “most of the time” (43%), with fewer reporting “all of the time” (38%).

Fig. 1 Response distributions for the five adherence measures: a RATING, b FREQUENCY (FREQ), c PERCENT, d visual analog scale (VAS), e Community Programs for Clinical Research on AIDS (CPCRA). Proportion of response options endorsed for each question:

RATING: very poor (2%), poor (6%), fair (17%), good (28%), very good (25%), excellent (22%)

FREQ: none of the time (2%), a little of the time (8%), a good bit of the time (9%), most of the time (43%), all of the time (38%)

PERCENT: 30% (4%), 50% (9%), 60% (6%), 70% (11%), 80% (23%), 90% (23%), 100% (24%)

VAS: 0% (2%), 30% (6%), 40% (2%), 50% (4%), 60% (2%), 70% (14%), 80% (18%), 85% (2%), 90% (18%), 100% (32%)

CPCRA: 0% (4%), 25% (5%), 33% (2%), 50% (4%), 75% (21%), 88% (4%), 92% (2%), 100% (58%)

The proportion of participants at the ceiling was higher for the CPCRA compared to all other measures (P < 0.02 for all comparisons). In addition, the proportion at the ceiling was higher for FREQ compared to RATING (38 vs. 22%; P = 0.01) and for FREQ compared to PERCENT (38 vs. 24%; P = 0.008). The ceiling effect did not differ significantly among RATING, PERCENT, and VAS.

We dichotomized each of the five measures (perfect versus imperfect adherence) and calculated kappa statistics (Table 2). Overall, the CPCRA had lower agreement with the other questions (kappa from 0.34 with RATING to 0.53 with FREQ). Kappa statistics for the other measures ranged from 0.56 to 0.74.

Table 2 Kappa statistics of agreement for classifying adherence as perfect (highest response option) versus imperfect (all other response options)a

Responses were consistent for 27 participants (51%), including 9 (17%) who reported perfect adherence across all five measures and 18 (34%) who endorsed responses between the lowest and highest response levels on all five measures. No participants consistently endorsed the lowest response level. Responses were inconsistent for 26 participants (49%), who endorsed the highest or lowest response level on at least one question but middle responses on one or more other questions. Among these 26 participants, 21 (81%) endorsed the highest response on at least one measure and 5 (19%) endorsed the lowest response on at least one measure.

Spearman correlations with VL were significant for RATING [r = −0.312 (95% CI −0.54 to −0.04), P = 0.02], FREQ [r = −0.321 (95% CI −0.55 to −0.05), P = 0.02], PERCENT [r = −0.352 (95% CI −0.57 to −0.09), P = 0.01], and VAS [r = −0.367 (95% CI −0.59 to −0.10), P = 0.009], but not for CPCRA [r = −0.189 (95% CI −0.44 to 0.09), P = 0.18]. The difference in correlation coefficients between VAS and CPCRA approached significance (P = 0.08), but none of the other correlation coefficients differed significantly.

Discussion

In this sample of HIV-infected drug users on methadone for opioid dependence, the distribution of responses to five concurrently administered adherence measures varied widely. More participants endorsed imperfect adherence using the 30-day single-item measures compared to the 7-day multi-item measure that averaged separate responses for each antiretroviral. The proportion of participants reporting 100% adherence ranged from as low as 22% for a single-item measure to 58% for the multi-item measure. Overall agreement between measures ranged from fair to good, and almost half the participants had inconsistent responses across measures. Compared to the multi-item measure, single-item measures had higher correlations with VL.

In our study the ceiling effect was least pronounced for the RATING question, a single-item qualitative question with an evaluative response scale. Several factors may contribute to this finding. First, evaluative responses (e.g., “very good”) may convey normative information, including expectations of adherence behaviors and how respondents compare themselves with peers [21–23]. Second, it may be easier for respondents to endorse imperfection when it is qualitative (e.g., “very good”) rather than numeric (e.g., 80%). In other words, 80% may be cognitively or emotionally perceived as less desirable than “very good,” which is a generally positive response. The elicitation of normative information may partly explain the less pronounced ceiling effect we observed with qualitative questions.

Almost half the sample was characterized as having inconsistent responses because they endorsed perfect adherence on at least one question but imperfect adherence on at least one other question. One potential interpretation of this finding is that this group represents a subset of poorly adherent participants who might be misclassified as adherent. Use of questions with less pronounced ceiling effects may lead to adherence discussions that might otherwise not occur.

Despite variation in response distribution, all questions with 30-day recall periods correlated with VL, but the correlation was not significant for the 7-day CPCRA. The association between self-reported adherence and VL is consistent with prior research [1, 2] and supports the view that most adherence questions reflect the same underlying construct. Our results suggest that the correlation between the VAS and VL may be stronger than the correlation between the CPCRA and VL, and therefore that the VAS may elicit pill-taking behavior that is more clinically meaningful than that elicited by the CPCRA. However, the correlation between VL and the VAS was not significantly different from the correlations between VL and the other measures.

Our findings are consistent with prior studies that have found that estimates of self-reported adherence depend on the specific questions asked [10–13, 16, 24–26]. Differences among questions may be explained by how respondents interpret question tasks and response options. Processes used to interpret adherence questions are best examined using cognitive interviewing, a method in which respondents “think aloud” while they answer survey questions. In addition to revealing how respondents interpret questions differently, this methodology may reveal attitudinal factors that influence responses. For example, an honest, compulsively adherent individual who misses the occasional dose may guiltily endorse a low adherence level, while a blithe individual who misses many doses may endorse a high adherence level, and the effect of these influences may differ by adherence question. Though difficult to study using standard research methods, these issues merit further research.

Specifically, in-depth interviews may help explain structural differences between similar questions like PERCENT and VAS. Though both questions are single items with numeric responses at 10% intervals, they have different question stems and methods of administration. For example, PERCENT refers to the proportion of time the medications were taken correctly, whereas VAS refers to the amount of medications taken correctly. In addition, to administer the PERCENT question, the interviewer read the question and recorded the verbal response; to administer the VAS, the participant directly marked a 0–100 horizontal line with hatch marks at 10% intervals. These are important considerations because the two questions performed differently: their response distributions varied and the correlation between their responses was 0.63. Though such overlapping structural differences make questions difficult to compare directly, they highlight the complexity of self-reported adherence.

Our results should be seen in the context of important limitations. First, we evaluated adherence questions in a unique population that included some active drug users. Though we did not specifically measure health literacy or numeracy, the sample had an 8th grade reading level, demonstrating basic literacy. In addition, the median duration of HIV infection was more than 10 years, the majority of participants had been on antiretroviral therapy for more than 1 year, and over half had an undetectable HIV VL. We therefore believe that this study population was composed of relatively stable patients in long-term drug treatment. Second, because participants were enrolled in a longitudinal adherence study, their awareness of adherence questions was likely heightened. However, none of the questions we included in our survey were used in the longitudinal trial. Third, our sample may have been too small to detect differences among adherence questions in their correlations with VL, and correlations between adherence and VL may have been biased by heterogeneity in the duration of antiretroviral therapy. Fourth, we assumed that adherence was relatively steady during the 1-month period between assessments of adherence and VL. Lastly, we were not able to compare the adherence questions to an objective adherence measure and cannot draw conclusions about the relative accuracy of these five measures.

In sum, adherence data collected using self-report are highly dependent on the structure of the adherence question. Since HIV best practices include providing adherence counseling to all patients on antiretroviral therapy [27], questions that are feasible in clinical settings and that minimize the ceiling effect have the potential to improve clinical outcomes for patients with HIV. Our findings generate hypotheses for future research on the use of single-item questions with qualitative response options in clinical settings, including whether they increase counseling opportunities. Using such measures may allow providers to address adherence before being faced with a patient who routinely reports perfect adherence despite pansensitive virus and a rising VL.