Introduction

The global public health community agrees that monitoring retention in care is important for programs providing antiretroviral therapy (ART) to people living with human immunodeficiency virus (PLHIV). Retention is a crucial metric of success in HIV treatment: current antiretroviral medications can suppress the virus but quickly lose effect if halted, and viral rebound often carries a risk of resistance [1]. Epidemiological studies have consistently shown that missed visits and losses to follow-up are associated with increased mortality and elevated viral load [2, 3]. In much of the world, routine viral load monitoring is only beginning to roll out, whereas visit data are ubiquitous and have served as a widely usable metric of ART treatment success [4,5,6,7]. Furthermore, even where viral load monitoring is routine, monitoring is rarely complete; missing viral loads for even 20% of the patient population will generate considerable uncertainty in estimates of overall suppression. Finally, the President's Emergency Plan for AIDS Relief, many Ministries of Health, the Joint United Nations Programme on HIV/AIDS, and the World Health Organization (WHO) are likely to keep retention in care as a programmatic indicator [8,9,10,11].

Despite widespread recognition of the importance of monitoring retention, different agencies, programs, and monitoring and evaluation procedures currently calculate retention in different ways, and there has been relatively little systematic assessment of the range of metrics or comparison among them [5, 12]. In particular, in low- and middle-income countries, information systems often dictate what kinds of metrics are possible. In a system where visits and appointments are recorded on paper, metrics based on a limited number of data points are necessary. Such metrics, while widely implementable, may not capture all dimensions of a complex construct, as we explore in detail later. Settings with electronic records, which are expanding rapidly in low- and middle-income countries, can produce more nuanced metrics of retention, but even pooled metrics from such settings should be interpreted with caution: required fields are often missing, and electronic systems are concentrated in non-governmental organization-funded projects, so they may not be representative of public health care systems. In addition, retention is not a static condition and may change depending on the time scale in question; how methods address time is an important dimension of how retention figures are interpreted. In other words, 85% retention at 1 year after ART initiation is poor, whereas 85% retention after 5 years might be considered by some to be high. Finally, many small data management and analysis decisions influence the results but are not always reported explicitly with the metric [13].

To promote a systematic assessment of technical issues in the calculation of retention, we carried out a synthetic review. We sought to document major ways in which retention is calculated, describe the mechanics of each calculation, and discuss relative advantages and limitations of each retention metric. While assessments of these metrics exist in the literature, this synthesis will help ART programs and the research community evaluate available metrics more easily, tailor programmatic reports to their context, and enhance comparability across settings. In addition, we include code and a simulated dataset to help make the discussion as transparent as possible.

Methods

Conceptualization of Retention

A conceptual understanding of retention in HIV care is necessary to contextualize and evaluate different quantitative descriptions. We conceive of retention as continuous and appropriate contact with the health system (including differentiated service delivery models) to obtain education, monitoring, preventive, and therapeutic services related to HIV disease. We consider retention to be necessary, but not sufficient, for actual day-to-day adherence to medications. Some authors question the importance of retention, suggesting that HIV ribonucleic acid (RNA) is a more important metric of success in treatment. We conceive of retention and viral suppression as two sides of the same coin: retention in most situations is on the causal pathway to suppression. It should be noted, though, that one major limitation of using visits to calculate retention is that routine data generally capture only the frequency, and not the nature, of visits, thus compromising an important dimension of retention.

Search Strategy and Study Selection

To examine existing methods of quantifying retention, we carried out a synthetic search in PubMed to identify studies that examined retention, using the search terms “HIV care, ART, antiretroviral therapy, retention, attrition, loss to follow-up.” The search was conducted in April 2017. We reviewed the articles arising from these search terms that were published in peer-reviewed journals and shortlisted those that offered unique metrics of retention not previously identified [5, 14,15,16]. We also shortlisted policy documents, e.g., the WHO consolidated strategic information guidelines [17], that outline different metrics for determining retention in ART care.

Data Extraction

For each paper, we documented how retention in HIV care was calculated for the study. The data were captured and integrated into an Excel spreadsheet. We used existing categories of retention metrics already offered by other authors, such as the concepts of “visit constancy” and “visit adherence” [5]. Given the diversity of ways in which retention was calculated, we also categorized each metric into one of four previously defined groups (visit constancy, visit adherence, survival analysis, and other approaches), as outlined in the Results section.

Data Analysis

In order to investigate the relative benefits and limitations of each metric, we simulated a dataset of 1000 patients receiving ART and their follow-up visits, based on an African cohort in the published literature. We generated patient age, sex, CD4 cell count at ART initiation, time of ART initiation, appointment dates, and visit dates. To mimic actual appointment scheduling practices, the next appointment date was assigned in the simulated dataset at each visit and only at that visit. The simulated data contained missing values to mimic real-world data and to illustrate problems with calculating retention from such data. We then generated retention metrics for a number of operational definitions of retention extracted from the literature.
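As a simple illustration of this kind of simulation, the sketch below generates a visit-level dataset with a broadly similar structure in Stata; the variable names, distributions, and fixed 30-day appointment interval are illustrative assumptions and do not reproduce the supplied dataset1.dta or dataset2.dta.

```stata
* Minimal sketch of simulating a visit-level ART dataset (illustrative only;
* variable names and distributions are assumptions, not the supplied datasets).
clear
set seed 20170401
set obs 1000
gen long id    = _n
gen age        = max(18, round(rnormal(35, 10)))
gen female     = runiform() < 0.6
gen cd4_0      = round(rgamma(2, 150))                 // CD4 at ART initiation
gen art_start  = mdy(1, 1, 2015) + floor(runiform()*365)
format art_start %td
expand 12                                              // up to 12 monthly appointments
bysort id: gen visit_num = _n
gen appt_date  = art_start + 30*visit_num              // next appointment after each visit
gen visit_date = appt_date + round(rnormal(0, 10))     // patients arrive early or late
replace visit_date = . if runiform() < 0.15            // missed visits / missing data
format appt_date visit_date %td
```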

Data Availability

Data and code used for this project are included in the supplementary files associated with this article and can be found on Dropbox: https://www.dropbox.com/sh/v9tch39z7zvpw6z/AAAUuLjfuAX01Y-XG5hPWVZCa?dl=0 (see “Retention Measures_Constancy and Adherence.do,” “Retention Measure_Survival Kaplan Meier.do,” “Multi-state.do,” “Group Based Trajectory Model.do,” “dataset1.dta,” “dataset2.dta”). This project was conducted using Stata MP version 15.0 (StataCorp, College Station, Texas).

Results

Ways of Quantifying Retention

Visit Constancy

A number of approaches, often described as constancy, are based on contact with a facility during a specified period of time, often irrespective of whether other appointments at the facility were missed. Constancy is one of the earliest ways in which retention was quantified. Giordano et al.’s seminal article [14] followed individuals for 1 year and classified them according to the number of 4-month intervals, counted from ART initiation, in which they completed at least one visit. This constancy metric was associated with subsequent survival in a dose-response manner. Under this quantification, a patient who had three appointments during an interval and made only one is still considered retained in care during that interval. Mugavero et al. used a slightly different approach and assessed whether any two consecutive visits during the course of a year were separated by more than 189 days [5]; this approach showed a high correlation with viral load suppression. In both cases, the constancy metric captures whether continuous contact with the health system is occurring over time.
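As an illustration, the following minimal Stata sketch classifies patients as retained under a gap-based rule in the spirit of Mugavero et al. [5]; the variable names (id, art_start, visitdate) and the one-row-per-visit structure are assumptions rather than the exact structure of our supplementary files.

```stata
* Gap-based constancy sketch: retained if no two consecutive visits in the
* first year are more than 189 days apart (hypothetical variables).
keep if inrange(visitdate - art_start, 0, 365)
bysort id (visitdate): gen gap = visitdate - visitdate[_n-1]
bysort id: egen max_gap = max(gap)
bysort id: keep if _n == 1                          // one row per patient
gen retained_189 = (max_gap <= 189) if max_gap < .  // single-visit patients stay unclassified
tab retained_189
```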

The simplicity of constancy measures makes this approach pragmatic and widely usable. Constancy does not make use of appointments (i.e., the next scheduled visit dates) but rather of actual visits, which makes it immediately usable in settings where appointments are not explicitly given as well as in settings where data on appointments are less readily available than data on visits. Because appointments are usually aligned with running out of dispensed ARV medications, a visit date after a scheduled appointment date is a proxy for interruptions in ARV adherence. From an implementation standpoint, the constancy metric has the benefits of uncomplicated validation and programming and decreased computational and analytical demands. At times, ignoring appointments can be a conceptual advantage if appointments are not rationally offered (either too frequent or too infrequent), although in most situations the use of visits alone, without appointments, means that useful information may be lost. An example would be a scenario where a patient comes every 90 days for ARV pill pick-up but, perhaps because they are clinically unstable, is recommended to come every 30 days for drug toxicity monitoring. In this scenario, if the patient missed some of the drug toxicity monitoring visits but still adhered to the ARV pill pick-up visits, visit constancy would not be affected. In reality, however, in most situations the frequency of appointments is considered appropriate.

In our simulated analysis dataset, constancy was defined broadly as the number of patient visits per given time interval (i.e., within 4 months, 6 months, or 1 year of follow-up); adherence was defined broadly as counts or proportions of patient appointments that were kept or missed (otherwise termed “no-show” visits); and time to event was defined broadly as the duration of time to an adverse outcome (i.e., death, loss to follow-up, treatment failure, stopping ART), or the time a patient is alive and adherent to ART from the day of ART initiation to their last kept visit, compared with the total time the patient is eligible for ART care. The mock dataset illustrates several of these observations about constancy. First, we identify distinct visits within individuals using the “_n” function in Stata (http://statadaily.com/2010/09/01/_n_bysort/). In our dataset, we end up excluding 242 people (approximately 24% of the dataset) who have less than a year of potential follow-up time, leaving 758. Of these remaining individuals, 19% made no visits beyond their ART start date, 8% made a visit in one of the three remaining quarters, 19% in two, and 54% in all three. Follow-up time beyond the first year is also not used. These data illustrate several points about a metric such as constancy. It is not optimally efficient: observations without a full year of potential follow-up cannot be used at all, and because comparisons are restricted to the first year, those with longer follow-up are not fully used. On the other hand, the calculation is easy to perform and can be used as a predictor of subsequent behavior.
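A minimal sketch of the quarter-based tabulation described above follows; it assumes a long dataset with one row per visit and hypothetical variable names (id, art_start, visitdate), and counts how many of the three quarters after the ART start quarter contain at least one visit.

```stata
* Quarter-based constancy sketch for the first year of follow-up
* (hypothetical variables: id, art_start, visitdate; one row per visit).
gen day = visitdate - art_start
keep if inrange(day, 0, 365)
gen quarter = min(4, 1 + floor(day/91.25))          // quarters 1-4 of the first year
bysort id quarter: gen first_in_q = (_n == 1)       // flag one visit per quarter
bysort id: egen n_quarters = total(first_in_q * (quarter > 1))
bysort id: keep if _n == 1
tab n_quarters                                      // 0-3 of the remaining quarters with a visit
```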

Another form of visit constancy, one of the simplest metrics of retention used by international stakeholder agencies, is termed “known alive and on ART one year after initiation.” This measure prioritizes practicality in settings where patient registers and medical records are often kept on paper. It is usually expressed as the percentage of adults and children with HIV who are alive and on ART a designated number of months after initiating treatment (e.g., at 6, 12, or 24 months). It is calculated by taking the group of persons who started ART in one calendar month and examining a window of time 1 year later to see whether they visited the facility in that interval. This indicator is practical and simple to calculate and thus can be used at the health facility level for assessing patient retention. The formulation, however, has some limitations. It does not account for missed visits between enrollment and the point at which “alive and on ART” status is assessed, and such inconsistent therapy has been linked to suboptimal ART outcomes. Furthermore, “alive and in care at 1 year” implies looking only at patients who have had a year of potential follow-up; all patients with less than a year are excluded, thereby ignoring potentially important secular changes among the cohorts of individuals who started treatment more recently.
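A sketch of this calculation for a single monthly cohort is shown below; the variable names are hypothetical, and the 90-day window on either side of the 12-month anniversary is an illustrative assumption.

```stata
* "Known alive and on ART 1 year after initiation" sketch for one monthly
* cohort (hypothetical variables: id, art_start, visitdate; one row per visit).
gen start_month = mofd(art_start)
keep if start_month == ym(2015, 6)                        // e.g., the June 2015 cohort
gen in_window = inrange(visitdate - art_start, 275, 455)  // visit ~12 months later (+/- 90 days)
bysort id: egen alive_on_art_12m = max(in_window)
bysort id: keep if _n == 1
tab alive_on_art_12m
```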

Visit Adherence

A second group of metrics makes use of appointments as well as visits to calculate the fraction of kept appointments, thus bringing an additional dimension into the calculation of retention. Calculating visit adherence requires a number of practical considerations. The first is how late a patient needs to be for a visit to be considered “missed.” In the USA, an appointment is typically considered missed even if the patient appears a day late (or even an hour late). In low- and middle-income settings, a period of lateness is generally allowed to transpire before a visit is considered missed. In practice, this interval has ranged from 7 to 30 days, but it is clear that the incidence of missed appointments, and therefore visit adherence, will be sensitive to these decisions.
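The sketch below illustrates this sensitivity by recalculating visit adherence under lateness thresholds of 7, 14, and 30 days; the variable names (id, appt_date, visit_date) are hypothetical, and a missing visit date is treated as a missed appointment.

```stata
* Visit adherence under different lateness thresholds (hypothetical variables:
* id, appt_date, visit_date; one row per scheduled appointment).
foreach grace of numlist 7 14 30 {
    gen byte kept`grace' = visit_date < . & (visit_date - appt_date) <= `grace'
    bysort id: egen adherence`grace' = mean(kept`grace')
}
bysort id: keep if _n == 1
summarize adherence7 adherence14 adherence30
```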

An appointment-based approach has some advantages over a purely visit-based approach insofar as appointment intervals reflect the expected needs of each patient—something that likely differs from individual to individual. If a sicker patient needs more appointments than a healthier patient, the sicker patient has more visits to adhere to and a larger denominator; their adherence metric is therefore more robust to a single missed visit, and they may appear better retained than a healthier patient with fewer appointments who missed one of them. Conceptually, the intended visit interval contains information on what a patient needs for appropriate care, and adherence measures incorporate that information. On the other hand, there is an emerging consensus that, in the global roll-out of treatment, patients have often been asked to come to facilities more frequently than necessary, and therefore the inability to make all visits does not uniformly indicate poor engagement but may instead reflect a poorly designed system. Simple adherence to visits also does not capture an important dimension of missing a visit, namely how long the visit was missed for: a patient who is 1 day late for a visit is better off than one who is 6 months late.

“Time in care” [16] is another measure that falls under adherence; it is defined as the proportion of follow-up time during which participants are considered in care, thus capturing both whether a visit was made and how late a patient was for a given appointment. This metric is closely related to the medication possession ratio, in which pharmacy refill visits are used to calculate the amount of time a patient is in potential possession of medication, thus capturing time during which the patient cannot be adherent. For example, if a patient picks up 30 days of medication but does not return to the pharmacy for 60 days, there are at least 30 days in which the patient cannot have had medication. The clinical care situation can be conceived of analogously: if appointments are assumed to be appropriate, then being late for an appointment implies the absence of a desired clinical activity. The time in care measurement has an advantage over simple visit adherence in that it accounts not only for an absence, but also for the length of that absence. Like other metrics, however, time in care will be sensitive to the interval considered “out of care” time, especially when patients are lost to follow-up. In other words, when a patient stops returning, if all subsequent potential follow-up time is included in the denominator, the estimate may turn out to be quite low simply because of an undocumented transfer or death. We therefore usually allow 90 to 180 days after the last visit to count as eligible follow-up time before considering a patient no longer part of the cohort.
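A minimal sketch of a medication-possession-style time-in-care calculation is shown below; the variable names (id, visit_date, days_dispensed, dbclose) are hypothetical, and the 180-day cap on eligible follow-up time after the last visit reflects the convention described above.

```stata
* Time-in-care / medication possession sketch (hypothetical variables:
* id, visit_date, days_dispensed, dbclose; one row per attended visit).
bysort id (visit_date): gen gap = cond(_n < _N, visit_date[_n+1] - visit_date, ///
                                       min(dbclose - visit_date, 180))
gen covered = min(gap, days_dispensed)              // days the patient could hold medication
bysort id: egen time_covered = total(covered)
bysort id: egen time_total   = total(gap)
gen prop_in_care = time_covered / time_total
bysort id: keep if _n == 1
summarize prop_in_care
```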

Survival Analyses

Survival analyses in general address shortcomings of some of the techniques described above but are accompanied by their own complexities and assumptions. Metrics like constancy and adherence generally reduce a patient’s experience to a single number, which is useful in its simplicity but can hide important heterogeneity. For example, it could be argued that a measure of visit adherence over, say, the first 6 months cannot be compared with visit adherence over 3 years of treatment, even though the metric itself does not distinguish these quantities. Survival analysis offers a formal assessment of incidence, and therefore of the occurrence of an event (such as failure to be retained) as of a given time (such as 1 year after ART initiation). Because observations are censored when observation time ends, survival analysis makes use of all individuals, whether they contribute small or large amounts of observation time. The simplest approach is to use Kaplan-Meier (KM) estimates in which the event of interest is disengagement from care or loss to follow-up.
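A minimal KM sketch in Stata follows; the one-row-per-patient variables (art_start, last_visit, dbclose) are hypothetical, and the rule defining loss to follow-up as no visit in the 180 days before database closure is an illustrative assumption.

```stata
* Kaplan-Meier sketch: time to loss to follow-up with administrative censoring
* at database closure (hypothetical one-row-per-patient variables).
gen byte ltfu = (dbclose - last_visit) > 180          // no visit in the 180 days before closure
gen fu_days   = cond(ltfu, last_visit - art_start + 180, dbclose - art_start)
stset fu_days, failure(ltfu)
sts graph, survival ytitle("Proportion retained in care")
sts list, at(365 730)
```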

Alternatives to KM estimates can capture greater nuance in the multi-dimensional retention experience. KM estimates are ideally suited for “absorbing states”—states such as death which can happen but never un-happen. Retention, conceptually, is a condition that patients can be in, leave, and re-enter, repeatedly. In fact, entry into and exit from care has generated a number of conceptualizations: some authors have referred to this process as “churn” [18] and others as “the side door.” When using a KM approach, the analyst must decide whether to use any absence, or absence at the time of database closure, as the event. Competing risk models are increasingly used to capture retention outcomes [19, 20]; in these models, a number of outcomes can occur, each of which may influence the probability of another. In the case of retention, stopping care is intertwined with death: a death may happen shortly after a visit, and hence the death precludes the patient from ever being observed as lost to follow-up. Traditionally, in KM analyses of retention in care, patients who die are often censored, at least in part because all subjects must either have the outcome or be censored. The idea that deaths are simply “no longer observed” for the outcome of missing subsequent visits is clearly not tenable. Therefore, in an estimate of time to disengagement from care, deaths are more appropriately treated as a competing risk event rather than as censored observations.
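A sketch of this competing-risks treatment in Stata follows; it assumes hypothetical variables fu_days and outcome (coded 0 = in care/censored, 1 = lost to follow-up, 2 = died) and uses sex from the simulated data as an example covariate for the Fine-Gray model.

```stata
* Competing-risks sketch: loss to follow-up with death as a competing event
* (hypothetical variables: fu_days; outcome 0 = censored, 1 = LTFU, 2 = died).
stset fu_days, failure(outcome == 1)
stcrreg i.female, compete(outcome == 2)               // Fine-Gray subdistribution model
stcurve, cif at1(female = 0) at2(female = 1)          // cumulative incidence of LTFU by sex
```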

Even though competing risk analysis is more nuanced and generates less biased estimates in the presence of competing events, limitations remain. Unlike death or other absorbing states, retention is concerned with conditions that come and go. For example, a patient can be in care, late, or lost to follow-up, and can return to care, perhaps multiple times. Neither KM estimates nor the traditional approach to competing events fully addresses this problem, since an event can only be assigned to each individual at one time point. A multi-state modeling approach allows these conditions, as well as the rates of transition between each of the states, to be captured. Such models are increasingly available, for example through the “msm” package in R and multi-state commands in Stata. Some have used a simplified workaround, implementable in any statistical software, that mimics a multi-state model through a modification of the competing risk approach. In short, this procedure cuts the dataset at selected time points, assigns each patient an outcome at that time point, and produces estimates as of that time, before repeating the exercise at a later time point. By cutting observation time off at a series of time points, patients can receive different outcomes at different times, thus allowing the estimated prevalence of a condition to change over time.
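A sketch of this snapshot workaround is shown below; the one-row-per-patient variables (last_visit_day, death_day, potential_fu_days, all in days since ART start) are hypothetical, as is the 90-day rule for being considered lost.

```stata
* Snapshot workaround sketch: classify each patient's state at fixed time
* points and tabulate prevalence (hypothetical one-row-per-patient variables:
* last_visit_day, death_day, potential_fu_days, in days since ART start).
foreach t of numlist 180 365 730 {
    gen     state`t' = 1 if potential_fu_days >= `t' & potential_fu_days < .  // 1 = in care
    replace state`t' = 2 if state`t' == 1 & last_visit_day < `t' - 90         // 2 = lost
    replace state`t' = 3 if state`t' < . & death_day <= `t'                   // 3 = dead
    label define st`t' 1 "in care" 2 "lost" 3 "dead"
    label values state`t' st`t'
    tab state`t'
}
```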

Gillis et al. built a multi-state model that integrates HIV clinical outcomes and frequency of follow-up to characterize transitions through states of care experienced over the course of an individual’s illness, differentiating between care engagement patterns that yield poor clinical outcomes and those in which patients may have been clinically well despite infrequent contact with healthcare providers [21••].

A disadvantage of the multi-state approach is the relatively complex data structure required. In our accompanying simulated dataset, we illustrate several key data management steps. First, because the multi-state approach models transitions in continuous time, estimates must be obtainable on every day of follow-up; hence, there must be a row of data for each day, a structure that routine ART program datasets do not have, since most databases record information only on days when a visit occurred. In addition, the analysis itself requires some understanding of survival analysis, and the output requires additional coding to format and to create graphics in Stata or R. We illustrate this with the “putexcel” function in Stata, but recognize that such solutions are specific to particular software packages.
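The sketch below illustrates the expansion to one row per day and a simple putexcel export; the variable names (id, art_start, dbclose) and the output file name retention_summary.xlsx are assumptions, and the daily state assignment itself is left as a placeholder.

```stata
* Expansion to one row per follow-up day, plus a putexcel summary export
* (hypothetical one-row-per-patient variables: id, art_start, dbclose).
gen fu_days = dbclose - art_start + 1
expand fu_days
bysort id: gen day = _n - 1                          // one row per day since ART start
* ...daily state assignment (in care / late / lost / dead) would follow here...
quietly count
local person_days = r(N)
putexcel set retention_summary.xlsx, replace
putexcel A1 = "Total person-days of follow-up" B1 = `person_days'
```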

Other Approaches

Finally, there is growing interest in the use of trajectory or growth models for describing retention. In this approach, the researcher asks a slightly different question—instead of seeking an average or a single summary over time, trajectory models seek to identify underlying patterns among individuals in a population, thus capturing a hypothesized underlying “latent” construct that drives a multi-dimensional set of behaviors over time. Especially when paired with qualitative interviews, these methods are poised to reveal insights into the behavioral basis of retention in care. For example, one can imagine that in a clinical population, approximately 20% of patients might miss some scheduled appointments during their first year receiving ART. These patients with missed appointments might comprise two groups: one that starts out with high visit adherence that wanes over time, and a second with initially low visit adherence that increases over time. Revealing these two groups can provide more information about behavior and suggest tailored interventions, whereas a single metric of visit adherence over time may hide this notable heterogeneity.
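As an illustration, the sketch below fits a two-group trajectory model with the community-contributed “traj” plugin for Stata (which must be installed separately); the wide-format variables kept1-kept4 (a 0/1 kept-appointment indicator per quarter) and q1-q4 (the quarter index) are hypothetical.

```stata
* Group-based trajectory sketch using the community-contributed "traj" plugin.
* Assumes wide data: kept1-kept4 = 0/1 kept-appointment indicator by quarter,
* q1-q4 = quarter number (1-4) for each patient.
traj, var(kept1-kept4) indep(q1-q4) model(logit) order(2 2)   // two groups, quadratic trends
trajplot, xtitle("Quarter of follow-up") ytitle("Pr(kept appointment)")
```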

More advanced analytic methods, however, cannot overcome flaws in measurement. Trajectory analysis, for example, may classify patients lost to follow-up as being out of care, while in truth lacking the information to distinguish people who rapidly enroll at a new facility from people who remain out of care: two important, but unmeasured, trajectories.

A number of operational definitions were cross-cutting or specific to particular retention metrics. For instance, in all metrics classified under adherence, a kept appointment was any clinic visit whose date was ≤ 7 days before or ≤ 14 days after the scheduled appointment date; a visit > 7 days before or > 14 days after the scheduled appointment was not considered a kept appointment. If a scheduled appointment was missing in the simulated dataset, one was generated at a varying interval (e.g., + 30 days or + 180 days) after the prior clinic visit date. In addition, some metrics excluded from the denominator patients with only one visit since ART initiation or those who did not meet a defined minimum follow-up period.
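A short sketch of this operational definition follows; it assumes rows sorted by patient and visit order, with hypothetical variables id, appt_date, and visit_date, and imputes a missing appointment as the prior visit date plus 30 days.

```stata
* Kept-appointment window sketch: kept if the visit falls from 7 days before
* to 14 days after the scheduled appointment (hypothetical variables;
* rows assumed sorted by id and visit order).
replace appt_date = visit_date[_n-1] + 30 if appt_date == . & id == id[_n-1]  // impute missing appointment
gen byte kept = visit_date < . & inrange(visit_date - appt_date, -7, 14)
bysort id: egen visit_adherence = mean(kept)
```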

Discussion

The metric of retention in care will likely remain widely used in monitoring HIV programs, even as routine HIV RNA measurement is rolled out. There are multiple reasons why metrics of retention are a useful adjunct to HIV RNA monitoring and improve our understanding of the success of the treatment experience. Routine HIV RNA measurement for clinical monitoring may not be optimal for epidemiological monitoring for several reasons. First, patients who are not retained are the most likely to be viremic, yet they are underrepresented in HIV RNA metrics. Second, even among patients in care, those with admitted or known non-adherence to medications may be selectively unmeasured. The clinical logic is sound: why measure a patient’s viral load if they say they are not taking their drugs? On a population level, however, this means that the measurements obtained are unlikely to be representative of the entire clinic population.

Understanding retention in care is increasingly important as treatment cohorts expand under “test and treat” guidelines. As the number of patients receiving ART grows and the emphasis shifts toward achieving viral load suppression now that all patients are eligible for antiretroviral treatment, a unified application of retention metrics may be more useful than ever, particularly in view of increased risks of antiviral resistance.

In addition to expanding cohorts of patients on ART, there are efforts to differentiate care (e.g., visit spacing, community adherence groups) and to reduce the daily patient-provider ratio in busy clinics. As these trends continue, choosing retention metrics appropriately, which may entail using and comparing multiple retention methods, will be necessary in order to adopt follow-up actions that will affect the success of these efforts [22••].

The synthetic review yielded several indicators, or metrics, that illustrate both the significant efforts to quantify retention holistically and the various gaps in current practice. Before exploring the strengths and challenges associated with each metric, consideration must be given to the broader categories under which these metrics fall. For example, we believe that constancy is a very useful and widely usable way to measure retention. At the same time, retention is a complex behavior, and when appointments and visits are used to categorize retention, most methods capture only part of the story. Where possible, multi-state analyses as well as trajectory analyses can illuminate important differences in patient behavior as it pertains to retention in care. Program implementers and researchers would benefit from transparent use of retention metrics as well as a clear conceptual basis for each metric as designed and as used.