Introduction

With 33.2 million people worldwide currently infected, and with 2.1 million deaths in 2007 alone, the HIV pandemic is one of the most significant public health challenges of the 21st century (UNAIDS 2007). In most countries, the HIV epidemic is driven by sub-populations at highest risk (termed “most-at-risk populations”) for becoming infected with or transmitting HIV (UNAIDS/WHO Working Group on Global HIV/AIDS and STI Surveillance 2000). In two types of epidemic, low-level (prevalence of infection is <5% in most-at-risk populations) and concentrated (prevalence is >5% in most-at-risk populations but is not yet >1% in the general population), these most-at-risk populations include injecting drug users (IDUs), men who have sex with men (MSM), and sex workers (SWs) along with their sexual partners, as well as displaced populations, migrant workers, long-distance truck drivers, and youth (Mills et al. 2004). Even in generalized epidemics, in which prevalence is >1% in pregnant women attending antenatal clinics, there is recognition that risk is not uniformly distributed within populations and is driven, at least initially, by most-at-risk populations that bridge HIV to the lower-risk general population (Chopra et al. 2007; Doherty et al. 2006; Gregson et al. 2002; Halperin and Epstein 2004).

Accurate HIV data on incidence and prevalence and associated behavioral data from most-at-risk populations are essential for designing targeted prevention programs to reduce the further spread of the epidemic (Mills et al. 2004; Pisani et al. 2003; Zaba et al. 2006). In most countries, however, HIV surveillance systems, the primary source of epidemiologic data, do not generate representative samples of most-at-risk populations. Although probability-based sampling methods are the gold standard for collecting unbiased and generalizable biological and behavioral data on HIV, their application is limited when sampling most-at-risk populations such as IDUs, MSM and SWs, especially the hidden subsets of these groups. The methods are limited, first of all, because these populations generally do not have sampling frames from which to draw random samples using conventional probability-based sampling methods. In addition, the groups are too small to be captured in large enough numbers in surveys of the general population. Second, individuals within these populations often practice socially stigmatized or illegal behaviors, resulting in difficulties accessing them. As a result, they are often recruited through institutions (e.g., hospitals, jails, drug-treatment clinics) using convenience techniques, such as quota and snowball sampling, or they are recruited through visible venues (e.g., bars, clubs, street corners, shooting galleries) using targeted sampling (Magnani et al. 2005; Semaan et al. 2002; Watters and Biernacki 1989).

Respondent-driven sampling (RDS) is a relatively new sampling method that has been recognized and adopted by public health researchers as a promising alternative means to sample most-at-risk populations for biological and behavioral HIV surveys. RDS is a chain-referral sampling technique that uses statistical adjustments for network size to produce generalizable samples (Abdul-Quader et al. 2006a; Heckathorn 1997, 2002; Magnani et al. 2005; Salganik and Heckathorn 2004; Semaan et al. 2002). The RDS recruitment process begins with a set number of individuals, or “seeds,” selected purposefully from the target population. Seeds are trained to recruit a set number of individuals (“recruitment quota”) from their social network of peers. The recruits of the seeds who enroll in an RDS study are also trained to recruit a set number of individuals from their social network of peers. Both seeds and recruited participants typically receive incentives, both to be interviewed (referred to as “primary incentives”) and to refer additional recruits (“secondary incentives”). Ideally, this recruitment process continues to produce long recruitment “chains” made up of several “waves” of recruits. As the recruitment chains lengthen, the composition of the sample begins to reach a point of “equilibrium” whereby the composition of certain characteristics (e.g., age group, gender, ethnicity, HIV prevalence) within the sample eventually stabilizes, indicating that the final sample is not biased by the purposeful selection of seeds (Heckathorn 2002). It is generally understood that RDS can be applied only in populations that are socially networked and in which members of the networks are willing to recruit from among their peers.
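The notion of equilibrium can be illustrated with a short sketch. It assumes that recruitment behaves like a first-order Markov process over two hypothetical groups, A and B; the transition probabilities and the function name `composition_by_wave` are invented for illustration and are not taken from any study in this review. The sketch tracks the expected composition of each successive wave (not the cumulative sample) and suggests how two very different sets of seeds can converge on nearly the same composition.

```python
# Hypothetical recruitment probabilities between two groups, A and B.
# TRANSITION[g][h] is the assumed probability that a recruiter in group g
# recruits a peer in group h. These numbers are invented for illustration.
TRANSITION = {
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}


def composition_by_wave(seed_composition, waves):
    """Return the expected group composition of each wave, seeds first."""
    history = [dict(seed_composition)]
    current = dict(seed_composition)
    for _ in range(waves):
        nxt = {group: 0.0 for group in TRANSITION}
        for group, share in current.items():
            for target, prob in TRANSITION[group].items():
                nxt[target] += share * prob
        history.append(nxt)
        current = nxt
    return history


# Two extreme seed choices: all seeds from group A, or all from group B.
all_a = composition_by_wave({"A": 1.0, "B": 0.0}, waves=10)
all_b = composition_by_wave({"A": 0.0, "B": 1.0}, waves=10)

# Share of group A in selected waves under each seed choice.
for wave in (0, 1, 3, 10):
    print(wave, round(all_a[wave]["A"], 3), round(all_b[wave]["A"], 3))
```

Under these assumed probabilities, the gap between the two columns shrinks with every wave, which is the behavior that the term “equilibrium” describes.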

In addition to the recruitment process, RDS involves a complex analytical component that is crucial to generating representative estimates and confidence intervals. This analysis adjusts for the sizes of participants’ social networks and for the recruitment patterns observed in the sample. In this paper, we use RDS to refer to both the recruitment and analysis components.
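To make the role of network size concrete, the following is a minimal sketch of one simple adjustment: weighting each participant by the inverse of his or her reported network size (an estimator sometimes called RDS-II or the Volz–Heckathorn estimator). This choice is an assumption made for illustration. It is not necessarily the estimator computed by any particular software package, it ignores recruitment patterns and confidence intervals, and the function name and data are hypothetical.

```python
def inverse_degree_estimate(records):
    """Estimate the proportion with a trait, weighting by 1 / network size.

    records: iterable of (has_trait, network_size) pairs, network_size > 0.
    """
    weighted_trait = 0.0
    total_weight = 0.0
    for has_trait, network_size in records:
        if network_size <= 0:
            raise ValueError("network size must be positive")
        weight = 1.0 / network_size
        total_weight += weight
        if has_trait:
            weighted_trait += weight
    return weighted_trait / total_weight


# Invented data: participants with the trait report larger networks,
# so they are more likely to be recruited and are down-weighted.
sample = [(True, 20), (True, 10), (False, 5), (False, 4), (True, 20), (False, 2)]
crude = sum(1 for has_trait, _ in sample if has_trait) / len(sample)
adjusted = inverse_degree_estimate(sample)
print(round(crude, 3), round(adjusted, 3))
```

In this invented sample the adjusted proportion is lower than the crude proportion, because well-connected participants, who are over-represented in a chain-referral sample, carry less weight.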

Respondent-driven sampling was first used in 1994 to study HIV-related risk behaviors among IDUs in the eastern United States (Heckathorn 1997). Outside the United States, RDS was not used for HIV surveillance until 2003 (Wattana et al. 2007), but since then it has been employed widely by international researchers to gather biological and behavioral data on HIV. To date, there are ongoing discussions about the effectiveness of RDS in different socio-cultural settings and among certain most-at-risk populations (Abdul-Quader et al. 2006b). Additionally, misunderstandings exist about RDS methodology, the importance of statistical adjustment, and the method’s requirements (Heimer 2005; Johnston et al. 2008; Ramirez-Valles et al. 2005; Salganik 2006; Simic et al. 2006). To address these issues, we reviewed biological and/or behavioral HIV surveillance surveys that used RDS and that were conducted outside of the United States to sample HIV most-at-risk populations. In this paper, we summarize operational and analytical characteristics of RDS studies and discuss factors that may affect recruitment. Implementation and theoretical challenges to RDS studies are discussed in a companion paper (Johnston et al. 2008).

Methods

Literature Search

We searched published and unpublished manuscripts, abstracts, reports, protocols and notes from field supervisors related to HIV biological and/or behavioral surveillance (accessible from 2003 through October 1, 2007) that involved RDS in countries other than the United States. We excluded studies based in the United States because those conducted prior to 2006 have already been or are currently being reviewed (Abdul-Quader et al. 2006b; Robinson et al. 2006). We conducted initial searches using MEDLINE (1970–2007), PubMed and Google Scholar (up to the first 50 pages). This search included an iterative process to refine the search strategy by testing several search terms and incorporating new search terms as new relevant citations were identified. Multiple combinations of keywords and phrases were used to assess study eligibility, including: (1) methodology: “chain-referral sampling” or “respondent-driven sampling”; (2) population of interest: “men who have sex with men”, “bisexual”, “sex workers” (male, female, transgender) and their partners, “drug users” (injectors or non-injectors), “homeless”, “run-away youth” or “migrant population”; (3) medical domains: “HIV”, “HCV”, “sexually transmitted infections”, “drug abuse”, “overdose” and “needle sharing”; (4) language: documents in English, Spanish, French, Portuguese, Farsi, or Arabic; (5) location: studies conducted in a country or countries other than the United States. We further conducted a “cited reference search” in Web of Science on the relevant papers and used the “related articles feature” in PubMed.

The majority of data was provided by co-authors and their collaborators directly involved in conducting RDS surveys and through contacts with organizations involved in specific RDS surveys, including Tulane University School of Public Health and Tropical Medicine; University of California, San Francisco Global Health Sciences; the Global AIDS Program, Centers for Disease Control and Prevention; Family Health International; the Federal University of Ceará, Brazil; and national ministries of health.

Eligibility Criteria

We assessed articles identified through our original search and differentiated studies that used the RDS recruitment process and analytical elements from those that did not. First, we included studies in our review that (1) initiated recruitment chains with members of the target population, known as seeds; (2) used a recruitment quota; (3) collected data on the size of social network for all participants using a consistent set of parameters; and (4) systematically recorded who recruited whom. To ascertain whether a study was conducted among a population that was socially networked, we included only studies that either reported that one or more seeds could generate a minimum of three referral waves or, in case waves were unreported, that the study attained a minimum of 10% of its desired sample size. Second, we excluded studies that (1) did not generate weighted estimates of variable frequency and confidence intervals using data on network size or, in the case of studies with only recently completed data collection, did not intend to use weighting in their analysis; (2) combined an RDS sample with other samples generated using other methods; or (3) combined samples from multiple RDS studies with different eligibility criteria or conducted in distinct geographical areas. We considered studies that fulfilled all inclusion criteria as complete RDS studies and included them in our review.
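The three-wave criterion depends on records of who recruited whom. As a rough sketch of how the number of waves could be derived from such records, the following assumes a hypothetical mapping from each participant ID to the ID of his or her recruiter, with seeds mapped to `None`. The function name and the data are invented, and the sketch assumes that each participant was recruited only once, so that the records contain no cycles.

```python
def wave_numbers(recruiter_of):
    """Map each participant ID to a wave number, with seeds as wave 0.

    recruiter_of: dict of participant ID -> recruiter ID (None for seeds).
    Assumes every recruiter ID is itself a key and that there are no cycles.
    """
    waves = {}
    for pid in recruiter_of:
        # Walk up the chain until reaching a seed or an already-numbered person.
        chain = []
        current = pid
        while current is not None and current not in waves:
            chain.append(current)
            current = recruiter_of[current]
        base = -1 if current is None else waves[current]
        # Number the collected chain from the top down.
        for offset, member in enumerate(reversed(chain), start=1):
            waves[member] = base + offset
    return waves


# Invented records: seed s1 starts a chain of three waves; seed s2 recruits no one.
records = {"s1": None, "s2": None, "a": "s1", "b": "a", "c": "b", "d": "s1"}
waves = wave_numbers(records)
max_wave = max(waves.values())
print(max_wave, max_wave >= 3)
```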

Categorizing Documents and Abstraction

We created a master table in Microsoft Excel, extracted key information from included surveys, and entered data into the table. Once we completed data entry, we divided studies into four sub-tables based on the population of interest: IDU, MSM, SW and high-risk heterosexual (HRH) men. We abstracted (1) the principal investigator or contact person or organization; (2) the year of the study; (3) where the survey was conducted; (4) eligibility criteria; (5) types of biological specimen(s) gathered and laboratory tests performed; (6) whether formative research was conducted prior to the survey; (7) interview method; (8) number of recruitment sites; (9) type of recruitment site; (10) whether mobile recruitment sites were used; (11) whether seeds were diversified, meaning they were selected differently from each other based on key demographic or risk behavior characteristics; (12) total number of seeds used for the study; (13) number of seeds that failed to recruit anyone; (14) whether additional seeds were added after the study began; (15) the maximum number of allowable referrals; (16) whether an expiration period was used, meaning the total number of days between when a participant completes the survey and his or her recruited peer enrolls in the survey; (17) the primary incentive amount, which is the amount given for completing the survey in US dollars calculated on October 15, 2007; (18) the secondary incentive amount, which is the amount given for each participant-referred recruit who enrolls in the survey; (19) other services offered during the survey; (20) design effect used to calculate a sample size; (21) desired sample size; (22) actual sample size; (23) maximum number of waves; (24) sampling duration in weeks; (25) whether equilibrium was reported as being reached; (26) whether survey data were adjusted using respondent-driven sampling analysis tool (RDSAT) (Volz et al. 2007) or a similar software program; and (27) description of any operational limitations.

We assessed the success of each study by the proportion of the pre-designated sample size that was actually recruited and whether the authors reported reaching equilibrium. We compared the number of successful recruits per seed per week using the Mann–Whitney U test with significance at P = 0.05 in STATA version 10 (StataCorp, College Station, Texas).
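For readers unfamiliar with this comparison, the following is a minimal standard-library sketch of a recruitment-rate calculation and a Mann–Whitney U test using the normal approximation with a tie correction. It is not the STATA procedure used in this review. The definition of the rate (recruits divided by seeds and by weeks), the function names and the example values are assumptions made for illustration.

```python
import math
from collections import Counter


def recruits_per_seed_per_week(recruits, seeds, weeks):
    """Assumed definition: recruits divided by number of seeds and by weeks."""
    return recruits / seeds / weeks


def mann_whitney_z(x, y):
    """Return (U for x, z score, two-sided P) from the normal approximation."""
    counts = Counter(list(x) + list(y))
    # Tied values share the mean of the ranks they occupy.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = position + (ties + 1) / 2
        position += ties
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u_x = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    # Variance of U with the usual correction for ties.
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u_x, z, p


# Invented recruitment rates (recruits per seed per week) for two study groups.
group_one = [7.5, 9.1, 6.0, 8.2, 10.4]
group_two = [3.6, 2.0, 4.1, 5.2, 3.0]
u, z, p = mann_whitney_z(group_one, group_two)
print(u, round(z, 3), p < 0.05)
```

With samples this small an exact test would normally be preferable; the normal approximation is used here only to show where a z score comes from.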

Results

We identified 155 biological and/or behavioral HIV surveys that were conducted among most-at-risk populations outside of the United States. Of these, 32 (21%) studies did not fulfill our RDS criteria and were excluded. Nineteen (59%) of these studies combined RDS samples with samples collected using other sampling techniques; five (16%) failed to generate a minimum of three referral waves; four (13%) either did not report whether they had collected data on size of the social networks or reported them inconsistently; two (6%) did not analyze their data using proper RDS techniques; one (3%) did not provide sufficient information about RDS recruitment requirements; and one (3%) combined samples from two different RDS studies.

One hundred twenty-three studies met all of our eligibility criteria. Of these, one study was completed in 2003, nine studies in 2004, 34 in 2005, 65 in 2006, and 14 in 2007. Studies were conducted in 28 different countries on five continents: Europe (59, 48%), Asia (40, 33%), Latin America (14, 11%), Africa (7, 6%) and Oceania (3, 2%) (Table 1). Sixty-five studies (52%) were among IDUs (Table 2), 39 (32%) among MSM (Table 3), 18 (15%) among SWs (Table 4), and one (1%) among HRH men (Table 5). Between 2003 and October 2007, a total of 32,298 participants were surveyed, of whom 17,434 (54.0%) were IDUs, 10,101 (31.0%) were MSM, 4,342 (13.5%) were SWs, and 421 (1.5%) were HRH men.

Table 1 HIV biological and behavioral studies that used RDS by risk group and continent, 2003–2007
Table 2 HIV biological and behavioral studies that used RDS, injecting drug users, 2003–2007
Table 3 HIV biological and behavioral studies that used RDS, men who have sex with men, 2004–2007
Table 4 HIV biological and behavioral studies that used RDS, sex workers, 2004–2007
Table 5 HIV biological and behavioral studies that used RDS, high-risk heterosexual men, 2006

One hundred six studies (86%) reported collecting both HIV biological and behavioral data concurrently, and the remaining 17 (14%) were solely behavioral surveys. Sixty-four (53%) collected dried blood spots, 44 (36%) venous blood, 6 (5%) oral fluid and 25 (21%) urine or penile or vaginal swabs. Of the 112 studies with available information, 101 (90%) reported conducting some degree of a priori formative research. Although face-to-face methods were the most common means of interviewing (110 studies, 89%), audio computer-assisted self-interviews (ACASI) and self-administered instruments were used in eight (7%) and five (4%) studies, respectively. Participants were enrolled at a variety of sites including governmental hospitals, public health clinics, public health departments, non-governmental organizations providing services for target groups, voluntary counseling and testing clinics, hotel rooms, rented store fronts, and mobile vans. Of the 114 studies that reported the number of recruitment sites, 92 (81%) used a single site, but as many as five sites were used. Only six (5%) studies reported using mobile vans as recruitment sites, and in one of these studies the two vans used were kept in stable locations.

One hundred twenty (98%) studies reported that seeds were diversified (i.e., selected differently from each other) based on key demographic or risk behavior characteristics; three studies did not report on diversification. Thirty-one (43%) of 72 studies with available data reported adding seeds beyond the original seeds. All but three studies set the allowable number of recruits per participant at three. Of 103 studies with available data, 59 (57%) did not limit the time during which participants were allowed to refer their recruits. Among 44 studies that did limit time for recruits to respond, the recruitment period ranged from 7 to 60 days.

Studies used a wide range of primary and secondary incentives for recruitment. Of the 107 studies that reported on primary incentives, a majority of 89 (83%) used cash incentives, 11 (10%) gave cash equivalents (e.g., food stamps) or small goods with minimal monetary value and 3 (3%) gave condoms and lubricants; 4 (4%) did not offer any primary incentive. Seventy-eight studies reported data on secondary incentives, and 72 (92%) offered them; these incentives were usually monetary (58 studies, 74%). Seventy-eight studies reported data on both primary and secondary incentives; the value of the primary incentive was higher than that of the secondary incentive in 52 (67%) studies, the same in 14 (18%), lower in seven (9%) and undetermined in four (5%) studies. One (1%) study did not offer any kind of incentive. Of these 78 studies, 55 (71%) gave money as both primary and secondary incentives, 8 (10%) provided money only for one of them, and 15 (19%) did not offer monetary incentives at all. In addition to incentives, studies offered a wide range of additional services, such as free HIV testing and counseling, referral for clinical follow-up, condoms, lubricants and information and educational materials.

We also summarize how successfully studies were able to recruit participants (Table 6). On average, RDS studies used 10 seeds (range 2–32, median 8.0, interquartile range [IQR] 6.0–13.0) and had 1.6 (range 0–19, median 0, IQR 0–2.0) unsuccessful seeds per study. Of 86 studies with available data, 51 (59%) reported having no unsuccessful seeds. The median proportion of unsuccessful seeds per study was lower among studies of IDUs (0%, IQR 0–5%) than among SWs (20%, IQR 14–30%, z score −3.872, P < 0.0005). There was no significant difference in the median proportion of unsuccessful seeds per study between MSM and IDUs (z score −0.915) or MSM and SWs (z score −1.916). The greatest number of referral waves was among IDUs (34); the average number for all studies was 9.2 waves (median 8.0, IQR 6.0–11.0, range 3–34).

Table 6 Operational characteristics of HIV biological and behavioral studies that used RDS by study group, 2003–2007

The length of time for recruitment of subjects ranged from 2 to 56 weeks, with an average of 9.2 weeks (median 8.0 weeks, IQR 4.0–10.0). On average, studies recruited 41.0 (median 35.0, IQR 25.0–50.0) subjects per week or 6.4 subjects per seed per week (median 5.2, IQR 2.0–9.1). The recruitment process was more productive in studies of IDUs (median of 7.5 recruits per seed per week) than in studies of MSM (3.6, z score 2.837, P < 0.005) and SWs (3.5, z score 2.727, P < 0.01) (Table 6). There was no significant difference in median recruits per seed per week between MSM and SW studies (z score 0.199).

In 91 studies with available data, design effects varied from 1.0 to 2.5; only 34 (37%) used a design effect of ≥1.5 when calculating sample sizes. One hundred eighteen (96%) studies reported their calculated sample size; the average was 280 and ranged from 100 to 800. One hundred eighteen (96%) studies also reported their final sample size, which ranged from 59 to 963 and averaged 273 (median 247.0, IQR 197.0–377.0). One hundred thirteen studies reported both calculated (desired) and final (recruited) sample sizes. Studies on average reached 98% of their intended sample size; studies among IDUs reached a greater proportion of their intended sample size (100.0%) than studies of SWs (97.0%) and MSM (94.0%). Thirteen studies (12%) failed to attain at least 90% of their intended sample size; six (46%) of these were MSM studies, four (31%) SW studies and three (23%) IDU studies. Eleven (85%) of these studies, nonetheless, reached equilibrium on at least one key variable of interest despite shortfalls in recruitment.
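The effect of a design effect on a calculated sample size can be shown with a small worked example. The sketch below assumes the standard formula for estimating a single proportion with a given margin of error, multiplied by the design effect; the reviewed studies did not necessarily use this formula, and the function names and input values are hypothetical.

```python
import math


def sample_size_for_proportion(p, margin, design_effect=1.0, z=1.96):
    """Sample size to estimate proportion p within +/- margin (assumed formula).

    n = design_effect * z^2 * p * (1 - p) / margin^2, rounded up.
    """
    base = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(design_effect * base)


def percent_attained(recruited, intended):
    """Percentage of the intended sample size that was actually recruited."""
    return 100.0 * recruited / intended


# Invented inputs: expected prevalence 20%, margin of error 5 percentage points.
for deff in (1.0, 1.5, 2.0):
    print(deff, sample_size_for_proportion(0.20, 0.05, design_effect=deff))

# Share of a hypothetical intended sample of 280 reached with 273 recruits.
print(percent_attained(273, 280))
```

Because the required sample size scales linearly with the design effect, a study planned with a design effect of 1.0 may be underpowered if the true design effect is closer to 2.0.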

Of the 105 studies that reported whether or not they had reached equilibrium, 99 (94%) reached equilibrium and six (6%) had not. These six studies included four studies of IDUs and two studies of SWs. All four IDU studies attained their intended sample sizes, but the two SW studies did not. Of the 18 studies that did not report whether or not they had reached equilibrium, 16 (89%) attained at least 90% of their intended sample size. All but two studies that completed data collection and attained their sample size used RDSAT software to adjust data for different social network sizes and recruitment patterns, and the other two used other methods of adjustment.

Discussion

We identified 123 HIV biological and/or behavioral surveys that used RDS and were conducted outside the United States. The studies were conducted in 28 countries on five continents and had an average sample size of 273 participants. Over 32,000 IDUs, MSM, SWs and HRH men were surveyed in these 123 studies. We are also aware of at least 18 additional HIV biological and/or behavioral studies that, as of October 1, 2007, were being conducted around the world.

We found substantial methodological heterogeneity among the studies. The majority of the studies used formative research, face-to-face interview formats, three referrals per participant, a single interview site for data collection and biological specimens collected from participants, mostly for HIV but also for other sexually and parenterally transmitted infections. Types of sites, number of seeds and types of incentives varied. During data collection, some studies added seeds if recruitment slowed or seeds failed to recruit any peers. The use of some incentive was relatively constant across the studies, consistent with standard RDS methods.

Notably, we found that RDS has been somewhat more successful in IDU studies in terms of recruitment efficiency, as measured by the number of new participants referred per seed per week. Our findings show that RDS studies took 9 weeks to complete on average, but in some cases studies took as long as several months. This variation, however, can be explained; these studies had many differences (e.g., sample sizes, number of seeds, target populations) that would most likely affect the length of the study. Furthermore, investigators can manipulate the process to accelerate or slow recruitment for operational reasons (Johnston et al. 2007).

Our review is subject to several limitations. Like any systematic review, ours is limited by how complete our search was and how complete the reports were once we identified them. Although there may be a few studies that we were unable to find, we believe our search was comprehensive and complete to the greatest extent possible. We also found that many reports were missing key data, which introduced uncertainty into our calculations of average number of initial and added seeds, size of a design effect (if used), desired and recruited sample sizes, number of recruitment waves and sampling duration. Finally, our findings can be affected by multiple RDS studies using the same RDS protocol but conducted in different cities within one country, such as occurred in Ukraine and India (Appendix: references 6, 8).

To assess whether any of these studies generated representative data, detailed information about the implementation and analytical characteristics of each RDS study is needed. In the future it may prove useful to establish certain key data that should be reported for each RDS study, as has been done for randomized controlled trials (Moher et al. 2001) and observational (Von Elm et al. 2007) and qualitative studies (Tong et al. 2007). In general, we suggest that RDS studies should report the following items: (1) whether formative research was conducted, the quality and quantity of such research, and whether the population under study was found to be socially networked; (2) comprehensive description of eligibility criteria; (3) how initial and replacement seeds were selected and how they were found; (4) the maximum number of allowable referrals per participant; (5) whether the recruiter–recruit relationship was tracked; (6) whether a design effect was used during calculation of sample size and the size of the design effect; (7) the sample size calculated versus the sample size attained; (8) the maximum number of recruitment waves attained; (9) length of time needed for data collection; (10) whether equilibrium was reached and for which variables; (11) how the sizes of participants’ social networks were measured; and (12) whether survey data were adjusted using RDSAT or a similar software program.

Our review shows that RDS has been used widely for HIV prevalence and risk behavior surveillance in most-at-risk populations. When designed and conducted correctly, RDS is a valuable method for monitoring trends, better understanding epidemic dynamics and evaluating the effect of public health programs.