Introduction

Populations at higher risk for HIV infection, including men who have sex with men (MSM), people who inject drugs (PWID), and female sex workers (FSWs), are difficult to sample in a way that represents the underlying population, often because of stigma or legal issues [1,2,3]. However, accurate information about these hard-to-reach populations (i.e., subpopulations that are difficult to reach or involve in research or public health programs because of their physical and geographical location or their social, legal, and economic situations [3]) is necessary for HIV burden estimation, development of HIV prevention interventions, and relevant policy making.

Respondent-driven sampling (RDS) was proposed in 1997 as a new sampling method to overcome the limitations of venue-based sampling methods and the biases of chain-referral samples, specifically the non-randomness of initial samples and non-representativeness [4, 5]. RDS starts with a small number of participants, conveniently identified by researchers as seeds, who are then given coupons to recruit peers within their social networks from the same underlying population [6]. Information on the sample’s successive referral chains and on each participant’s number of contacts in the priority population is often collected so that RDS estimators can weight the sample to reduce biases [5, 7]. Mathematical theory and simulation studies suggest that statistically unbiased estimates of population characteristics can be calculated from an RDS-recruited study population with an appropriate RDS estimator [4, 5, 8,9,10].

However, unbiased estimates from RDS studies depend on several important assumptions, which cannot always be guaranteed in real-world settings [7, 11,12,13,14]. For instance, widely used RDS estimators such as RDS-I (i.e., Salganik-Heckathorn estimator or S-H estimator) [9] and RDS-II (i.e., Volz-Heckathorn estimator or V-H estimator) [10] require the with-replacement sampling assumption [12]. RDS-I, RDS-II, and later estimators, such as Gile’s SS (i.e., Gile’s Successive Sampling estimator) [8] and HCG (i.e., Homophily Configuration Graph estimator) [15], share at least four common assumptions. First, self-reported network size (i.e., degree, which is the number of connections an individual has to other peers in a network) is accurate. Second, the probability of an individual’s enrollment is proportional to the individual’s network size. Third, recruitment is random. Fourth, recruitment happens over the recruiter’s direct network ties [13]. Because these assumptions are easily violated, assessing RDS with empirical data from real-world settings is necessary [16].
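To make the role of the second assumption concrete, the following minimal Python sketch shows the inverse-degree weighting that underlies the RDS-II (Volz-Heckathorn) proportion estimate: if enrollment probability is proportional to degree, each participant is weighted by the reciprocal of the reported degree. The function name and the toy data are hypothetical and are not taken from the NHBS data; the sketch is an illustration under stated assumptions, not the implementation used in this analysis.

```python
from typing import Sequence


def rds_ii_proportion(in_group: Sequence[bool], degrees: Sequence[float]) -> float:
    """Inverse-degree weighted proportion (RDS-II style point estimate).

    in_group[i] is True if participant i has the trait of interest;
    degrees[i] is that participant's self-reported network size.
    """
    if len(in_group) != len(degrees):
        raise ValueError("in_group and degrees must have the same length")
    if len(degrees) == 0 or any(d <= 0 for d in degrees):
        raise ValueError("degrees must be non-empty and strictly positive")
    # Assumption: inclusion probability is proportional to degree,
    # so each participant receives weight 1 / degree.
    weights = [1.0 / d for d in degrees]
    numerator = sum(w for w, g in zip(weights, in_group) if g)
    return numerator / sum(weights)


# Hypothetical toy sample: group members report large networks.
in_group = [True, True, False, False]
degrees = [10, 10, 2, 2]

estimate = rds_ii_proportion(in_group, degrees)
sample_proportion = sum(in_group) / len(in_group)
print(f"sample proportion = {sample_proportion:.4f}, weighted estimate = {estimate:.4f}")
```

In this toy sample the unweighted proportion is one half, while the weighted estimate is one sixth, because participants with large networks are assumed to be over-represented. Inaccurate self-reported degrees therefore translate directly into inaccurate weights.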

One means of assessment is to evaluate the reliability of naïve RDS samples by using surveys collected repeatedly among the same population in the same location [17]; another is to evaluate the consistency of population parameter estimates from different RDS estimators by using such repeated surveys [16]. This paper compares population parameters estimated by four RDS estimators (RDS-I, RDS-II, Gile’s SS, and HCG) with five RDS samples from the National HIV Behavioral Surveillance (NHBS) PWID Cycles in the Greater Newark Area of New Jersey (Essex, Hudson, Union, Sussex, and Morris Counties). The objective of our analysis is to assess two kinds of consistency: the consistency of population parameter estimates from different RDS estimators applied to the same sample, and the consistency of time-insensitive demographic estimates (e.g., race/ethnicity, gender, and sexual orientation) from the same RDS estimator applied to repeatedly collected samples.

Methods

Study Population and Data Collection

Eligible PWID in Newark, New Jersey were recruited via RDS for the National HIV Behavioral Surveillance (NHBS) survey in five rounds (2005–2006 as Round 1, 2009 as Round 2, 2012 as Round 3, 2015 as Round 4, and 2018 as Round 5) [18,19,20,21,22]. The methodology for recruiting and surveying PWID in NHBS-PWID Cycles has been published in detail elsewhere [23]. Briefly, initial “seeds” were identified through referrals from people who know the local population well or through outreach in areas where PWID can be found, and they completed the NHBS survey. Eligible seeds who completed the initial survey were asked to recruit 3–5 additional PWID they knew. Subsequent recruits who were eligible and completed the survey were in turn asked to recruit others, until the target sample size was reached [18,19,20,21,22]. Participants were eligible if they were 18 years of age or older; were residents of Essex, Hudson, Union, Sussex, or Morris counties in New Jersey; reported injecting drugs in the past 12 months and either had physical evidence of recent injection or were able to adequately describe their injection practices; were able to complete the survey in English or Spanish; and were able to provide consent [23].

Measures

Protocols were similar for all five rounds of NHBS-PWID surveys included in this study [17]. Eligible participants’ demographic characteristics (age, race/ethnicity, gender, sexual orientation, education, employment status, and marital status) and HIV status were included in this analysis. Age was categorized as 18–29, 30–39, 40–49, 50–59, and 60+. Gender was categorized as female, male, and nonbinary. Sexual orientation was categorized as straight, gay/lesbian, and bisexual. Education was categorized as grades 8 or less, grades 9 to 10, grades 12/GED, some college/associate/technical, and bachelor’s degree or more. Employment status was categorized as unemployed, full-time, part-time, homemaker, student, retired, disabled to work, and other. Marital status was categorized as never married, married, living together as married, separated, divorced, and widowed. In the four most recent rounds (Round 2 to Round 5), individuals who tested positive by a laboratory-confirmed HIV test, or who self-reported as positive and gave valid information about seeing a doctor for HIV treatment, were considered HIV positive; individuals who tested negative by a rapid HIV antibody test were considered HIV negative; and individuals who had an unknown result or who had never been tested for HIV were considered to have unknown HIV status. Self-reported HIV status was used as HIV status in Round 1 because an HIV test was not provided.

Analysis

Data were analyzed in R 4.2.1 GUI (The R Foundation for Statistical Computing, Austria). Specifically, the R package RDS (Version 0.9-3) was used to estimate population parameters of interest with 95% confidence intervals, to compute population homophily statistics and differential recruitment statistics, and to create convergence plots for RDS diagnostics. The R package ggplot2 (Version 3.4.0) was used to visualize estimates of population parameters of interest with 95% confidence intervals. First, population parameters of interest in the five rounds were estimated by four RDS estimators: the RDS-I estimator [9], the RDS-II estimator [10], Gile’s SS estimator [8], and the HCG estimator [15]. The 95% confidence intervals of the population parameter estimates were based on a bootstrap procedure, with the type of uncertainty set to RDS-I, RDS-II, Gile’s SS, and HCG, respectively [7]. These population parameter estimates were compared cross-sectionally. Next, the consistency of time-insensitive demographic estimates (race/ethnicity, gender, and sexual orientation) was assessed longitudinally. Overlapping 95% confidence intervals were taken to indicate no significant difference at an alpha level of 0.05.
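The overlap criterion can be written as a short check. The Python sketch below is illustrative only; the function name is hypothetical, and the intervals are the RDS-I estimates of the proportion of female PWID in Round 2 and Round 4 and the RDS-II and Gile’s SS estimates of HIV-negative status in 2012, as reported in the Results.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True if two closed intervals share at least one point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    if a_lo > a_hi or b_lo > b_hi:
        raise ValueError("each interval must be given as (lower, upper)")
    return a_lo <= b_hi and b_lo <= a_hi


# RDS-I, proportion of female PWID (values reported in the Results).
female_round2 = (0.0987, 0.1556)
female_round4 = (0.1716, 0.2627)

# RDS-II and Gile's SS, proportion HIV negative in 2012 (values reported in the Results).
hiv_neg_rds2 = (0.8076, 0.9271)
hiv_neg_ss = (0.8086, 0.9272)

rds1_rounds_differ = not intervals_overlap(female_round2, female_round4)
rds2_ss_differ = not intervals_overlap(hiv_neg_rds2, hiv_neg_ss)
print(rds1_rounds_differ, rds2_ss_differ)
```

Under this rule the two RDS-I intervals do not overlap and are flagged as different, while the RDS-II and Gile’s SS intervals overlap and are treated as consistent. Interval overlap is generally a conservative criterion: non-overlapping 95% intervals indicate a difference, but overlapping intervals do not by themselves establish equality.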

Population size parameters used in Gile’s SS and HCG estimating functions were estimated as 10,445 (Round 1), 15,927 (Round 2), 20,223 (Round 3), 23,371 (Round 4), and 30,005 (Round 5). Details can be found in supplementary documents. Alternative population size parameters (N = 5000, 10000, 20000, 30000, and 40000, respectively) were also used in Gile’s SS and HCG estimating functions in sensitivity analyses to assess how robust the Gile’s SS and HCG estimates obtained with the main population size parameters were. Network size (i.e., degree) was used in RDS estimators, population homophily statistics, and differential recruitment statistics. It was defined as the number of people who inject drugs whom an individual had seen in the past 30 days. If a reported network size was smaller than the participant’s number of recruits plus one (the recruiter), it was set to the number of recruits plus one [7, 24]. Missing values of network size were imputed as RDS-II weighted medians within each level of race/ethnicity [7, 24]. Population homophily statistics (homophily is defined as the principle that contact between similar people occurs at a higher rate than contact between dissimilar people [25]) were estimated as the ratios of the expected number of discordant couples (e.g., female with male as a discordant couple) absent homophily to the expected number of discordant couples with homophily, with the type of weight as RDS-I, RDS-II, Gile’s SS, and HCG, respectively [7]. Differential recruitment statistics were calculated as the ratio of the average degree of one population group to the average degree of another population group, with the type of weight as RDS-I, RDS-II, and Gile’s SS, respectively [7]. Convergence plots were used to determine whether the final RDS estimate was biased by the initial convenience sample of seeds [12, 26].
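The two network-size adjustments described above can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names and toy data. It assumes that the RDS-II weight of a participant is the reciprocal of the adjusted degree, and it treats one race/ethnicity level at a time. It is not the code of the R package RDS.

```python
from typing import List, Optional, Sequence


def adjust_degree(reported: Optional[float], n_recruits: int) -> Optional[float]:
    """Raise a reported degree to at least (number of recruits + 1 recruiter)."""
    if reported is None:
        return None  # missing values are handled by imputation below
    return max(reported, n_recruits + 1)


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value at which the cumulative weight reaches half of the total."""
    if len(values) == 0 or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and of equal length")
    pairs = sorted(zip(values, weights))
    half = sum(w for _, w in pairs) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]  # guard against floating-point round-off


def impute_missing(degrees: Sequence[Optional[float]]) -> List[float]:
    """Fill missing degrees within one stratum with the inverse-degree weighted median."""
    observed = [d for d in degrees if d is not None]
    # Assumption: RDS-II weight = 1 / degree.
    fill = weighted_median(observed, [1.0 / d for d in observed])
    return [fill if d is None else d for d in degrees]


# Hypothetical stratum: reported degrees (None = missing) and numbers of recruits.
reported = [1, 10, None, 10]
recruits = [1, 3, 0, 2]

adjusted = [adjust_degree(d, r) for d, r in zip(reported, recruits)]
imputed = impute_missing(adjusted)
print(adjusted, imputed)
```

In the toy stratum the first participant reported one contact but recruited one person and was recruited by another, so the degree is raised to two. The missing value is then filled with the weighted median, which here is two rather than ten because low-degree participants carry larger RDS-II weights.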

Results

Recruitment and Sample Demographics in NHBS-PWID Cycles

Chain-length characteristics and demographic characteristics of the five rounds of NHBS-PWID surveys in the Newark, NJ area are shown in Supplementary Tables 1 and 2. In summary, similar numbers of eligible PWID were recruited in each round (N = 412, 425, 457, 394, and 524, respectively). In the round of 2005–2006, 12 participants who were not designated as seeds by NHBS staff appeared as seeds because their coupon IDs were not reported, which led to an underestimate of the chain length in that round. Even so, chains in all five rounds were long enough (median chain length: 4, 6, 9, 7, and 8, respectively) for RDS to reach equilibrium (a state in which additional waves do not provide new information that changes the existing sample composition). Similar distributions of gender, sexual orientation, and education, but different distributions of age, race/ethnicity, employment, and marital status, were observed across the five rounds.
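The effect of an unreported coupon ID on chain length can be illustrated with a small Python sketch. It assumes, hypothetically, that each participant record stores the ID of the recruiter (or nothing for a seed) and that recruitment links form trees without cycles; the IDs below are invented for illustration.

```python
from typing import Dict, Optional


def wave_numbers(recruiter_of: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Return the wave of each participant: 0 for seeds, recruiter's wave + 1 otherwise.

    Assumes every recruiter ID is itself a key and that links contain no cycles.
    """
    waves: Dict[str, int] = {}
    for pid in recruiter_of:
        path = []
        current: Optional[str] = pid
        # Walk up the chain until a seed or an already-resolved participant is reached.
        while current is not None and current not in waves:
            path.append(current)
            current = recruiter_of[current]
        wave = -1 if current is None else waves[current]
        # Assign waves from the top of the walked path back down to pid.
        for node in reversed(path):
            wave += 1
            waves[node] = wave
    return waves


# Hypothetical chain: seed s1 -> a -> b -> c.
complete = {"s1": None, "a": "s1", "b": "a", "c": "b"}
# Same chain when b's coupon ID was not recorded, so b appears to be a seed.
broken = {"s1": None, "a": "s1", "b": None, "c": "b"}

max_wave_complete = max(wave_numbers(complete).values())
max_wave_broken = max(wave_numbers(broken).values())
print(max_wave_complete, max_wave_broken)
```

With the complete links the longest chain reaches wave 3; once one link is lost, the same participants form two shorter chains whose longest wave is 1. This is the mechanism by which unreported coupon IDs lead to underestimated chain lengths.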

Comparison of Time-insensitive Population Parameter Estimates

Race/ethnicity, gender, and sexual orientation are assumed to be time-insensitive: if samples are drawn from the same population by using RDS, the same RDS estimator should yield similar estimates of race/ethnicity, gender, and sexual orientation across samples [17].

Parameters of gender (female, male, and nonbinary) and sexual orientation (straight, gay/lesbian, and bisexual) estimated by RDS-II and Gile’s SS were consistent, both longitudinally from Round 1 to Round 5 and cross-sectionally between the two estimators (See Table 1, Supplementary Fig. 2.B-C, and Supplementary Fig. 3.B-C). Specifically, across the five rounds of NHBS-PWID Cycles, the proportion of female PWID and the proportion of straight PWID ranged from 0.3262 to 0.4048 and from 0.8241 to 0.9012, respectively, when estimated by RDS-II, and from 0.3262 to 0.4043 and from 0.8240 to 0.9007, respectively, when estimated by Gile’s SS, with overlapping 95% confidence intervals (See Fig. 1.B-C).

Table 1 Estimated proportion of time-insensitive demographics using multiple RDS estimators, Newark, NJ, 2005–2018
Fig. 1 Estimated proportion of time-insensitive characteristics using multiple RDS estimators, Newark, NJ, 2005–2018

Unlike with RDS-II and Gile’s SS, consistency of gender and sexual orientation parameters was not fully observed with RDS-I and HCG (See Table 1). Parameters of gender estimated by RDS-I and parameters of sexual orientation estimated by HCG were longitudinally inconsistent (See Supplementary Fig. 2 and Supplementary Fig. 3). For instance, when estimated by RDS-I, the proportions of female PWID in Round 2 (P = 0.1272; 95% CI: 0.0987, 0.1556) and Round 3 (P = 0.0956; 95% CI: 0.0741, 0.1171) were significantly different from those in Round 4 (P = 0.2172; 95% CI: 0.1716, 0.2627) and Round 5 (P = 0.2313; 95% CI: 0.1906, 0.2719) (See Fig. 1.B). When estimated by HCG, the proportion of straight PWID in Round 2 (P = 0.1748; 95% CI: 0.0987, 0.2510) was significantly different from the proportions of straight PWID in the other four rounds (Range of lower bound of 95% CI: 0.7460–0.8568) (See Fig. 1.C).

Inconsistencies were observed in the race/ethnicity parameters estimated by all four RDS estimators (See Table 1). Specifically, none of the four estimators produced longitudinally consistent estimates of race/ethnicity (See Supplementary Fig. 1). For example, proportions of Black PWID estimated by RDS-I, RDS-II, Gile’s SS, and HCG in Round 1 were 0.6450 (95% CI: 0.5542, 0.7357), 0.7431 (95% CI: 0.6575, 0.8288), 0.7431 (95% CI: 0.6584, 0.8279), and 0.7995 (95% CI: 0.7313, 0.8676), respectively, which were significantly different from the corresponding proportions in Round 3 (Upper bound of 95% CI: 0.0798, 0.4549, 0.4547, and 0.4829, respectively) and Round 4 (Upper bound of 95% CI: 0.1517, 0.4811, 0.4806, and 0.5483, respectively; See Fig. 1.A). However, compared with the sample proportions (Range of sample proportion of Black PWID: 0.2670–0.7427), RDS-II (Range of proportion of Black PWID: 0.3636–0.7431) and Gile’s SS (Range of proportion of Black PWID: 0.3632–0.7431) narrowed the differences in race/ethnicity proportions across the five rounds.

Population homophily statistics and differential recruitment statistics are presented in Table 2. Population homophily statistics of gender and sexual orientation estimated by RDS-II and Gile’s SS were relatively similar across the five rounds (Range of population homophily statistics of nonbinary PWID: 0.9457–1.5342 for RDS-II and 0.9561–1.5166 for Gile’s SS; Range of population homophily statistics of bisexual PWID: 0.9132–1.1294 for RDS-II and 0.9140–1.1331 for Gile’s SS), while those estimated by RDS-I and HCG varied more widely (Range of population homophily statistics of nonbinary PWID: 0.6388–1.1627 for RDS-I and 0.1353–1.1020 for HCG; Range of population homophily statistics of bisexual PWID: 0.8020–1.2533 for RDS-I and 0.6590–1.1245 for HCG). However, all four RDS estimators gave substantially different population homophily statistics of race/ethnicity across the five rounds (Range of population homophily statistics of Black PWID: 1.3887–3.1280 for RDS-I, 1.4658–2.4335 for RDS-II, 1.4614–2.4592 for Gile’s SS, and 1.6246–2.4728 for HCG). Additionally, the recruitment patterns (i.e., average degrees) of Black PWID and Non-Black PWID, and of Straight PWID and Non-Straight PWID, also differed in each round, regardless of RDS estimator type.

Table 2 Population homophily and differential recruitment statistics using multiple RDS estimators, Newark, NJ, 2005–2018

Comparison of Time-sensitive Population Parameter Estimates

As noted in the Methods, self-reported HIV status was used as HIV status in the round of 2005–2006 because an HIV test was not provided. This may bias the comparison between the round of 2005–2006 and the other four rounds.

Parameters of age, education, employment status, marital status, and HIV status estimated by RDS-II were cross-sectionally consistent with those estimated by Gile’s SS in each round (See Table 3 and Supplementary Fig. 4 to 8). Most parameters estimated by HCG were consistent with those estimated by RDS-II and Gile’s SS, except for HIV status in the round of 2012 (See Fig. 2). In 2012, the proportion of PWID with negative HIV status was estimated to be 0.1263 (95% CI: 0.0417, 0.2109) by the HCG estimator, while the proportions estimated by RDS-II and Gile’s SS were 0.8673 (95% CI: 0.8076, 0.9271) and 0.8679 (95% CI: 0.8086, 0.9272), respectively. Compared with RDS-II and Gile’s SS, RDS-I gave significantly different estimates of education, employment status, and marital status in each round.

Table 3 Estimated proportion of time-sensitive demographics using multiple RDS estimators, Newark, NJ, 2005–2018
Fig. 2 Estimated proportion of time-sensitive demographics using multiple RDS estimators, Newark, NJ, 2012

Discussion

This study found longitudinal and cross-sectional consistency for most of the time-insensitive demographic parameters (gender and sexual orientation) and population homophily statistics (gender and sexual orientation) estimated by RDS-II and Gile’s SS from five repeatedly collected RDS samples of PWID in the Greater Newark Area, New Jersey. Additionally, time-sensitive population parameters (age, education, employment status, marital status, and HIV status) estimated by these two RDS estimators were cross-sectionally consistent with each other in all five rounds. Such consistency was not fully observed in the same population parameters when estimated by RDS-I and HCG.

Unlike Burt and Thiede’s study, which compared RDS-I-adjusted estimates of race/ethnicity between NHBS-PWID Round 1 and Round 2 in Seattle and found no differences [6], this analysis observed inconsistencies in the population parameters of race/ethnicity estimated by all four RDS estimators (RDS-I, RDS-II, Gile’s SS, and HCG). Given that population homophily statistics of race/ethnicity were clearly different from one (a value of one means that no population homophily exists; a value clearly larger or smaller than one means that population homophily exists), these inconsistencies may be explained by the inability of the four RDS estimators to reduce biases caused by homophily. However, RDS-II and Gile’s SS narrowed the differences in race/ethnicity proportions across the five rounds of NHBS-PWID Cycles in New Jersey.

Even though HIV status is time-sensitive, proportions of HIV status estimated by RDS-I, RDS-II, and Gile’s SS did not change much from 2005 to 2018. This contrasts with Khatib et al.’s study of two RDS samples of men who have sex with men (MSM) in Unguja, Zanzibar, which found a nearly five-fold reduction in RDS-I-adjusted HIV prevalence within 4 years [27]. The difference may be due to different dynamics and recruitment patterns between PWID in New Jersey and MSM in Unguja, Zanzibar.

Unlike Fellows’ study, which used simulated data to demonstrate the robust performance of HCG in the presence of homophily [15], this study showed that HCG-estimated parameters of race/ethnicity, sexual orientation, and HIV status were inconsistent. Racial homophily was present in all five rounds of NHBS-PWID Cycles, but HCG did not reduce homophily biases well enough to give consistent estimates of race/ethnicity. In addition, HCG did not give reasonable estimates of sexual orientation and HIV status. For instance, HCG estimated the proportion of straight PWID in 2009 as 0.1748 and the proportion of PWID with negative HIV status in 2012 as 0.1263, while the sample proportion of straight PWID in 2009 was 0.8824 (375/425) and the sample proportion of PWID with negative HIV status in 2012 was 0.8796 (402/457).

This study has several limitations. First, the five rounds of NHBS-PWID data that we used to assess the consistency of RDS estimators were collected at three-year intervals, which may bias our longitudinal comparisons of population demographic estimates. Observed inconsistencies in time-insensitive demographics estimated by the four RDS estimators may still be influenced by time effects and the natural movement of injection drug users in New Jersey. Second, because the population size of PWID in Newark, New Jersey is unknown, the estimated population sizes of the five rounds used in the Gile’s SS and HCG estimating functions may be inaccurate. However, sensitivity analyses supported the robustness of Gile’s SS (See Supplementary Fig. 9), while HCG did not perform well in the round of 2009 (See Supplementary Fig. 10). Third, relatively small sample sizes in some categories of the variables used in the study (e.g., nonbinary, student) may affect the accuracy of population parameter estimates, population homophily statistics, and differential recruitment statistics for those categories. However, given that our goal was to assess the consistency of RDS estimators rather than to estimate accurate population parameters, and that the sizes of those categories were similar across the five rounds, this limitation may not influence our findings. Fourth, population parameters estimated by the four RDS estimators in the five rounds may be biased by the non-randomness of the initial sample (seeds). Even though convergence plots of the five rounds showed that many final RDS estimates were not likely to be biased by the initial convenience sample of seeds (See Supplementary Fig. 11 to 30), some final RDS estimates (e.g., race/ethnicity, age, and education in the round of 2009 and the round of 2012) did not converge well. However, sensitivity analyses that compared time-insensitive demographic estimates with and without seeds supported the consistency of RDS estimators except for RDS-I (See Supplementary Fig. 31 to 34). Fifth, the number of coupons given to each participant varied (i.e., 3–5), which departs from the need for a limited recruitment quota applied consistently to all participants and may bias the estimates. However, because this was a secondary analysis, we could not address such bias in this study. Sixth, the comparison of HIV status between the round of 2005–2006 and the other four rounds may be biased, given that self-reported HIV status was used as HIV status in the round of 2005–2006. However, cross-sectional comparisons of HIV status in the five rounds and longitudinal comparisons of HIV status among the other four rounds were not affected. Seventh, some measures (e.g., network size) in our study may be inaccurate. Future studies may consider optimally defining these measures to better control biases. Eighth, even though sample quality was not assessed in this analysis, a previous study showed the reliability of the five RDS samples of NHBS-PWID in New Jersey [17].

Conclusions

In conclusion, even though RDS estimators may not address all inconsistencies, adequate consistency was observed with RDS-II and Gile’s SS. Future studies using RDS may need to conduct the necessary diagnostic procedures during data collection, to test important assumptions of RDS, and to weight RDS samples with more than one adjustment approach. Based on this study, RDS-II and Gile’s SS are recommended for weighting RDS samples to obtain potentially more robust results when comprehensive comparisons and sensitivity analyses cannot be performed.