Introduction

The global scale-up of antiretroviral therapy (ART) for HIV-1 infections has resulted in considerable reductions in HIV/AIDS-related morbidity and mortality [1,2,3]. However, drug resistance can hamper virological suppression in patients who receive treatment. After decades of ART scale-up, the increasing HIV drug resistance in low- and middle-income countries, such as China, poses a growing threat [4].

The Chinese government officially launched the National Free Antiretroviral Treatment Program (NFATP) in 2003. By the end of 2015, approximately 70% of individuals in this country living with HIV/AIDS received ART [5]. Nearly half of ART failures in China are reported to be associated with drug-resistance-associated mutations (DRMs) [6, 7]. Primary infection with drug-resistant HIV strains limits a newly diagnosed patient’s treatment options and is associated with treatment failure. Previous studies have shown that China had a prevalence of HIV with transmitted drug resistance (TDR) of approximately 3.6% in ART-naive patients in 2015 [8, 9], which was lower than that in Western countries [10,11,12,13]. However, given the large number of HIV-infected people and the persistent circulation of HIV in China, the number of patients for whom ART failed because of drug resistance is considerably high in this country.

To achieve the UNAIDS’s goal of ending the AIDS pandemic by 2030, ever-increasing numbers of people will be enrolled in the ART program in the coming years. Thus, with the quick scale-up of treatment and the passage of time, drug resistance will require more attention. Little is known about the temporal trend of HIV with TDR or its transmission characteristics in China.

In this study, we examined the changing prevalence of HIV with TDR from 2000 to 2016 in ART-naive patients in China, utilizing pol gene sequences of HIV extracted from the Los Alamos HIV Sequence Database. We performed comprehensive phylogenetic analysis to explore the transmission characteristics of drug-resistant HIV strains. In addition, we explored the evolutionary characteristics of the largest transmission clusters by applying Bayesian coalescent-based methods.

Materials and methods

Data collection

We retrieved all available records of HIV-1 sequences from China containing the pol gene region from the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov). A total of 12,447 sequences were obtained along with basic information, including the GenBank accession number, subtype, sampling location, collection date, and ART status, if provided. For sequences without a denoted ART status, we collected information for each sequence from the original source publication. The pol gene region was extracted from these data according to the annotation information, with all sequences < 500 nt or for which the ART status was “yes” or “unknown” removed. For patients with multiple sequences, only the earliest sequence was retained.

Sequences were aligned using MAFFT (version 7) [14] and manually checked and edited using MEGA (version 7) [15]. All of the sequences were selected manually to maximize the length and the number of segments for analysis. Segments spanning 1053 nt of the pol gene, including the entire protease (PR) and partial reverse transcriptase (RT) regions (HXB2 nucleotide coordinates 2253–3305), were ultimately selected.

Drug resistance mutation analysis

The Stanford Calibrated Population Resistance Tool version 7.0 (https://hivdb.stanford.edu/) was used to screen the pol sequences for surveillance drug resistance mutations (SDRMs), according to the World Health Organization (WHO)-recommended surveillance drug resistance mutation list, which was updated in 2009 for nucleoside reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcriptase inhibitors (NNRTIs), and protease inhibitors (PIs). HIV with TDR was defined as an HIV strain containing at least one of the standard surveillance drug resistance mutations listed by the WHO.

Transmission clusters and network construction

The transmission network was built as described previously [16]. Briefly, an approximately-maximum-likelihood phylogenetic tree was estimated for all sequences using the GTR + G + I nucleotide substitution model in FastTree 2 [17] after excising 43 codons associated with drug resistance from the alignment. The Tamura-Nei 93 (TN93) pairwise genetic distances (PGDs) were calculated for all pairs of sequences to identify potential transmission partners. Transmission clusters (TCs) were defined as sequences for which the bootstrap support value was greater than 0.9 and the PGD was less than 3.0%, and these were employed to detect longer-term transmission events. TCs were extracted from the phylogenetic tree using Cluster Picker [18]. In an undirected tree (extracted TC), there are multiple sub-trees that could be used to construct the transmission network. To select the best tree with the minimized sum of the edge weight (TN93 PGDs), a minimum genetic distance algorithm was used to define the network structure of a cluster [19]. To visualize and analyze networks, the network data were processed in the R software package utilizing the ape (version 5.4-1) [20] and qgraph packages (version 1.6.5) [21].

Bayesian phylogeographic inference

We used BEAST (Bayesian evolutionary analysis sampling trees, version 1.10.4) [22] to conduct the Bayesian phylogeographic inference for large HIV with TDR clusters after removing all of the drug-resistance-associated sites from the alignment. All of the clusters were analyzed using a suitable nucleotide substitution model (HKY for M46L clusters, JC for M46I clusters), which was selected by jModelTest (version 2.1.10) [23]. We used a strict molecular clock model to infer the timescale of HIV evolution. A Bayesian skygrid coalescent tree was used to estimate the effective infection population size. A continuous-time Markov chain (CTMC) model was used to estimate the transition rates between each pair of locations, and the Bayesian stochastic search variable selection (BSSVS) procedure was applied to identify well-supported transitions and infer the location state of the ancestral nodes of the trees.

Markov chain Monte Carlo (MCMC) analyses were run for 100 million generations, with parameter values sampled every 10,000 steps. The resulting log files were imported into the program TRACER v1.7.1, and the first 10% of the output was discarded as burn-in from each run. The convergence of the estimates was evaluated using generation vs. log probability plots in TRACER, using an effective sample size >200. When reporting the substitution rate and the time of the most recent common ancestor (tMRCA), we used the mean of the estimate parameter. The Bayes factor (BF) was calculated using SpreaD3 [24] to identify well-supported transitions by applying a cutoff of 3.0. TreeAnnotator was used to generate a summary maximum-clade-credibility (MCC) tree from the posterior distribution of the trees (after removal of the MCMC burn-in of 10%).

Statistical analysis

The temporal trends of the prevalence of HIV with TDR were evaluated with odds ratios (ORs) and 95% confidence intervals (CIs) using a logistic regression model. An OR > 1 indicates an increase in the prevalence of HIV with TDR over time, and an OR < 1 indicates a decrease over time. Furthermore, we examined the trend in a smaller window size, using a local weighted regression model with a reciprocal of the standard error of prevalence set as the weight. Fischer’s exact test was used for the analysis of differences between proportions, and Student’s t-test was used for the analysis of differences between means. Transmission fitness was defined as the ability of a pathogen to be transmitted and cause secondary cases of the disease as a means to describe the fitness of the virus at the population level [25]. To evaluate the transmission fitness of each TDR mutation, we calculated the ORs and 95% CIs from the clustering frequency by comparison with wild-type strains using a logistic regression model. An OR > 1 indicates positive fitness, and an OR < 1 indicates negative fitness. Statistical significance was defined as a p-value < 0.05. All analyses were carried out with the R program (version 3.5.3) [26].

Results

Characteristics of the study dataset

Ultimately, a total of 6578 HIV-1 pol gene sequences from treatment-naive patients from 24 provinces in China between 2000 and 2016 were obtained. Of these patients, 5155 were male (78.37%) and 313 were female (4.76%). The patient characteristics are shown in Table 1. The distribution of sequences in regions from each year is displayed in Supplementary Fig. S1A. The largest group of patients (3067, 46.6%) were infected with CRF01_AE, followed by CRF07_BC (2013, 30.6%) and subtype B (836, 12.7%) (Table 2). The remaining patients were infected with other pure subtypes or circulating recombinant forms (CRFs, 662, 10.1%). Subtype B infections decreased from 34.1% to 11.2% from 2000 to 2016 (p < 0.001). CRF01_AE and CRF07_BC infections increased over time (29.5% to 47.1%, p < 0.001; 11.4% to 31.6%, p = 0.0229) (Fig. 1). The infection routes constituted 4414 (67.1%) men having sex with men (MSM), 654 (9.9%) heterosexual sex, 622 (9.5%) injected drug use (IDU), and 888 (13.5%) other infection routes or data unrecorded (Table 2). The composition of infection routes of all HIV-infected patients in regions is displayed in Supplementary Fig. S2A.

Table 1 Characteristics of HIV-1 sequences in this study
Table 2 Transmitted drug resistance and rates of phylogenetic clustering in subtypes and risk groups
Fig. 1
figure 1

Yearly proportional change in HIV-1 subtypes from 2000 to 2016

Prevalence and temporal trend of transmitted drug resistance

HIV with TDR was demonstrated in 274 (4.51%) patients in the period from 2000 and 2016. Supplementary Table S1 shows information for all of the HIV-1 sequences with TDR mutations. The distribution of TDR mutated sequences and TDR prevalence in regions from each year are displayed in Supplementary Figures S1B and S3. Mutations conferring resistance to NRTIs, NNRTIs, PIs, and multiclass drugs were found in 92 (1.40%), 100 (1.52%), 123 (1.87%), and 37 (0.56%) patients, respectively. The prevalence of TDR in both CRF01_AE (5.31%) and subtype B (6.58%) was higher than that in the other subtypes (Table 2). Among the subtypes, CRF01_AE had the highest prevalence of PI-TDR, while subtype B had the highest prevalence of NRTI- and NNRTI-associated TDR. Among the different infection route populations, the prevalence of HIV with TDR in heterosexual sex individuals was relatively high (7.34%). MSM had the highest prevalence of PI-TDR among the risk groups, while the prevalence of NNRTI-TDR in heterosexuals was higher than that in the IDU and MSM groups (Table 2). The composition of the infection routes of patients with TDR HIV in regions is shown in Supplementary Fig. S2B (Fig. 2).

Fig. 2
figure 2

Prevalence of transmitted drug-resistance mutations in HIV-1 subtypes according to drug class

The prevalence of drug-resistance mutations in all sequences is shown in Table 3. Overall, the most frequent mutation was M46L, which was found in 0.89% (n = 58) of patients, followed by K103N (0.55%, n = 36), M46I (0.55%, n = 36), and M184V (0.40%, n = 26). Furthermore, in patients infected with strain CRF01_AE, the most frequent mutation was M46L (1.76%), followed by M46I (1.04%), M184V (0.46%), and K103N (0.39%). The most frequent TDR mutation in patients infected with CRF07_BC was M184V (0.39%), followed by T215S (0.20%) and Y181C (0.15%). The most frequent TDR mutation in subtype B was K103N (2.27%), followed by G190A (0.96%), Y181C/Y188L/M41L/D67N (0.84%), and M184V (0.72%). The prevalence of TDR mutations in each HIV-1 subtype is shown in Fig. 2.

Table 3 Frequency, clustering rate, and relative fitness of transmitted drug resistance mutations in treatment-naive patients

The prevalence of HIV with TDR varied throughout the study period. It was 2.73% in 2011–2012 and 9.09% in 2000–2003, with no clear trend observed (OR = 0.94, 95% CI 0.86–1.04, p = 0.233). The prevalence of NRTI (OR = 0.76, 95% CI 0.66–0.89, p <0.001) and NNRTI-associated (OR = 0.84, 95% CI 0.73–0.98, p = 0.020) TDR decreased slightly from 2000 to 2016, while the prevalence of PI-associated (OR = 1.09, 95% CI 0.95–1.27, p = 0.233) TDR was stable during the same period.

However, when we examined the trends in a smaller window, the prevalence of total TDR was found to decrease from 2000 to 2010 (OR = 0.83, 95% CI 0.73–0.95, p = 0.006), but it increased after 2010 (OR = 1.50, 95% CI 1.13–1.97, p = 0.004). The prevalence of NNRTI-associated TDR decreased from 2000 to 2010 (OR = 0.77, 95% CI 0.64–0.94, p = 0.007) and increased after 2010 (OR = 1.91, 95% CI 1.13–3.18, p = 0.014). PI-associated TDR was found to increase slightly after 2006, with no statistical significance, but it increased after 2010, with marginal significance (OR = 1.44, 95% CI 0.99–2.09, p = 0.055) (Fig. 3).

Fig. 3
figure 3

Number distribution (A) and temporal trends in the yearly proportion (B) of individuals with transmitted HIV drug resistance by drug class. Note that a local weighted regression was applied to the prevalence to smooth the curves. Abbreviations: NNRTI, non-nucleoside reverse transcriptase inhibitor-associated drug resistance; NRTI, nucleoside reverse transcriptase inhibitor-associated drug resistance; PI, protease inhibitor-associated drug resistance; Any class, any class of transmitted drug resistance

Clusters and transmission fitness of TDR mutations

Finally, Cluster Picker identified 788 transmission clusters containing 3189 of 6578 patients (48.4%). Strains CRF01_AE and CRF07_BC had higher rates of phylogenetic clustering than the other subtypes, while the rate of clustering was higher in MSMs than in the other infection route groups (Table 2).

Of these 788 clusters, 69 clusters had one or more sequences with TDR mutations, comprising 111 of 274 (40.5%) TDR sequences. The network construction of these 69 transmission clusters is shown in Supplementary Fig. S4. The clustering rate of sequences with TDR increased annually from 27.3% in 2005–2006 to 63.6% in 2015–2016 (p < 0.001). Of these 69 TDR clusters, 52 clusters had only one TDR sequence, 10 clusters had two TDR sequences, and 7 clusters had three or more TDR sequences. Among the 17 clusters with at least two TDR sequences, 15 clusters had the same TDR mutation, seven of which were M46L, followed by M46I (n = 2).

Overall, the sequences with TDR were less likely to form phylogenetic clusters than the sequences not resulting in TDR, even after adjusting for subtype and at-risk populations (40.5% vs. 48.8%, p = 0.023). The clustering rates of NNRTI- and NRTI-associated TDR mutations were significantly lower than those of the wild-type sequences (Table 3). The mutation M46L had the highest frequency in the clusters (n = 32), followed by M46I (n = 26), L210W (n = 7), M184V (n = 7), T215S (n = 5), and K103N (n = 4).

Of the 58 TDR mutations, seven had clustering frequencies that deviated significantly from the neutral expectation, most of which resulted in a significant reduction in transmission fitness (NNRTI and NRTI mutations: K103N, G190A, Y181C, K101E). Conversely, M46I had a relative fitness significantly greater than 1.0 (Table 3).

The spatiotemporal origins and epidemic dynamics of large TDR clusters

There were four large HIV with TDR clusters containing ≥ 5 sequences. All of these sequences were from patients infected with strain CRF01_AE in the MSM risk group. We extracted the two largest clusters containing the mutants M46I (cluster 300) and M46L (cluster 67) to implement Bayesian phylogeographic inference. The time of the most recent common ancestor (tMRCA) of cluster 67 and cluster 300 was 2007.2 (95% HPD 2004.1–2008.5) and 2006.9 (95% HPD 2004.1–2008.5), respectively. The substitution rates of clusters 67 and 300 were 2.144 × 10−3 (95% HPD 1.075×10-3–3.213 × 10-3) and 2.933 × 10-3 (95% HPD 1.385 × 10-3–4.873 × 10-3), respectively. From the phylogeographical analysis, the potential transmission source locations for clusters 67 and 300 were Beijing (posterior probability [PP] = 0.80) and Liaoning (PP = 0.69), respectively (Fig. 4).

Fig. 4
figure 4

Molecular dating analysis of large HIV-1 clusters with evidence of transmitted drug resistance. The phylogenies of the (A) M46L and (B) M46I clusters are colored according to the sampling location. The posterior probabilities of the potential transmission sources of clusters are annotated on the right of the node.

Interestingly, two other clusters, 65 and 68, together with clusters 62, 63, and 69, were adjacent to cluster 67 in the phylogenetic tree, and most of them shared the same TDR mutation, M46L. Analysis of the sequences from these clusters showed that the TN93 pairwise genetic distances were < 3.0% for the majority of the sequences (Supplementary Fig. S5). Therefore, we applied the same analyses to these clusters with the tMRCA estimated at 2002.9 (95% HPD 2000.4–2004.8) and its substitution rate estimated at 2.087 × 10-3 (1.467 × 10-3–2.818 × 10-3) while Beijing was estimated as the source location with the highest support (PP = 0.99) (Fig. 4). Analysis of the geographical migration as estimated from the Bayes factors indicated that the migration pathways, from Beijing to Hebei, Hubei, Hunan, and Guangdong, and from Hubei to Hunan and Shanghai, were all well supported for the M46L clusters, while in the large M46I cluster, viral migration from Beijing to Liaoning was well supported. Table 4 illustrates the best-supported pathway of migration of the large M46I/L clusters in China.

Table 4 The best-supported pathway of migration of the large M46I/L clusters in China

Discussion

In the present study, we identified transmitted drug-resistant HIV in ART-naive HIV-1-infected individuals in China. The overall prevalence of HIV with TDR was 4.51%, which is comparable to that reported by Zhao et al. and Su et al. (3%–5%) [8, 9]. It is not surprising that the prevalence of HIV with TDR in China was lower than that in Europe (9.4%) and North America (11.5%), where ART was initiated much earlier [27].

The prevalence of HIV with TDR varied throughout the study period and ranged from 2.73% in 2011–2012 to 9.09% in 2000–2003. The higher prevalence of HIV with TDR in the earlier phase of the NFATP may be due to the widespread use of generic agents, including zidovudine, didanosine, and nevirapine [28]. The decreasing trend in HIV with TDR from 2000 to 2010 mainly contributed to the decrease in NRTI-related resistance mutations (Fig. 3B). We speculate that the decline of the prevalence of HIV with TDR could be partially attributed to the antiretroviral drugs that were newly available after 2004, such as lamivudine and efavirenz. The toxicity and side effects of lamivudine are lower than those of didanosine, which improved patient compliance and provided better therapeutic effects [29]. Accordingly, lamivudine was used in the following years [30]. In addition, to guide the national free ART Program, four editions of the China Free ART Manual were released from 2004 to 2016. Because of standardized treatment guidance, increased availability of effective drugs, and improved patient compliance with the NFATP, the virological suppression rate increased from 78% in 2004–2005 to 90% in 2013 [8], and the prevalence of HIV with TDR declined accordingly.

However, after 2010, a slight increase in HIV with TDR prevalence was observed in our study. A similar trend between 2009 and 2012 was also observed by Su et al. [9]. The authors attributed this phenomenon to the broad expansion of access to ART during this time period and/or to the cumulative increase in prevalence of drug-resistant strains in HIV-infected patients at the onset of therapy. However, when we looked closely at the factors driving the increase, we found that PI-resistant strains were the main contributors to the growth trend since 2010. Furthermore, in both the study by Su et al. and this study, one noteworthy finding was the elevated prevalence of PI-specific TDR in the MSM group. MSM have become the most predominant risk group for HIV-1 infection in the last decade [31]. Accordingly, the high prevalence and an increasing trend of PI resistance in this population will pose a significant challenge to the prevention and control efforts in a large number of HIV-infected MSM in China.

The most frequent TDR mutation in China was M46I/L, whereas in Europe and North America, it was K103N [12, 32,33,34]. This difference might largely be attributed to the dominant HIV-1 subtypes that circulate between regions. In China, the predominant subtype of HIV-1 is CRF01_AE, in which most of the M46I/L substitutions were observed. In Europe and North America, subtype B is the most prevalent. The second-most frequent TDR mutation was K103N, which represents the most frequent TDR mutation in subtype B. This is consistent with what has been observed in European and North American isolates. Furthermore, the TDR rates overall and in NRTI/NNRTI in subtype B were higher than those in other subtypes. Subtype B was the predominant strain of the early HIV epidemic in China [35]. Therefore, it had a more prolonged exposure to ART and consequently accumulated drug resistance mutations, leading to this relatively high TDR prevalence.

In this study, we found that TDR mutant was associated with reduced transmission clustering rate, suggesting that TDR strains have lower fitness than the corresponding wild-type strains. Drug-resistance mutants can revert to wild type and then outcompete strains with the DRMs. In addition, viral competition experiments have demonstrated that DRMs generally reduce the replication rate of HIV (i.e., replicative fitness). When transmission occurs, systemic infection is typically established by a single genetic variant, and founder virus selection is biased toward certain genetic characteristics [36]. This genetic bottleneck could limit the transmission of DRMs. In the absence of antiretroviral therapy, strains containing DRMs commonly have reduced fitness compared with the wild-type strain within the population [3]. Therefore, TDR-associated strains are likely to be less active in onward transmission. This transmission disadvantage could explain the reduced clustering rate of TDR strains compared with wild-type strains. Despite this negative impact of TDR on transmission, the clustering rate of TDR strains remained at 40.5%, showing that the transmission clusters made a significant contribution to the spread of TDR. More importantly, we found that the clustering rate of TDR strains increased annually after 2004, indicating that the contribution of transmission clusters to the spread of TDR has been increasing.

In contrast, people infected with a virus containing the mutation M46I were involved in increased rates of transmission. The enhanced transmission of M46I strains is noteworthy, as it has implications for second-line regimens containing lopinavir or atazanavir in resource-limited countries [37]. Another mutation at the same site, M46L, is also noteworthy, as it had the highest prevalence in the study, with a high rate of clustering. Using Bayesian phylogeographic models, we found that Beijing was a critical transmission location for M46I/L. Interestingly, all of the sequences of M46I/L were derived from CRF01_AE-infected MSM patients. Considering the high prevalence of CRF01_AE in China and the increasing proportion of newly diagnosed HIV-infected patients within MSM populations [31], concern about the rapid onward transmission of the M46I/L variant within the Chinese MSM community is warranted. Given that M46I/L is one of the most commonly transmitted PI mutations globally, especially in South/Southeast Asia [27], there is an urgent need to better understand its clinical impact on the efficacy of the current second-line therapy, as well as its effects combined with other mutations.

Some limitations should be noted here. First, because data on the infection history and disease progression were limited, a proportion of the individuals might not have been infected recently. The overall trend of many TDR-associated mutations is toward a gradual reversion to the wild type in the years following the initial infection. Therefore, the drug-resistant strains in plasma would become minor quasispecies after a period of infection. Accordingly, the prevalence of HIV with TDR in chronically infected individuals is generally lower than that in acutely infected individuals, which would have impacted the estimated TDR rate determined in this study. Second, the population heterogeneity in the study was noteworthy. Due to the strict filter condition, the number of eligible sequences from females was relatively small, and the majority of the sequences were from MSM. Nevertheless, no significant difference in clustering rates was detected between females and males, either in heterosexual sex or IDU (data not shown). Therefore, we adjusted the infection route in the subsequent transmission analysis. Third, like all genetic clustering studies, we were unable to determine whether the genetic linkage results were due to direct or indirect transmissions.

In conclusion, the overall prevalence of HIV with TDR among pretreatment HIV-infected individuals in China has remained at a low level during the last two decades. However, the onward transmission of drug resistance mutations may compromise the efficacy of ART programs. We also found self-sustaining reservoirs of TDR mutations among transmission networks that may have important implications for the success of ART programs. We thus suggest that effective measures are still needed to strengthen monitoring and guide ART usage, such as drug-resistance testing for patients enrolling in ART programs.