
Core Message

Injection drug use is a major risk behavior associated with transmission of HIV-1B.

Sociocultural and socioepidemiological studies are required to characterize networks of injection drug users (IDUs).

The possibility of links between subject socioepidemiology and viral sequence diversity, phylogenetic relationships, signatures, thermodynamics, and glycosylation patterns is studied.

This chapter addresses whether in risk locales where many people inject together, there are variations in probability of relatedness of HIV-1B env sequences.

15.1 Introduction

What became known as HIV-1 was discovered and first isolated in three different laboratories, namely those of Drs. J. Levy, L. Montagnier, and R.C. Gallo [1,2,3,4]. The complex distribution of human immunodeficiency virus type 1B (HIV-1B) genotypic variants, or quasispecies, within infected individuals has been characterized extensively [5]. There are many potential routes of HIV-1B transmission, including parenteral (injection drug use by injection drug users, IDUs), sexual (heterosexual and homosexual), blood transfusion, and perinatal (prepartum and postpartum). Upon transmission, some studies indicate that the major variant found in the donor is transmitted [6], while others report transmission of a minor variant [7,8,9]. For either alternative, the person newly infected through these routes has an HIV-1B sequence population that is initially homogeneous [7, 10, 11], irrespective of the heterogeneity of the donor’s sequences. With time, the high mutation rate of HIV-1B and selective pressures generate heterogeneous populations, quasispecies, or sequence variant clouds [8, 12,13,14,15,16,17,18,19].

Sequence heterogeneity in the envelope region occurs primarily in the hypervariable domains, designated V1–V5, which have attracted much attention for characterizing genotypic and phenotypic variants [20]. Because the env gene is particularly prone to mutation [21], it has proven invaluable in molecular epidemiological studies tracing the patterns of disease transmission and progression [9]. Several analytical approaches have been implemented using different variable domains of the HIV-1B ENV gene to investigate viral transmission: genetic distances, phylogenetic trees, and sequence signature patterns. Phylogenetic analysis has been used to establish the likelihood of HIV-1B transmission from infected health care workers to patients [11, 14]. Only recently has such analysis been used to determine whether different transmission groups possess characteristic variants [22,23,24]. Risk group-associated variations in sequence signatures have been described between drug user, transfusion, and homosexual/hemophiliac DNA sequences within the V3 loop domain [25, 26]. Mother-infant transmissions reveal that the V3 sequences are much more heterogeneous in the mother than in the infant after birth. The amino acid sequence signatures correlated with linked vertical transmission between these pairs [27]. Earlier in the epidemic, two predominant genotypes of HIV-1B (Thai A and B) were known to exist in Thailand [28]; interestingly, studies of the V3 loop region showed that there had been independent introduction of the virus into two high-risk populations distinguished by mode of transmission (sexual activity vs intravenous drug use). Characterization of recombinant segments in the C2–V3 region demonstrated transmission between both individuals within a sexual risk couple [7]. In addition, dual transmission of HIV-1B and C subtypes was detected in a family, husband to wife to child [29].

Injection drug use is a primary risk behavior associated with transmission of HIV [30,31,32,33,34,35,36,37,38,39,52]. Many sociological and epidemiological characterizations of drug abuse and HIV infection networks include NIDA. There have been many detailed studies characterizing the IDU and sexual transmission of HIV-1B and other epidemiologically linked viruses including HCV. Detailed molecular studies identified and tracked virus strains in well-defined drug abuse networks as part of sociological and epidemiological studies.

One factor that complicates deducing HIV infection-related risk networks is the infectivity potential of virus strains. This involves inferring the probability that various virus strains infect new hosts after each single exposure due to use of a contaminated drug injection needle-syringe, cotton, cooker, or washwater. The CDC reports that the average probability of HIV infection is 0.33 per 100 needlestick or cut exposures (Footnote 1). Consequently, the establishment of risk networks in which a single V1–V5 HIV-1B strain variant dominates is unlikely. There is also a lowered chance of transmission of any given strain, especially in circumstances of multiple IDUs injecting in the same risk locale. However, long-term intimate relations among socially characterized participants warrant a higher likelihood of exhibiting detectable clustering of HIV strains with increased sequence relatedness than diffusely distributed networks [53,54,55,56,57].

In this study, we characterized and defined the sociodynamics of four networks of seropositive IDUs. In addition, we characterized the viral sequence within specific hypervariable domains (V1–V5) of 37 HIV ENV genes derived from the IDUs.

15.2 Materials and Methods

15.2.1 Network Epidemiology

Epidemiological methods for assessment and outreach of individuals who inject drugs included several approaches and protocols. Four epidemiological networks linked by sexual interactions and/or IDU behavior were studied (Table 15.1). Fieldwork with questionnaires was used to investigate injection drug use in Miami. Factors including frequency of participation in shooting galleries, high-risk contacts, injection drug use, sex, gender, age, and mobility were characterized to elucidate the dynamic temporal and spatial relationships between risk and network members [30, 36, 37, 42,43,44,45,46, 48, 58, 59]. Confidentiality was strictly maintained throughout this work. The laboratory studies were all done without any personal identifiers. The Institutional Review Board (IRB) rules of the University of Miami were strictly followed and enforced. Human subject approvals were obtained at the time of these studies from the IRB at the University of Miami.

Table 15.1 Patient HIV-1B IDU demographics

15.2.2 Specimens

Blood was obtained from 15 heterosexual HIV-positive subjects (Tables 15.1 and 15.2). Blood samples were obtained for individuals in networks 1, 2, and 3 at the same initial time, whereas those in network 4 were obtained 14 months later. Follow-up samples were subsequently obtained 12 months afterward for members of the first three networks. Blood was obtained from study individuals using standard EDTA-containing vacutainer tubes. Peripheral blood mononuclear cell (PBMC) pellets were produced using Ficoll–Hypaque density gradient centrifugation and were cryopreserved at −85 °C until needed [15,16,17, 19].

Table 15.2 Patient networks (15 participants, 4 networks)

15.2.3 Polymerase Chain Reaction

Genomic DNA was extracted from the cryopreserved PBMC pellets using DNA isolation kits (United States Biochemicals, Cleveland, OH) and subsequently precipitated with ethanol and resuspended in Tris-EDTA (TE) buffer (pH 7.5). The V1–V5 region of the HIV-1B ENV gene was amplified using two rounds of PCR with a set of nested primers as previously described [19]. For the first PCR, we used approximately 1 µg genomic DNA in a 100 µl reaction containing 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl2, 20 nmol (200 µM) each of dNTPs, 2.5 U AmpliTaq DNA polymerase (Perkin Elmer, Foster City, CA), and primers XPR1 (GGGATCAAAGCCTAAAGCCA, sense, nucleotide positions 6558–6577 in HXB2) and XPR7 (ACTTCTCCAATTGTCCCTCA, antisense, positions 7647–7666). Amplification consisted of a hot start (95 °C, 5′), then 30 cycles of 94 °C, 0′35″; 60 °C, 1′35″; and 72 °C, 2′35″, followed by a final extension (72 °C, 10′). For the nested PCR reaction, primers XPR2 (GAATTCACCCCACTCTGTGTTA, sense, positions 6590–6605) and XPR6 (AAGCTTCTCCTCCAGGTCTGA, antisense, positions 7625–7639) were used to amplify 1 µl of the first PCR reaction under the same conditions as above. The size of the PCR amplicon was determined by analysis on a 1.2% agarose gel in 1X TAE with 0.5 µg/ml ethidium bromide. Typically, a single PCR band was observed using only 10 µl of the nested PCR reaction.
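The reaction arithmetic can be checked with a short sketch. It assumes only the quantities stated above (20 nmol of each dNTP in a 100 µl reaction, and the HXB2 coordinates of the outer and nested primers); the function names are illustrative. The nested span is computed from the stated coordinates only, so it excludes the six extra 5′ bases that the XPR2 and XPR6 sequences appear to carry beyond those coordinates.

```python
def concentration_um(amount_nmol: float, volume_ul: float) -> float:
    """Concentration in µM of `amount_nmol` dissolved in `volume_ul`."""
    # 1 nmol per µl equals 1 mM, i.e. 1000 µM.
    return amount_nmol * 1000.0 / volume_ul


def coordinate_span(start: int, end: int) -> int:
    """Number of positions covered by an inclusive coordinate range."""
    return end - start + 1


dntp_um = concentration_um(20.0, 100.0)    # each dNTP in the first-round PCR
outer_span = coordinate_span(6558, 7666)   # XPR1 start to XPR7 end (HXB2)
nested_span = coordinate_span(6590, 7639)  # XPR2 start to XPR6 end (HXB2)
print(dntp_um, outer_span, nested_span)
```

Under these assumptions, 20 nmol in 100 µl corresponds to 200 µM of each dNTP, and the coordinate spans give the approximate size to expect for the band on the agarose gel.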

Negative controls consisted of a water sample (instead of peripheral blood mononuclear cells), RT reaction mixtures without added reverse transcriptase, and reagent controls run in parallel with the tested samples [15,16,17, 19].

To minimize the possibility of carryover contamination, separate rooms (connected neither by airflow nor by air conditioning) were always used for the addition of reagents and sample DNA template versus the performance of PCR and the handling of PCR products. Only one subject’s sample was ever handled at any time during processing as well as during subsequent amplification and cloning steps. In addition, sentinel tubes were evaluated weekly for contamination. Several subjects had blood samples redrawn after 12 months so that cloned sequences could be compared to verify phylogenetic relationships and to rule out sample mix-up and/or contamination [15,16,17, 19]. Sequence integrity was analyzed; Rodrigo and Learn [60] review several methods for doing so.

15.2.4 Molecular Cloning and DNA Sequencing

The PCR product was purified and concentrated using the High Pure PCR Product Purification Kit (Boehringer Mannheim, Mannheim, Germany) and quantified by fluorescence (Hoefer Instruments, San Francisco, CA). The amplified product (40 ng) was subcloned into pCR2.1 vector using the TA cloning kit (Invitrogen, San Diego, CA). An aliquot of the ligation reaction (110 ng) was then used to transform INVαF′ cells. Plasmid DNA was isolated from positive clones, which were selected by kanamycin resistance and lacZα complementation (blue/white), using the Wizard Miniprep (Promega Corporation, Madison, WI). After digestion with EcoRI, the size of the cloned insert DNA was verified by agarose gel electrophoresis. Glycerol stocks of positive clones were prepared for long-term storage. A variety of sequencing primers both internal and external to the cloned fragment were used to sequence the primary and complementary strands (ACGT Inc., Northbrook, IL) [15,16,17, 19]. Clone designation: the number designates the subject ID; “L” designates blood, followed by the clone number; and then “D” or “R” designates DNA or RNA as the nucleic acid source, respectively. The addition of “+12”, e.g., 1004 + 12L4D, designates follow-up samples obtained 12 months after the initial samples.
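The clone designation scheme can be illustrated with a small parser. This is a sketch under the assumption that names follow exactly the pattern described above (subject ID, optional “+12”, “L”, clone number, then “D” or “R”); the `Clone` type and `parse_clone` function are hypothetical helpers, and other label layouts used in figures or tables would need their own handling.

```python
import re
from typing import NamedTuple


class Clone(NamedTuple):
    subject: str       # subject ID, e.g. "1004"
    follow_up: bool    # True when the "+12" (12-month follow-up) tag is present
    clone_number: int  # number following "L" (blood)
    source: str        # "DNA" or "RNA"


# Subject ID, optional "+12", "L" + clone number, then "D" or "R".
_PATTERN = re.compile(r"^(\d+)\s*(\+\s*12)?\s*L(\d+)([DR])$")


def parse_clone(name: str) -> Clone:
    """Split a clone designation such as '1004 + 12L4D' into its parts."""
    match = _PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized clone designation: {name!r}")
    subject, follow, number, src = match.groups()
    source = {"D": "DNA", "R": "RNA"}[src]
    return Clone(subject, follow is not None, int(number), source)


print(parse_clone("1004 + 12L4D"))
print(parse_clone("1005L2D"))
```

Parsing the labels this way makes it straightforward to group clones by subject or by sampling time before comparing sequences.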

15.2.5 Sequence Analysis

15.2.5.1 DNA and Protein Alignments

The env sequences of HIV-1B from IDUs in two different risk locales, Overtown and Opa-Locka, Dade County, Florida (Table 15.2), comprised a total of 37 nucleotide sequences; these and their corresponding protein translations were used for sequence alignments. The alignment files were generated using Clustal X 1.83 [61] with default parameters. Both intra- and intermolecular sequence variations were analyzed using Clustal X. The sequence variations and consensus patterns (signatures) were displayed using WebLogo 2.8.2 (http://weblogo.berkeley.edu/) [62].

15.2.5.2 Entropy Analysis

Shannon entropy [63] is a simple quantitative measure of uncertainty in a dataset and was used to measure the variation in DNA and protein sequence alignments. Entropy was calculated for each position of the input sequence set. It can be used as a measure of the relative variation in the regions of an aligned gene or protein. The online web server available at the Los Alamos National Laboratory (LANL) (Los Alamos, NM) (http://www.lanl.gov; https://www.hiv.lanl.gov/content/sequence/ENTROPY/entropy_one.html) was used.
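As an illustration of the per-position calculation, the following sketch computes H = −Σ p_i ln p_i for each alignment column. It is not the LANL implementation: the use of the natural logarithm and the treatment of the gap character as an ordinary symbol are assumptions made here, and the function names are illustrative.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (natural log) of the symbols in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def alignment_entropy(seqs):
    """Entropy for every column of an alignment of equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy("".join(col)) for col in zip(*seqs)]


# Toy alignment: column 1 is conserved, column 2 is split evenly.
print(alignment_entropy(["AA", "AA", "AC", "AC"]))
```

A fully conserved column gives an entropy of zero, and the value grows with the number and evenness of the symbols observed. Because gaps count as symbols in this sketch, gapped columns can show high entropy without true residue variation.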

15.2.5.3 Neutrality Hypothesis

The null hypothesis assumes that in most protein-coding genes, the number of synonymous nucleotide substitutions per site (dS) is equal to the number of non-synonymous nucleotide substitutions per site (dN), i.e., (H0: dN = dS), and the alternate hypothesis is (H1: dN ≠ dS). The null hypothesis of strict neutrality (dN = dS) was rejected in favor of the alternative hypothesis when the probability (P) was less than 0.05, which was considered significant at the 5% level. The variance of the difference was computed using the bootstrap method with 500 replicates [64]. Analyses were conducted using the Nei-Gojobori method [65] in MEGA5 [66]. All positions containing gaps and missing data were eliminated from the dataset. Tajima’s test of neutrality [67] was conducted using MEGA5 to compare the number of segregating sites per site with the nucleotide diversity. An essential parameter in the theory of neutral evolution is “4Nu,” where “N” is the effective population size and “u” is the mutation rate per site. The difference between the two estimates of this parameter provides an indication of non-neutral evolution. A codon-based Z-test was also conducted using MEGA5 to test whether positive selection is operating on a gene by comparing the relative abundance of synonymous and non-synonymous substitutions that have occurred in the gene sequences.
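The comparison behind Tajima’s test can be sketched as follows. This is a minimal illustration of the standard D statistic (the mean number of pairwise differences compared with S/a1, scaled by the usual variance terms), not the MEGA5 code; the function name is hypothetical, and columns containing “-” or “N” are dropped to mirror the complete-deletion setting described above.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned sequences, using complete deletion of gaps."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    # Keep only columns without gaps or missing data.
    cols = [c for c in zip(*seqs) if "-" not in c and "N" not in c]
    seg_sites = sum(1 for c in cols if len(set(c)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")
    # Mean number of pairwise differences.
    pairs = list(combinations(range(n), 2))
    total = sum(sum(1 for c in cols if c[i] != c[j]) for i, j in pairs)
    k_hat = total / len(pairs)
    # Constants that depend only on the sample size.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k_hat - seg_sites / a1) / math.sqrt(variance)


print(round(tajimas_d(["AAAA", "AAAT", "AATT", "ATTT"]), 3))
```

A positive value means the pairwise diversity exceeds what the number of segregating sites would predict under neutrality, and a negative value means the opposite.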

15.2.5.4 Estimates of Transition/Transversion Bias

Evolutionary analyses were conducted in MEGA5 [66]. Two statistical methods, maximum likelihood (ML) and maximum composite likelihood (MCL), were used. Substitution patterns and rates were estimated for nucleotide sequences using the Tamura-Nei model (+G) [68, 69]. A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories, [+G], parameter = 0.6115). All positions containing gaps and missing data were eliminated. The analysis involved 37 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding.
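To make the terms concrete, the sketch below classifies the observed differences between two aligned sequences as transitions (purine to purine or pyrimidine to pyrimidine) or transversions (purine to pyrimidine or the reverse). It is a raw count for illustration only; the bias R reported by MEGA5 is a model-based estimate, so the two should not be expected to agree. The function name is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a: str, seq_b: str):
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical sites and anything that is not A, C, G, or T.
        if x == y or x not in PURINES | PYRIMIDINES or y not in PURINES | PYRIMIDINES:
            continue
        same_class = (x in PURINES) == (y in PURINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


print(count_ts_tv("ACGT", "GCTT"))
```

In the toy pair, the A to G change is a transition and the G to T change is a transversion.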

15.2.5.5 Disparity Index Test

A simple statistic (disparity index test) measures the homogeneity of substitution patterns between a pair of sequences [70]. It works by randomly comparing the nucleotide (or amino acid) frequencies of the two descendent sequences and using the number of observed differences between them. MEGA5 computed the disparity index per site, which is given by the total disparity index between two sequences divided by the number of positions compared, excluding gaps and missing data. It is more powerful than a chi-square test of the equality of base frequencies between sequences (http://www.megasoftware.net/manual.pdf). The test was performed to infer substitution pattern homogeneity on pairwise nucleotide sequence comparisons. Monte Carlo simulations with 500 replicates were computed. All positions containing gaps and missing data were eliminated.
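The statistic can be sketched as follows, assuming the commonly cited definition in which the composition distance, ½Σ(x_i − y_i)² over the base counts x_i and y_i of the two sequences, is reduced by the number of observed differences N_d. The exact formula should be checked against [70]; the Monte Carlo significance test is not reproduced here, and the function name is illustrative.

```python
def disparity_index(seq_a: str, seq_b: str, alphabet: str = "ACGT"):
    """Return (I_D, I_D per site) for two aligned sequences.

    Assumed definition: I_D = 0.5 * sum_i (x_i - y_i)^2 - N_d, where x_i and
    y_i are base counts and N_d is the number of differing sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Complete deletion: keep only sites where both symbols are valid bases.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x in alphabet and y in alphabet]
    if not pairs:
        raise ValueError("no comparable sites")
    n_diff = sum(1 for x, y in pairs if x != y)
    composition = 0.0
    for base in alphabet:
        x_count = sum(1 for x, _ in pairs if x == base)
        y_count = sum(1 for _, y in pairs if y == base)
        composition += (x_count - y_count) ** 2
    i_d = 0.5 * composition - n_diff
    return i_d, i_d / len(pairs)


print(disparity_index("AAAA", "GGGG"))
```

When two sequences differ only as much as a shared substitution pattern would predict, the composition distance is expected to be close to N_d, so values well above zero point to a difference in base composition between the pair.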

15.2.5.6 Molecular Relatedness and Phylogenetic Analysis

The neighbor-joining (NJ) method was used to generate phylogenetic trees. The MCL method assumed that substitutions included transitions and transversions. Substitution rates were treated as uniform, and the pattern among lineages was set to homogeneous. All positions containing gaps and missing data were deleted. Codon positions included were 1st + 2nd + 3rd + Noncoding [69].

The generated sequence alignment files were imported from Clustal X to SplitsTree4 version 4.13.1, built 16 Apr 2013 [71], to compute splits as well as to infer variations among the sequences. The nucleotide dataset (37 sequences, each 1086 bp) had 578 constant sites, 277 non-parsimony-informative sites, and 196 gapped sites (no missing data). The translated proteins of the nucleotide dataset (37 sequences, each 367 residues) had 157 constant sites, 127 non-parsimony-informative sites, and 71 gapped sites (no missing data). The distances were computed from characters (nucleotides) with default parameters, and splits were then generated from the distances using the neighbor-net [72] approach to produce a set of circular splits on the network as defined in split decomposition [73]. The amino acid distances used a neighbor-net variance approach performed using ordinary least squares.

15.2.5.7 Confidence in the Resulting Phylogenetic Trees

Confidence in the resulting phylogenetic trees was assessed using bootstrap analysis [64]. One thousand bootstrap replicates were generated to assess the reliability of each edge in the tree as well as the network.
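The resampling step of a bootstrap analysis can be sketched as follows: each replicate draws alignment columns with replacement, and a tree would then be rebuilt from every replicate to count how often each edge recurs. Only the column resampling is shown; tree building is left out, and the function name and seed are illustrative assumptions.

```python
import random


def bootstrap_alignment(seqs, rng: random.Random):
    """Resample alignment columns with replacement, keeping rows paired."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    # The same column indices are applied to every sequence.
    idx = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[i] for i in idx) for s in seqs]


rng = random.Random(2013)  # fixed seed so the replicates are reproducible
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
replicates = [bootstrap_alignment(alignment, rng) for _ in range(1000)]
print(len(replicates), replicates[0])
```

The support value for an edge is then the fraction of replicate trees (here 1000) in which that edge appears.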

15.2.5.8 Phi Test for Recombination

The characters were analyzed using the Phi test [74], which identified informative sites using a window size of 100.

15.2.6 Nucleotide Sequence Accession Numbers

Sequences were submitted to GenBank and the Los Alamos National Laboratory HIV sequence database, and the accession numbers obtained are KT984127–KT984163 (Table 15.3).

Table 15.3 HIV-1B ENV clones (37 sequences)

15.3 Results

15.3.1 Socioepidemiology

The four IDU networks consisted of (i) a male/female dyad (subjects 1002/1001) with sexual relations and shared IDU habits for 4 years; (ii) two separate male/female dyads each of whom maintained sexual relationships for 13 years (subjects 1004/1003) and 8 years (subjects 1006/1005) and all of whom have shared IDU habits for 10 years; (iii) a female triad (subjects 1008, 1011, and 1015) with shared IDU habits for over 20 years; and (iv) a familial triad (subjects 1017, 1018, and 1019) with shared IDU habits. Shared IDU habits would include common use of needle-syringes, cookers, cottons, and rinse water. All individuals in these networks were located in Dade County, Florida. Subjects in networks 1 and 2 resided in Overtown (Dade County, Florida) that was a separate locale from those in network 3 who lived in Opa-Locka (Dade County, Florida) (Tables 15.1 and 15.2, Fig. 15.1).

Fig. 15.1

(a–d) Four socioepidemiological networks. Refer to Tables 15.1 and 15.2 for additional information. IDU = injection drug use with sharing of needle-syringes, cottons, cookers, and washwaters. Solid arrow = IDU. Dotted arrow = sex

15.3.2 DNA and Protein Alignments

Nucleotide variations were analyzed based on sequence alignments and are presented in Fig. 15.2a. The transition/transversion bias was also estimated (Table 15.4). The protein translations of the variable regions (V1–V5) showed less variation in the V1 loop than the published global sequence variations of HIV-1 B isolates from blood and brain [75], whereas the V2 loop is variable but not hypervariable. However, V3 is hypervariable, especially between positions 182 and 190 and after the starting residues CTRP (Fig. 15.2b). The V4 variable region is less variable than in blood env proteins [75].

Fig. 15.2

(a) Env coding genes V1–V4 of HIV-1B IDU patients; (b) The translated env proteins show sequence conservations (tall characters), semi-conserved substitutions (stacked with similar colored characters), and variations (stacked with different colors). The annotation is based on the benchmark HXB2 sequence (accession K03455) provided in HIV Sequence Compendium [90]

Table 15.4 Maximum composite likelihood estimate of the pattern of nucleotide substitution

The overall amino acid variation in V1–V4 is less than in blood isolates, and the variations are moderate in brain isolates. The differences in the variability of V1–V4 may be due to the accessibility of loops, which may reflect mutations to escape from the immune system as well as in vivo variation in biological properties, such as tropism for macrophages or other cell types or the ability to form syncytia [76, 77]. The analyzed dataset contained only a partial V5 start region. Hence, the gp41 start, fusion peptide, and immunodominant regions were not available for further comparison.

The residues at the start and end of the variable regions (V1–V4) are well conserved (Fig. 15.2b). No differences were observed in the glycosylation patterns and CD4 regions of the analyzed dataset relative to blood- and brain-derived Env sequences of HIV-1, as Env recognizes the same CD4 receptor in all strains of HIV. The CD4 region located between V3 and V4 is flanked by well-conserved residues at positions 254 to 274, except for a single residue variation at 263. The most conserved residues throughout the sequences (V1–V4) are A, D, E, F, G, K, L, M, N, P, R, and W.

Intra- and inter-sample variations were compared using the protein sequence alignment. Among the clones of subject 1005, clone 1005L2D shows a variation at position 93 (“S” instead of “F”), which occurs only in this clone. At position 120, clones L4D and L5D of subject 1008 have “S” instead of “K,” clones of subjects 1004 (L1D, L2D, 12L1D, and 12L11D) and 1006 (L5R, L6R, L7R, and L8R) have “T,” and the rest have “K.” In subject 1004 at position 125, “T” is replaced with “Q” in L1D, L2D, 12L4D, and 12L9D and with “R” in L4D, L5D, 12L1D, and 12L8D. At position 155, subjects 1001, 1002, and 1005 had “K,” whereas in the rest (1003, 1004, 1006, 1008, and 1015) it was replaced by “E,” except in two clones of 1004 (L4D and L5D), where “E” is replaced with “G.” At position 186, subject 1004 (all clones) had “Y,” subjects 1001, 1002, and 1005 had “G,” and the others (1003, 1006, 1008, and 1015) had “N.” At positions 200–210, insertions and deletions were present only in 1004 and absent in all others. There are subject-specific sequence signatures between positions 290 and 310: for 1001, 1002, and 1005, it is “FNGTWNNTERSNT”; for 1003 it is “NNNTWNSPNRLNS”; for 1008 it is “STSINANNTEGNE”; for 1004 it is “VTGESNNTVGNG” except for 12L1D and 12L11D (“GTEMSVENDT” and “FTRESNNTVGNGT”); for 1006 it is “VTEGSNNTEGN”; for 1015 it is “WSLNGTNTTNTNE.” These unique subject-specific signatures can drive diversity at the molecular level.

15.3.3 Entropy Analysis

The sequence variations and conservations are analyzed using entropy plots for DNA (Fig. 15.3) and protein (Fig. 15.4). The entropy values >1 were observed at the following positions 23, 64, 70, 77, 200, 203, 336, 570, 601, 607, 608, 617, 721, 722, 744, 856, 869, 895–897, 900, 909, 1055, 1062, 1063, and 1070. The entropy values at positions 23, 200, and 203 were due to gapped alignment instead of variations. Similarly for the protein sequence alignment, the entropy values >1 were observed at following positions: 8 (N), 17 (N), 18 (S), 21 (I), 24 (W), 26 (R), 29 (K), 47 (M), 65, 70 (N), 71 (D), 123 (T), 184 (G), 186 (V), 187 (V), 189 (R), 190 (H), 203 (A), 205 (T), 206 (G), 226 (T), 228 (E), 236 (G), 239 (G), 243 (P), 253 (K), 263 (M), 286 (K), 292 (F), 294 (G), 295 (T), 296 (W), 300 (E), 301 (R), 302 (S), 304 (T), 337 (H), 356 (D), 357 (T), 361, 362 (N), 363 (K), and 364 (T). The entropy values at positions 65 and 361 were due to gapped alignment [63].

Fig. 15.3

Entropy plot of Env nucleotide sequence alignment. The plot is generated by comparing residue positions of the first sequence of the input (L6D_1001) with the rest

Fig. 15.4

Entropy plot of Env protein sequence alignment. The plot is generated by comparing residue positions of the first sequence of the input (L6D_1001) with the rest

15.3.4 Tajima’s Neutrality Test

The analysis involved 37 nucleotide sequences (m). The number of segregating sites (S) was 32, the nucleotide diversity (π) was 0.097, and Tajima’s test statistic (D) was 0.593. The positive test statistic reflects intermediate-frequency mutations, suggesting diversifying selection [78]. The HIV-1 sequences have not revealed evidence for natural selection in env [79]. Although the test assumes that nucleotides are equally mutable, this is not true for coding regions because the polymorphism is not the same for first, second, and third codon positions, and codon usage biases may further complicate the mutation pattern. The difference in the estimate obtained provides an indication of non-neutral evolution. Since Tajima’s test is not very powerful and DNA polymorphisms are largely synonymous, the result should be verified further with experimental work [80].

15.3.5 Codon-Based Z-Test of Selection

Only two random pairs of clones (L2D 1005 vs L5D 1002) and (L2D 1004 vs L1D 1004) had a probability (P) value less than 0.05, i.e., 0.04 and 0.01, respectively, so the null hypothesis of neutrality can be rejected in favor of the alternate hypothesis for these pairs. For the rest of the sequence pairs, the null hypothesis of neutrality could not be rejected at the 5% level.

15.3.6 Estimates of Transition/Transversion Bias

There was a total of 890 transition/transversion positions in 37 nucleotide sequences. The estimated value of the shape parameter for the discrete gamma distribution is 0.6115. Mean evolutionary rates in these categories were 0.04, 0.21, 0.54, 1.16, and 3.05 substitutions per site. The nucleotide frequencies are A = 38.48%, T/U = 24.57%, C = 16.78%, and G = 20.17%. The maximum log likelihood for this computation was −3618.371. The ML-estimated transition/transversion bias (R) is 1.97, and MCL-estimated transition/transversion bias is R = 2.532 (Table 15.4).

15.3.7 Disparity Index Analysis

The substitution pattern between lineages was tested under the null hypothesis that the sequences have evolved with the same pattern of nucleotide substitution. A disparity index greater than zero (ID > 0) indicates evolutionary divergence between sequences based on composition bias; for the sequence pairs listed below, we can therefore reject the null hypothesis at the 5% level. The sequences of patients 1004 and 1006 (clones L1D 1004, L2D 1004, 12L1D 1004, 12L11D 1004, L5R 1006, L6R 1006, L3R 1006, L13R 1006, L4R 1006, L12R 1006, and L1R 1006) had composition bias (ID = 0.1) in random comparison with patient 1005 (clone L2D 1005) and no bias with others. The sequences of patients 1001, 1002, and 1005 (clones L8D 1001, L5D 1002, L3D 1002, L7D 1002, L2D 1002, L1D 1002, L7D 1001, L6D 1001, L4D 1005, L1D 1005, and L2D 1005) showed composition bias (ID = 0.1) with patient 1015 (clone 12L6D 1015). A different clone of patient 1015 (12L12D 1015) showed composition bias (ID = 0.1) with the following sequences of patients 1001, 1002, and 1005 (clones L8D 1001, L5D 1002, L3D 1002, L7D 1002, L2D 1002, L1D 1002, L7D 1001, L6D 1001, L4D 1005, and L3D 1005). For two clonal sequences (L1D 1005 and L2D 1005), ID = 0.2 when compared with clone 12L12D 1015 of patient 1015. Thus, the test identified lineages and genes that are evolving with substantially different evolutionary processes as reflected in the atypical patterns of change [70].

15.3.8 Molecular Relatedness and Dendrogram

Dendrograms are suitable to display and infer evolutionary model assuming mutations and speciation events. A consensus tree is displayed after 1000 bootstrap replications (Fig. 15.5). However, it is well known that for some complex evolutionary scenarios involving gene loss, duplication, hybridization, horizontal gene transfer, or recombination, a dendrogram is not suitable for an appropriate representation of evolutionary events. Hence, the incompatible and ambiguous signals in the dataset (such as socioepidemiological data) were represented by split networks that provide only an “implicit” representation of evolutionary history [71]. The estimated proportion of invariant sites of nucleotides is 0.334 and for proteins is 0.427 [81].

Fig. 15.5

Evolutionary relationships of HIV-1 B Env based on nucleotide alignment (37 sequences). The branching pattern was generated by the neighbor-joining (NJ) method, and the confidence of the clades was assessed by bootstrap values (n = 1000 replicates)

As depicted in Fig. 15.1 (network 1), the IDU and sexual relationship between 1001 and 1002 agrees with the molecular connections represented in the dendrogram, with 92% confidence on the branch. Network 2 of the same figure is inconsistent with the molecular data, with 1003 and 1004 in two separate clusters with 100% confidence on the branch. The other subjects of this network (1005 and 1006) were also distributed into two different clusters: 1005 clusters with 1001 at a confidence of 92%, and 1006 forms an entirely unique cluster with 99% confidence. Network 3 is also in disagreement with the molecular data of 1015 and 1008. However, the sequence details of 1011 are required to confirm the relationship with 1015. (No molecular data are available to compare with network 4.)

The resulting nucleotide-based split network is shown in Fig. 15.6, and the protein-based split network is shown in Fig. 15.7. This pattern suggests that the dataset contains conflicting evolutionary signals (such as duration of IDU, sexual relationship, hypervariable regions, recombination, risk locale, and random genetic drift) and is consistent with the hypothesis of recombination events (see Sect. 15.3.9, Phi test) among the major lineages. Two of the clones belonging to IDU patient 1004 appear isolated from the clusters, whereas a few other clones of 1004 show sequence admixtures, probably due to recombination events. Compared to all other subjects, the 1004 cluster is more diverse, and some clones converge. It is to be noted that split networks provide only an “implicit” picture of an evolutionary relationship, and “nodes” in the “split network” do not represent ancestral species.

Fig. 15.6

A split decomposition network for 37 Env sequences of IDU HIV-1 B patients based on nucleotide sequence alignment, computed using the neighbor-net method with bootstrap values (n = 1000 replicates). The computed splits are displayed as a network with equal angle. Two of the IDU patient sequences appear isolated from the clusters and are encircled

Fig. 15.7

A split decomposition network for 37 Env sequences of IDU HIV-1 B patients based on protein sequence alignment, computed using the neighbor-net method with bootstrap values (n = 1000 replicates). The computed splits are displayed as a network with equal angle. Two of the IDU patient sequences appear isolated from the clusters and are encircled

15.3.9 Phi Test for Recombination

The nucleotide and amino acid characters were analyzed using Phi test [74]. The results are summarized in Table 15.5. Although the split network based on nucleotides and proteins had a similar tree topology (Figs. 15.6 and 15.7), only the network based on nucleotides showed significant evidence for recombination in the Phi test, which was not reflected in proteins.

Table 15.5 Phi test for recombination in the split network

15.4 Discussion

We address the question as to whether clustering observed in IDU networks reported in this chapter may reflect relatively concentrated shared co-injection behaviors. The dyads that we characterized injected with each other feasibly hundreds of times between 1982 and 1988, by which time all had seroconverted. The participants in the double dyad, additionally, had injected with each member of the group many times. There is no evidence of direct interpersonal connections between participants 1005 and 1002 and 1001. It is relevant to note that the carrier of a founding HIV-1B infection need not be personally present when the next victim is exposed. That individual may no longer be on the premises; just his/her paraphernalia need be present for additional individuals to become HIV-1B infected with the strains at hand. These paraphernalia include contaminated needle-syringes, cottons, cookers, and washwaters, all generally used in the IDU venue. IDU paraphernalia have been extensively characterized in Miami-Dade County and shown to be contaminated with HIV-1B [42, 82, 83].

One observed condition in which this kind of exposure takes place is when a group of injectors arrives at a risk locale and uses needle-syringes and paraphernalia contaminated by the people who just finished ‘shooting up’ (injecting drugs). They use the contaminated available paraphernalia and syringes, not necessarily sharing them, but often not adequately cleansing them of prior contaminants. The primary hypothesis to be tested by sequencing ENV variable loops is whether the sequence characteristics of virus strains reflect the interpersonal risk behaviors characterized by sociocultural studies. In cases where people who run together (socialize) also inject together under private circumstances, we would expect clustering to reflect that fact. In cases where people who inject together do so in risk locales where many inject, we would expect lower levels of clustering of virus strains. Additional study in more localities would aid in understanding the importance of shooting companions and shooting venues in the spread of HIV-1B (as well as additional viruses including HCV and HBV). Moreover, the extent to which individuals participate in out-of-network HIV-1B risk activity needs socioepidemiological comparisons and sequence relatedness characterization as well [42, 47, 48, 49, 84,85,86,87,88,89].

The full V1–V5 ENV domains are included in this study to characterize social effects and molecular changes. The structure and genomes of HIV strains are well characterized and provide a firm basis for such studies [90]. For example, previous studies suggest that the overall tertiary conformation of the entire env protein may be important in tropism-determining activity. In particular, domains within the V1/V2 [91, 92] and/or V3 are critical determinants of macrophage cellular tropism [12, 25, 85, 91, 93,94,95]. A 94-amino acid domain, including the V3 loop, is involved in HIV-1B infection of macrophages, being both necessary and sufficient for virus entry [91, 93]. A mutation of residue 287 from a lysine to a glutamic acid converts a non-macrophage-tropic isolate of HIV-1B to one capable of replicating in macrophages. In addition, the V3 domain is a 35–37-amino acid loop bounded by a pair of disulfide-bonded cysteine residues [96]. It forms two antiparallel beta turns and a short C-terminal alpha helix [97]. It is an epitope for neutralizing antibodies as well as cytotoxic T lymphocytes [98, 99].
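As a rough illustration of the V3 description above, the following sketch scans a translated env fragment for candidate loops bounded by two cysteines. It assumes that the 35–37-residue length includes both bounding cysteines and that no cysteine occurs inside the loop; both are simplifying assumptions, and the function name and test sequence are hypothetical.

```python
import re

# A cysteine, 33-35 non-cysteine residues, then a cysteine: 35-37 residues total.
# The lookahead lets overlapping candidates be reported as well.
V3_CANDIDATE = re.compile(r"(?=(C[^C]{33,35}C))")


def find_v3_candidates(protein):
    """Return (start_index, loop_sequence) for each cysteine-bounded candidate."""
    return [(m.start(), m.group(1)) for m in V3_CANDIDATE.finditer(protein.upper())]


# Synthetic placeholder sequence (not a real env fragment).
synthetic = "MKT" + "C" + "G" * 33 + "C" + "NNT"
candidates = find_v3_candidates(synthetic)
```

A pattern match of this kind only proposes candidates; confirming a V3 loop requires alignment against a reference env sequence.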

Based on sequence analysis, specific signatures, transition/transversion bias, a statistical test of neutrality, and molecular diversity as reflected in dendrograms, we conclude that there are genetic variations in the V1–V4 region of HIV-1B env. For example, the disparity index confirmed composition bias between two clones of patient 1004 (12L1D_1004 and 12L11D_1004) and patient 1005 (clone: L2D 1005). This is well reflected in the isolated branches of the split network. Similarly, the sequences of patients 1004 and 1006 showed composition bias with patient 1005, and the sequences of patients 1001, 1002, and 1005 showed composition bias with clone 12L6D of patient 1015, as reflected in the split network (Figs. 15.6 and 15.7), which confirms non-neutral evolution as indicated by Tajima’s test. The unsystematic variations introduced by recombination may provide evidence for non-neutral evolution. It is also known that natural selection is frequently masked by recombination and that natural selection over the env V1–V4 region had a minor role in driving diversity [76, 100, 101].
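For readers unfamiliar with Tajima's test, the sketch below computes the standard Tajima's D statistic (Tajima, 1989) from a set of aligned sequences. It is offered as an illustration under stated assumptions, not as the procedure or software used in this study: the sequences are assumed to be aligned, of equal length, and free of gaps, every variable column is counted as one segregating site, and the function name and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least four sequences are needed")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (variable) alignment columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))


# Toy alignments (hypothetical, not study data).
rare_variants = ["AAAAAA", "AAAAAT", "AAAATA", "AAATAA"]   # singletons only
balanced_variants = ["AAAA", "AAAA", "TTTT", "TTTT"]       # intermediate frequency
d_rare = tajimas_d(rare_variants)
d_balanced = tajimas_d(balanced_variants)
```

An excess of rare variants gives a negative D, whereas variants at intermediate frequency give a positive D; values near zero are consistent with neutrality. Interpreting D for intra-host HIV-1 sequences calls for caution, because recombination and sampling of clones can also shift the statistic.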

Finally, there is a non-intuitive and unexpected relationship between needle-syringe and paraphernalia sharing and the psychiatric morbidities associated with IDU risk behavior. In the current volume of Global Virology II, Thames and Jones (Chap. 12) indicate that IDU needle-syringe and paraphernalia sharing, as well as reduced needle-syringe cleansing behaviors, are associated with psychiatric comorbidities due to HIV-1 infection. These comorbidities include antisocial personality disorder (ASPD), which is exhibited as remorseless, impulsive, and irresponsible behaviors. Thames and Jones further report and discuss additional comorbidities associated with such behaviors, including DSM-III axis II diagnoses as well as opioid and cocaine consumption [102]. Additionally, marijuana use is associated with this set of behaviors and conditions [103].

15.5 Conclusions

The results described here may lead to new directions in understanding natural selection, random genetic drift, and recombination in the HIV-1B env protein, as well as diversity during HIV-1 infections, in a defined socioepidemiological context. Additional work is needed to characterize in detail the effects of differing risk activities, including in the contemporary post-HAART, cART era, as the world progresses into the next era of more advanced molecular and immunological therapies. Moreover, the contextual application of studies of IDU risk behaviors and molecular epidemiology should also include characterization of the associated psychiatric morbidities as well as the possible role of brain-related HIV-1 infections.