
12.1 Introduction

B lymphocytes are key players in the adaptive immune response and, in concert with other cells of the immune system, are capable of directing the recognition of and response against a vast array of pathogens [43]. The critical functions of B cells are made possible by the expression of diverse proteins called immunoglobulins (IG), expressed either as membrane-bound cell surface B cell receptors (BCRs) or as secreted antibodies (Abs), each composed of two identical heavy and two identical light chains. Mirroring the diversity of the potential pathogens and related epitopes that could be encountered during a lifetime, the amino acid diversity of the Abs produced by an individual (i.e., the Ab repertoire) is astounding, with theoretical estimates exceeding 10^15 unique Abs in a given repertoire [54]. The diversity observed in the naïve Ab repertoire is the consequence of unique molecular processes. It begins with combinatorial diversity arising from the somatic “V(D)J” recombination of hundreds of germline heavy and light chain variable (V), diversity (D; heavy chain only), and joining (J) gene segments across three primary loci in the human genome. Further diversity comes from junctional variation, introduced by the addition of nucleotides at the junctions of V, D, and J gene segments, and from the random pairing of heavy and light chains [43, 51, 63]. Additional diversity is later added to the repertoire following antigen stimulation via somatic hypermutation (SHM).
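To give a rough sense of scale, the combinatorial component of this diversity can be estimated by multiplying segment counts. The sketch below uses assumed, rounded numbers of functional segments per locus. The constants HEAVY, KAPPA, and LAMBDA are illustrative assumptions, not reference values, because actual counts differ between individuals and haplotypes. It treats kappa and lambda as alternative light chains that pair at random with any heavy chain.

```python
# Assumed, rounded counts of functional gene segments per locus (illustrative only;
# real counts vary between individuals and haplotypes).
HEAVY = {"V": 45, "D": 25, "J": 6}
KAPPA = {"V": 40, "J": 5}
LAMBDA = {"V": 30, "J": 5}


def combinations(segments):
    """Number of distinct segment combinations for one rearranged chain."""
    total = 1
    for count in segments.values():
        total *= count
    return total


heavy_combos = combinations(HEAVY)                         # V x D x J
light_combos = combinations(KAPPA) + combinations(LAMBDA)  # kappa or lambda
paired_combos = heavy_combos * light_combos                # random H/L pairing

print(f"heavy chains: {heavy_combos:,}")
print(f"light chains: {light_combos:,}")
print(f"paired Abs:   {paired_combos:,} (~{paired_combos:.1e})")
print(f"fraction of 1e15: {paired_combos / 1e15:.1e}")
```

Under these assumptions, segment combinations and random pairing together yield only a few million possibilities. That is many orders of magnitude below the theoretical estimate cited above. Most of the remaining diversity is therefore attributed to junctional variation and, after antigen stimulation, to SHM.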

Effectively and comprehensively capturing the diversity of the Ab repertoire has historically been challenging. However, the recent application of high-throughput sequencing (HTS) to Ab repertoire profiling has begun to provide great insight into the dynamic nature of the Ab response [20]. With these methods, generally referred to as “RepSeq”, repertoire-wide profiling of Ab variable domains (in some cases including constant-domain isotype information) can be conducted in depth from either rearranged genomic DNA or mRNA, across millions of cells in bulk or at the single-cell level [67]. These technologies have been accompanied by a surge of computational methods for analyzing such HTS data [79]. These methods allow complex modeling of the Ab repertoire and in-depth description of a range of repertoire features (e.g., IG gene segment usage; complementarity-determining region (CDR) length and diversity; SHM patterns; clonal expansion). Over the past several years, these approaches have been applied in many disease and clinical contexts and across cell subsets and time points. This work has clearly established that Ab repertoires can be influenced by many factors, including age, infection history, disease status, genetic background, and treatment/prophylactics [18, 21, 23, 39, 47, 50, 61, 64, 66, 77, 81]. In summary, HTS-based analyses of immune repertoires hold enormous promise for understanding repertoire dynamics in immunology, vaccinology, infectious diseases, autoimmunity, and cancer biology.

In this review, we survey studies that have used HTS of Ab repertoires to characterize the development of the adaptive immune response to HIV-1, in order to illustrate the importance and utility of this approach. We summarize some general conclusions from these studies, as well as some significant limitations. Chief among these limitations are the analysis of only a small portion of the complete Ab repertoire and the study of only a few individuals. Finally, we review new trends in adaptive immune repertoire analysis that may provide opportunities for overcoming these limitations.

12.2 Characterizing the Development of Ab Repertoires During HIV-1 Infection

12.2.1 General Questions and Importance of HTS Repertoire Studies

Studies of the development of Ab repertoires in health and disease have been dominated by those targeting HIV-1 Abs. This is due to several circumstances, including disproportionate funding rates and the obvious immediate application of such studies to vaccine development. These studies of Ab repertoires in HIV-1 infection provide an excellent example of the power of using HTS for studying Ab repertoires, and we therefore have chosen to focus on this set of studies in our review.

At least two related questions have substantially motivated studies of Ab repertoires during HIV-1 infection. The first concerns the characteristics of anti-HIV-1 broadly neutralizing Abs (bnAbs), which an effective HIV-1 vaccine would presumably have to elicit in naïve individuals. Much has been made of the observation that most anti-HIV-1 bnAbs are highly somatically mutated, encode long (≥20 aa) heavy chain CDR3 (CDR-H3) sequences, and are poly- or autoreactive ([19, 40]; but see [10]). This observation led some researchers to suggest that one explanation for the rarity of bnAbs against HIV-1 is that the individual must “break tolerance” to produce them [24]. However, not all bnAbs against HIV-1 have these properties. The extent to which these features are required for the neutralization breadth and potency of bnAbs remains poorly understood. The application of HTS to probe the Ab repertoire opened the possibility of describing the characteristics of bnAbs against HIV-1 in fine detail, and this remains one of the strongest motivations for these studies.

The second question motivating HTS Ab repertoire studies in HIV-1 focuses on the timing, or ontogeny, of the broadly neutralizing response during infection. It is well established that most HIV-1-infected individuals produce strong strain-specific neutralizing Ab responses against the HIV-1 envelope (Env) soon after initial infection. Yet only a small percentage of infected individuals develop broad neutralization, and then only after a year or more [22, 57, 58]. The “B-cell-lineage immunogen design” strategy for anti-HIV-1 vaccine development attempts to understand the timing and action of the immunogens that lead to the development of anti-HIV-1 bnAbs, and then to induce such bnAbs in naïve individuals [5, 25]. This “directed evolution” approach to rational immunogen design relies on identifying the clonal lineages that have led to bnAbs. From these lineages, it infers their unmutated common ancestors (putative naïve B cell precursors) and the intermediate sequences within each lineage. Once the sequences of the unmutated ancestor and of the intermediates along a lineage have been identified, it is possible to identify the epitopes to which these Abs bind. This, in turn, points to possible immunogens that might be used to drive the Ab response along specific, predetermined “evolutionary” pathways. The goal is to recapitulate these lineages in naïve individuals and so increase the chance that they will ultimately produce bnAbs themselves. This approach has dominated HIV-1 vaccine research for several years. It explains why many studies concentrate only on specific lineages related to known bnAbs within the broad Ab response (“targeted” lineages). Such studies contrast with those that attempt to capture the complete Ab milieu during infection, i.e., all of the dominant Ab lineages present in the repertoire.
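Inferring an unmutated common ancestor is, at its core, an ancestral-sequence reconstruction problem. Published studies use phylogenetic or likelihood-based methods. The toy sketch below is a deliberately simplified stand-in and makes several assumptions. Lineage members must already be aligned to a germline V(D)J template in which untemplated junctional positions are marked 'N'. Templated positions are reverted to germline, and each junctional position takes the majority base among lineage members. The function name and input conventions are hypothetical.

```python
from collections import Counter


def naive_uca(members, germline):
    """Toy unmutated-common-ancestor estimate (not a phylogenetic method).

    members  -- aligned, equal-length nucleotide sequences from one lineage
    germline -- aligned germline V(D)J template; 'N' marks junctional
                positions that no germline segment encodes
    """
    if any(len(s) != len(germline) for s in members):
        raise ValueError("sequences must be pre-aligned to equal length")
    ancestor = []
    for i, germ_base in enumerate(germline):
        if germ_base != "N":
            # Templated position: every difference here is treated as SHM,
            # so the ancestor is assumed to carry the germline base.
            ancestor.append(germ_base)
        else:
            # Untemplated junction: fall back to the most common observed base.
            counts = Counter(s[i] for s in members)
            ancestor.append(counts.most_common(1)[0][0])
    return "".join(ancestor)


# Hypothetical lineage of three members and a germline template.
lineage = ["ACGTTAGCA", "ACGATAGCA", "ACCTTGGCA"]
germ = "ACGTNNGCA"
print(naive_uca(lineage, germ))
```

Reverting every templated position to germline assumes that all differences there are somatic. This ignores allelic variants missing from the reference germline set (Sect. 12.3.1). A majority vote at the junction is also only a crude proxy for reconstruction on a tree.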
In this review we will discuss how this emphasis on studying particular targeted lineages has shaped HIV-1 Ab repertoire research, and what advantages a broader approach examining multiple lineages might provide.

12.2.2 Survey of Studies of the Ab Response During HIV-1 Infection Using HTS

In this section, we review those studies that use HTS to characterize the anti-HIV-1 Ab response. We include only those studies that report a response to HIV-1 infection, rather than that induced by vaccination. Most of the bnAbs known to target HIV-1 bind to a small set of epitopes, and we organize the studies according to these regions (summarized in Table 12.1).

Table 12.1 Survey of studies of the Ab response during HIV-1 infection using HTS, organized primarily by epitope

12.2.2.1 CD4 Binding Site (CD4bs)

The largest set of studies using HTS to explore antibody production and development in response to HIV-1 infection consists of those that target the CD4 binding site (CD4bs); one of the earliest studies of this type was by Wu et al. [76]. In this study, PBMCs from three donors from International AIDS Vaccine Initiative (IAVI) Protocol G were isolated and subjected to HTS. This set included Donor 45, from whom the bnAbs VRC01, VRC02, and VRC03 had originally been isolated. These Abs, especially VRC01, were the subject of intense interest, given their ability to neutralize up to 90% of HIV-1 isolates [75].

This seminal study used several filters to exclude all Abs except those related to the neutralizing mAbs VRC01, VRC02, VRC03, and VRC-PG04. First, only antibodies using the IGHV1 gene family were amplified and sequenced, and only those using the IGHV1-2 gene were analyzed further. A second filter accepted only sequences that fell within a cluster defined along two dimensions: divergence from the germline IGHV1-2 gene, and divergence from the template antibodies VRC01 and VRC03. This is a prime example of restricting analysis to clusters of antibodies related to a “target” antibody, in contrast to analysis techniques that attempt to study the entire Ab repertoire. Finally, the authors used cross-donor phylogenetic analysis, a general technique used in several similar studies, to further characterize the Ab repertoire. In this instance, similar antibodies from two other donors were added to the filtered antibodies from the primary donor, Donor 45, and a tree was constructed from these Ab sequences. The authors identified a node on the tree that included all VRC01 sequences from these three donors. All Ab sequences nested within the subtree descending from that node were used in the final analysis. From this final set of Abs, the authors concluded that VRC01-like antibodies from all three donors exhibited similar developmental pathways, in terms of rates of diversification and other parameters. Their functional relatedness was confirmed by the observation that complementing these heavy chains with standard VRC01 light chains from different donors produced neutralizing Abs.
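The two-dimensional filter described above can be sketched as follows. This is a simplified illustration, not the authors’ pipeline. It assumes reads are already aligned to the germline and template sequences at equal length, and it computes percent identity position by position (divergence is simply 100 minus identity). The cutoffs max_germline_identity and min_template_identity are hypothetical values chosen only for the example.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def two_dimensional_filter(reads, germline, templates,
                           max_germline_identity=90.0,
                           min_template_identity=70.0):
    """Keep reads that are both diverged from germline and similar to a template.

    Thresholds are hypothetical; published studies chose their own cutoffs.
    Returns (read, germline identity, best template identity) tuples.
    """
    kept = []
    for read in reads:
        germ_id = percent_identity(read, germline)
        tmpl_id = max(percent_identity(read, t) for t in templates)
        if germ_id <= max_germline_identity and tmpl_id >= min_template_identity:
            kept.append((read, germ_id, tmpl_id))
    return kept


# Hypothetical toy sequences (real V genes are ~300 nt).
germline = "AAAAAAAAAA"
templates = ["AAACCCAAAA"]
reads = ["AAACCCAAAA", "AAAAAAAAAA", "TTTTTTTTTT", "AAACCAAAAA"]
kept = two_dimensional_filter(reads, germline, templates)
for read, germ_id, tmpl_id in kept:
    print(read, f"germline {germ_id:.0f}%", f"template {tmpl_id:.0f}%")
```

Unmutated reads fall outside the window, and so do mutated reads unrelated to the templates; only reads that are both diverged and template-like survive. Because the cutoffs define the cluster boundaries, they directly shape which sequences end up being called “VRC01-related”.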

Liao et al. [36] also examined anti-CD4bs Abs. They made a major advance by sequencing both the Ab response to HIV-1 and the viral lineages that were evolving in response to the Ab repertoire. They concentrated their study on one donor, CH505, the source of the CD4bs bnAb CH103. By sequencing both Ab and HIV-1 lineages, Liao and colleagues were able to infer the co-evolution of the virus with the Abs: the virus mutated to escape control by the Abs, and the Abs mutated to bind the changing viral strains. Reconstruction of the CH103 lineage made it possible to infer the sequence of its unmutated common ancestor, which was shown to bind the transmitted/founder HIV-1 envelope glycoprotein. Furthermore, the Ab response of lineage CH103 broadened through time, following viral diversification of the CH103 binding site. This reconstruction of the evolution of an Ab lineage leading to a bnAb is critical to the B-cell-lineage immunogen design strategy of vaccine development, as described in Sect. 12.2.1.

Zhou et al. [82] applied the HTS approach to multiple donors, again targeting only lineages based on VRC01-class bnAbs (i.e., Abs with properties similar to the canonical VRC01). They found that VRC01-class antibodies were present in multiple donors and were characterized by similar maturation pathways and structural solutions to binding. This supported the hypothesis that bnAbs such as the VRC01 class could be elicited in a large proportion of naïve individuals through the use of the proper immunogens (i.e., the B-cell-lineage immunogen design strategy).

Across several studies, VRC01-class bnAbs have been characterized by the use of the IGHV1-2 gene for the heavy chain and a light chain with a 5-amino-acid third complementarity-determining region (CDR-L3). Several other recent studies have used HTS to help identify and describe the development of this class of antibodies in other donors. For example, Zhu et al. [85] discovered several Abs of this class from donor C38. Many of these Abs did not show high sequence identity to other VRC01-class Abs. Nevertheless, most of them showed substantial neutralization of HIV-1 isolates when reconstituted with VRC01 light chains. Huang et al. [30] also found VRC01-class antibodies in a previously unstudied individual, Z258, from the NIAID protocol. Strikingly, the N6 Ab from this donor neutralized 98% of HIV-1 isolates, including several strains that were not neutralized by previously discovered VRC01-class Abs.
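Because the VRC01 class is defined here by two genetic features, candidate sequences can be flagged with a simple filter over paired heavy/light chain records. The record layout below (heavy_v_call, light_cdr3_aa) is hypothetical and only loosely modeled on common repertoire annotation output. The function name is illustrative.

```python
def is_vrc01_class_candidate(record):
    """True if a paired record matches both genetic signatures of the VRC01 class
    described in the text: an IGHV1-2 heavy chain and a 5-aa CDR-L3."""
    # Compare gene names without the allele suffix, so that "IGHV1-2*02" matches
    # but "IGHV1-24*01" (which merely starts with the same characters) does not.
    heavy_gene = record["heavy_v_call"].split("*")[0]
    return heavy_gene == "IGHV1-2" and len(record["light_cdr3_aa"]) == 5


# Hypothetical paired heavy/light records; field names are illustrative.
records = [
    {"id": "ab1", "heavy_v_call": "IGHV1-2*02", "light_cdr3_aa": "QQYEF"},
    {"id": "ab2", "heavy_v_call": "IGHV1-24*01", "light_cdr3_aa": "QQYEF"},
    {"id": "ab3", "heavy_v_call": "IGHV1-2*04", "light_cdr3_aa": "QQYNSYSPT"},
]
candidates = [r["id"] for r in records if is_vrc01_class_candidate(r)]
print(candidates)
```

A signature like this only nominates candidates. As the Zhu et al. example shows, functional membership still has to be tested experimentally. CDR-L3 length also depends on the CDR definition used, so the same convention should be applied to every sequence being compared.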

As a final example, Gao et al. [19] used HTS to discover that the donor who produced the bnAb CH103 [36] also produced a second bnAb, CH235. HTS of HIV-1 isolates and of the Ab repertoire from multiple time points during this individual’s infection revealed the co-evolution of the virus and the Ab response in fine detail. The authors hypothesized that CH235 developed first, and that an escape mutation in HIV-1 in response to CH235 facilitated the development of the broadly neutralizing Ab CH103. Cooperation of the two Ab lineages was therefore hypothesized to have led to broad neutralization in this donor.

This large set of studies examining broadly neutralizing anti-CD4bs Abs has led to several recent conclusions that may be important for guiding HIV-1 vaccine design. Zhou et al. [83] summarized these studies by concluding that bnAbs against the CD4bs could be characterized either (1) by the use of particular IGHV genes, usually IGHV1-2 or IGHV1-46, or (2) by the dominance of particular CDR-H3 sequences. Furthermore, Wu et al. [77] concluded that the production of bnAbs against HIV-1 was associated with the development of multiple Ab lineages with initially high mutation rates. They also found that these mutation rates tended to slow as the anti-HIV-1 repertoire developed.

12.2.2.2 Variable Regions V1 and V2 (V1V2)

HTS studies of the Ab repertoire have also been critical to our understanding of the development of bnAbs that bind to variable regions 1 and 2 (V1V2) of the HIV-1 envelope. One of the most comprehensive analyses of this sort is based on a set of 12 bnAbs isolated from donor CAP256, who was enrolled by the Centre for the AIDS Programme of Research in South Africa (CAPRISA) and was infected with a clade C virus. These 12 Abs are referred to as CAP256-VRC26.01–12, where VRC26 refers to the Ab lineage [13]. This lineage is extraordinary in that it exhibits a 35–37 amino acid long CDR-H3. HTS of the Ab repertoire from this donor was conducted at eight time points between 15 and 206 weeks post-infection. Targeted lineage analysis of these HTS data, using the 12 bnAbs as targets, defined an Ab lineage whose unmutated ancestor was able to neutralize the virus that superinfected this individual 15 weeks after initial infection [13]. This result resembles the inferred history of the lineage leading to the anti-CD4bs bnAb CH103, in that the inferred unmutated ancestor bound the HIV-1 target [36]. Inference of the unmutated common ancestor heavy chain sequence supported the hypothesis that the long CDR-H3 was already present in the original B cell whose descendants ultimately produced the VRC26 bnAbs.

Doria-Rose et al. [14] conducted further studies of the Ab response from this same individual, donor CAP256. They used microneutralization and single-cell sorting to isolate 21 more bnAbs from this donor, all related to the original VRC26 lineage and referred to as VRC26.13–33. All available HTS data from this individual’s Ab repertoire were used to place the complete set of 33 VRC26 Abs within one phylogenetic lineage. In this case, the targeted lineage analysis was based on CDR-H3 sequences rather than full Ab sequences. The lineage defined by this targeted approach bifurcated early into two distinct branches, one of which died out and one of which developed bnAbs. HTS of the viral population in this individual [4] showed that viral escape created multiple immunotypes, some of which were able to tolerate variability at key epitope contacts and thus contributed to neutralization breadth. This mechanism of developing breadth within one Ab lineage suggests that viral diversification may be commonly associated with the broadening of neutralization. It also contrasts with the multi-lineage cooperative pathway hypothesized for the development of breadth in some anti-CD4bs Abs (e.g., [19]).

12.2.2.3 Membrane-Proximal External Region (MPER)

HTS studies of the Ab response against the MPER have concentrated on donor N152, the source of 10E8, one of the most broadly neutralizing anti-HIV-1 bnAbs discovered so far [29]. The heavy chain of 10E8 shows a high level of divergence (21%) from its inferred germline gene. Zhu et al. [85] sequenced the heavy and light chains of related Abs separately. They then used a targeted analysis focused on 10E8 to obtain separate pools of heavy and light chain sequences related to 10E8. Phylogenetic trees were inferred separately from the heavy and light chain sequences. The topologies of these trees were similar, which allowed heavy and light chain sequences to be paired based on these topologies. Abs identified by this topology-based pairing approach had lower autoreactivity than pairs of sequences drawn without respect to tree topology. Further sequencing from one time point of donor N152 produced more Abs related to the 10E8 lineage. These additional sequences allowed inference of the unmutated common ancestor, which bound only weakly to the original MPER target [60]. Much of the motivation for studying 10E8 stemmed from the possibility that inferred intermediate sequences in the reconstructed lineage might point to immunogens for inducing 10E8-like bnAbs, following the logic of the B-cell-lineage immunogen design strategy.

12.2.2.4 N332 Glycan Supersite in V3 Loop of gp120

Several strongly neutralizing Abs have also been described that target the high-mannose patch centered on glycan N332 in the V3 loop of gp120. This site is an important vaccine target. Passive administration of Abs binding to this region has been shown to prevent infection [42] and to significantly reduce viremia during established infection in non-human primates [2]. HTS was used to analyze the Ab response in Donor 17 of IAVI Protocol G [59], from whom the PGT121-class bnAbs that target this region had been isolated. Within this class of Abs, the authors found a positive correlation between the level of SHM and the development of neutralization breadth and potency. They characterized putative intermediates within this lineage that showed only approximately half the mutation level of PGT121–134. These intermediates were nonetheless capable of neutralizing roughly 40–80% of PGT121–134-sensitive viral isolates. Such intermediates, characterized by lower SHM, are attractive vaccine targets because they may be more easily elicited by the B-cell-lineage immunogen design approach than highly mutated Abs.
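The relationship between SHM and breadth described above can be quantified with a rank correlation. The sketch below computes breadth as the fraction of a virus panel neutralized below an IC50 cutoff. A cutoff of 50 µg/mL is assumed here. It then computes Spearman’s rank correlation between SHM and breadth. All values are hypothetical, and this is not the analysis performed in the cited study.

```python
def neutralization_breadth(ic50s, cutoff=50.0):
    """Fraction of panel viruses with IC50 below the cutoff (ug/mL; assumed value).
    None means not neutralized at the highest concentration tested."""
    hits = sum(1 for v in ic50s if v is not None and v < cutoff)
    return hits / len(ic50s)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical lineage members: heavy-chain SHM (%) and IC50s (ug/mL) on five viruses.
shm = [8.0, 12.0, 16.0, 20.0]
panel_ic50s = [
    [None, None, None, 10.0, None],
    [None, 30.0, None, 5.0, None],
    [1.0, 20.0, None, 2.0, None],
    [0.5, 3.0, 60.0, 0.1, 2.0],
]
breadth = [neutralization_breadth(row) for row in panel_ic50s]
rho = spearman(shm, breadth)
print(breadth, round(rho, 3))
```

A rank correlation suits this setting because breadth values are bounded fractions with frequent ties. With only a handful of lineage members, however, any such correlation should be interpreted cautiously.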

He and colleagues [26] further studied the Ab repertoire from Donor 17 and compared it with repertoires from two individuals who were not infected with HIV-1. The Ab repertoires were surveyed in two ways. The first was amplification with gene-specific heavy and light chain primers matching the germline components of PGT121-class Abs, and the second was 5′ RACE. Because 5′ RACE does not rely on family- or gene-specific primers, it should provide a less biased estimate of the complete Ab repertoire. Much of the analysis in this study employed intra-donor phylogenetic analysis targeted on PGT121 Abs. This analysis used Ion Torrent PGM sequencing, compared with the standard 454 platform, and greatly expanded the known number of Abs in this family. The Ab repertoire results obtained using 5′ RACE and bioinformatic analysis not targeted to a specific Ab family are included in the next section, which summarizes HTS studies of complete Ab repertoires during HIV-1 infection.

HTS was also used to sequence the Ab repertoire from another IAVI Protocol G donor, Donor 39 [84]. This individual was the source of the bnAbs PGT135–137, which also target the N332 supersite. In this study, a lineage of Abs related to PGT135–137 was identified by targeted phylogenetic analysis. This apparent lineage comprised 15 distinct clusters of heavy chain sequences and 10 clusters of light chain sequences. Sequences chosen to represent these somatic populations showed diverse neutralization characteristics.

Finally, the most recent study to use HTS data to study Ab responses against this region identified 12 somatically related bnAbs, referred to as the PCDN lineage, from donor PC76 at 16–38 months post-infection [38]. These Abs were less mutated than previously described bnAbs against this site and did not contain insertions or deletions. According to the B-cell-lineage immunogen design approach, the closer a bnAb is to its germline configuration, the better it is as a vaccine target. Such Abs should be easier to elicit with a series of immunogens than more highly mutated Abs, which may not follow predictable maturation pathways. MacLeod et al. [38] also showed that multiple pathways led to neutralization breadth. The authors hypothesized that early diversification followed by maturation of parallel lineages is required for obtaining neutralization breadth.

12.2.2.5 Non-targeted Analysis of Ab Repertoire During HIV-1 Infection

A few studies have used HTS to examine the complete Ab repertoire during HIV-1 infection; we refer to these here as non-targeted approaches. These studies differ from targeted studies in two ways. First, they amplify all IGHV genes, using either primers that amplify all V genes or 5′ RACE. Second, they use bioinformatic tools that examine all Abs and lineages rather than only sequences related to a target Ab.

For example, Xiao et al. [78] used HTS to explore the Ab response in control individuals and one HIV-positive individual. They reported V(D)J recombination frequencies but did not perform lineage analyses. The primary finding was that the dominant IGHV family in the HIV-1+ individual, relative to controls, differed between IgM and IgG libraries and between early and late infection.

The results of He et al. [26] on PGT121-class Abs were described above. The same study also reported non-targeted Ab repertoires from one HIV-1+ and two control individuals. By examining the complete Ab repertoire, the authors observed that the dominant IGHV gene in the infected individual’s repertoire was not the IGHV gene associated with PGT121-class Abs. In addition, one of the unbiased repertoires from an uninfected individual was strongly skewed in IGHV gene usage. Even this study of only a few individuals shows that unbiased repertoires can differ greatly between individuals, whether or not they are infected. This is consistent with observations from other studies of healthy individuals (e.g., [9, 18]).
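IGHV usage comparisons like these depend on the unit being counted. In the hypothetical example below, records are (clone_id, v_call) tuples, which is an assumed format. Counting every sequence makes a single expanded clone look like a dominant gene, whereas counting each clone once gives a different answer.

```python
from collections import Counter


def gene_usage(records, per_clone=True):
    """Relative IGHV gene usage from (clone_id, v_call) tuples.

    With per_clone=True each clone contributes one count, so a single expanded
    clone cannot dominate; with per_clone=False every sequence counts.
    Allele suffixes (e.g. "*01") are stripped before counting.
    """
    if per_clone:
        first_call = {}
        for clone_id, v_call in records:
            first_call.setdefault(clone_id, v_call)
        calls = list(first_call.values())
    else:
        calls = [v_call for _, v_call in records]
    counts = Counter(call.split("*")[0] for call in calls)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Hypothetical repertoire: one expanded IGHV3-23 clone plus three small clones.
records = [("c1", "IGHV3-23*01")] * 5 + [
    ("c2", "IGHV1-69*01"), ("c3", "IGHV1-69*06"), ("c4", "IGHV4-34*01"),
]
by_read = gene_usage(records, per_clone=False)
by_clone = gene_usage(records, per_clone=True)
print(max(by_read, key=by_read.get), max(by_clone, key=by_clone.get))
```

Neither convention is universally correct. Per-sequence counts reflect clonal expansion and, for mRNA, expression level. Per-clone counts instead reflect the diversity of recombination events. Comparisons between studies are therefore easier when the counting unit is stated.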

Yin et al. [81] analyzed complete Ab repertoires, comparing control individuals (n = 4), systemic lupus erythematosus (SLE) patients (n = 4), and HIV-1-infected individuals either undergoing treatment (n = 4) or not (n = 4). They analyzed only the CDR-H3 region of IgM sequences. They observed that gene usage, CDR-H3 length distribution, and SHM did not differ among these groups. They also reported that antiretroviral therapy (ART) did not normalize the diversity of the IgM Ab repertoire.

Hoehn et al. [28] used HTS to probe the Ab repertoire in eight HIV-positive individuals, five of whom were being treated with ART and three of whom were not. These repertoires were compared with the Ab repertoires of six healthy individuals from a previous study. The researchers applied several measures of clonality to the entire Ab repertoire and, in general, observed that the HIV-1+ individuals exhibited higher clonality. Clonality varied substantially among ART-treated patients, such that no differences were observed between treated and untreated patients. This is similar to the conclusion of Yin et al. [81], but is based on larger sample sizes in two of the groups. Across all HIV-1+ patients, there was no association between repertoire clonality and clinical variables such as viral load or CD4+ T cell count.
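The specific clonality measures applied in that study are not listed here. The sketch below therefore does not reproduce the analysis; it shows two widely used measures instead. The first is Simpson’s index, the probability that two sequences drawn at random share a clone. The second is one minus Pielou’s evenness, i.e., one minus the normalized Shannon entropy. The input is a list of positive clone sizes, and the example data are hypothetical.

```python
import math


def _frequencies(clone_sizes):
    total = sum(clone_sizes)
    return [size / total for size in clone_sizes]


def simpson_index(clone_sizes):
    """Probability that two sequences drawn with replacement share a clone."""
    return sum(p * p for p in _frequencies(clone_sizes))


def shannon_clonality(clone_sizes):
    """1 minus Pielou's evenness: 0 for perfectly even clones, 1 for one clone.
    clone_sizes must be positive counts."""
    freqs = _frequencies(clone_sizes)
    if len(freqs) == 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in freqs)
    return 1.0 - entropy / math.log(len(freqs))


# Hypothetical clone-size distributions with the same number of sequences.
even = [10, 10, 10, 10]
skewed = [70, 10, 10, 10]
for name, sizes in [("even", even), ("skewed", skewed)]:
    print(name, round(simpson_index(sizes), 3), round(shannon_clonality(sizes), 3))
```

Both indices depend on sequencing depth and on how clones were defined. For this reason, repertoires are commonly subsampled to equal depth before being compared across individuals.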

12.2.2.6 Summary of Results from HTS Studies of the Anti-HIV-1 Ab Repertoire

Many of the HTS studies of Ab repertoires during infection have focused on HIV-1 infection. As mentioned earlier, these have most often targeted particular bnAbs and their relatives within the Ab repertoire. The perfect vaccine candidate Ab would be able to neutralize a broad set of HIV-1 isolates, use common genes, not require an unusually long CDR-H3, and would be common in the naïve repertoire of most individuals. This bias toward HIV-1 studies of particular bnAbs reflects the desire to learn how to elicit such bnAbs in naïve individuals.

In our review of these studies, we have gathered a diverse set of conclusions about the conditions under which bnAbs that could serve as vaccine candidates are elicited. These conditions include high levels of SHM [59], strong viral diversification [4, 36], unmutated common ancestors that bind the infecting HIV-1 strain [13, 36], multiple parallel Ab lineages [38], the presence of cooperating Ab lineages [19], and early viral diversification [36]. Despite these numerous correlations, no condition appears to be consistently associated with the development of anti-HIV-1 bnAbs across multiple studies. Several factors could plausibly explain these inconsistencies.

First, these studies involve large datasets, often tens of millions of sequences, but drawn from only a few individuals. In fact, several donors have repeatedly been sampled across studies because they are known to produce some of the most broadly neutralizing Abs (e.g., N152, CAP256, Donors 17 and 45 of IAVI Protocol G, and CH505, the source of CH103; Table 12.1). This focus on a small number of donors makes sense, given the importance of the bnAbs they produce, but it limits the generality of the conclusions that can be drawn from the studies undertaken so far. Second, as noted several times, the above patterns were observed mostly in studies where bioinformatic filters restricted the analysis to Abs related to targeted Abs, usually the bnAbs that these donors are known to produce. This further restricts the scope of any conclusions. Third, the bioinformatic approaches used to capture related Abs, such as intra- and inter-donor phylogenetic analysis, capture only sets of Abs that are genetically similar (e.g., use of the same IGHV and IGHJ genes, with similar CDR-H3). They offer no way to estimate how likely it is that these Abs are related as ancestors and descendants, which is the usual definition of a clonal lineage. Under these circumstances, some conclusions may not hold if the “lineage” in question in fact comprises multiple biological lineages. Examples include conclusions about the binding properties of inferred unmutated common ancestors or about the early bifurcation of B-cell lineages. In the next section we discuss the utility of non-targeted approaches, and what is needed to make larger-scale comparisons possible (i.e., sharing of large data sets across studies and institutions).
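The grouping heuristic mentioned above (same IGHV and IGHJ genes, similar CDR-H3) can be sketched as follows. This is one common way of implementing it, assumed here for illustration. Sequences are partitioned by V gene, J gene, and CDR-H3 length. Within each partition they are joined by single linkage when their CDR-H3 Hamming distance fraction is at or below a hypothetical threshold of 0.15. The input keys are also hypothetical.

```python
from collections import defaultdict


def hamming_fraction(a, b):
    """Fraction of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def assign_clones(seqs, threshold=0.15):
    """Group sequences into putative clones.

    seqs: dicts with hypothetical keys 'id', 'v_gene', 'j_gene', 'cdr3'.
    Sequences sharing V gene, J gene and CDR3 length are linked (single
    linkage) when their CDR3 Hamming fraction is <= threshold.
    Returns {id: clone_label}.
    """
    partitions = defaultdict(list)
    for s in seqs:
        partitions[(s["v_gene"], s["j_gene"], len(s["cdr3"]))].append(s)

    labels, next_label = {}, 0
    for members in partitions.values():
        parent = list(range(len(members)))
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if hamming_fraction(members[i]["cdr3"], members[j]["cdr3"]) <= threshold:
                    parent[_find(parent, i)] = _find(parent, j)
        root_label = {}
        for i, s in enumerate(members):
            root = _find(parent, i)
            if root not in root_label:
                root_label[root] = next_label
                next_label += 1
            labels[s["id"]] = root_label[root]
    return labels


seqs = [
    {"id": "s1", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "cdr3": "ARGGYDFWSG"},
    {"id": "s2", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "cdr3": "ARGGYDFWTG"},
    {"id": "s3", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "cdr3": "ARGGYEFWTG"},
    {"id": "s4", "v_gene": "IGHV1-2", "j_gene": "IGHJ4", "cdr3": "TKLPQNMVHC"},
    {"id": "s5", "v_gene": "IGHV3-23", "j_gene": "IGHJ4", "cdr3": "ARGGYDFWSG"},
]
labels = assign_clones(seqs)
print(labels)
```

The example also illustrates the caveat raised in the text. Sequences s1 and s3 differ at 20% of CDR-H3 positions, which is above the threshold, yet they are placed in the same group through s2. The output is a hypothesis of shared ancestry, not evidence for it.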

12.3 Future Directions to Enhance the Value of Ab Repertoire Studies

12.3.1 Germline Genes and Haplotypes

The Ab repertoire community is now placing greater focus on understanding the origins of key inter-individual similarities and differences among repertoires responding to a given pathogen. Identifying common repertoire signatures in larger subsets of individuals is likely to improve strategies for the design of effective therapeutics and prophylactics. The B-cell-lineage immunogen design strategy depends on directing the development of the repertoires of naïve individuals. Given this, some have questioned whether all individuals actually have the capacity to respond to a given immunogen in the same way [5]. It is certainly evident in many contexts that not all vaccines elicit equivalent responses in all individuals of a population [8]. In most scenarios, many factors are at play (e.g., an individual’s infection/vaccine history and age). Even so, there is a growing appreciation for the potential influence of heritable factors on the development of both the baseline naïve and antigen-stimulated Ab repertoires [18, 21, 50, 66]. This coincides with a growing body of work describing extreme levels of genetic diversity at the IG loci among human populations. This diversity includes single nucleotide variants in both the coding and non-coding portions of IG V, D, J, and C genes. It also includes large structural variants that can influence the number of functional IG gene segments present in a given genome (e.g., [11, 32, 35, 41, 53, 65, 69]). Importantly, many studies have now directly linked IG germline polymorphisms to differences in Ab repertoire signatures and gene usage, Ab function, and disease and clinical outcomes (e.g., [1, 15, 33, 45, 46, 52, 56, 62, 73]). Together, these studies suggest that two individuals are not necessarily predisposed to mount identical responses.
With these examples in mind, we believe that including data on germline V, D, J, and C gene variants in studies of Ab repertoire variation in health and disease (ideally in larger cohorts) will be critically important going forward (see [68, 70]).

A potential role for germline variation in the development of CD4bs-directed neutralizing Abs has been specifically observed among the repertoires of HIV+ individuals. Yacoob et al. [80] recently showed that particular alleles of the human IGHV1-2 gene (namely, *02, *03, and *04), which encode three key amino acids, serve as better germline precursors for VRC01 bnAbs, with higher binding affinities. They noted that this is consistent with the observation that all VRC01-class bnAbs identified to date are encoded by the germline IGHV1-2*02 allele [72]. A critical observation was that only 8/9 subjects examined in their study carried *02 alleles. This suggests that some individuals in the population may have a reduced capacity to produce effective VRC01-class bnAbs [80], which may be specifically relevant to germline-targeting vaccine priming approaches [31]. Allelic restrictions have also been noted for IGHV1-69-encoded bnAbs against gp41 elicited by vaccination [74], as well as for critical binding residues of influenza hemagglutinin stem-directed bnAbs [1, 45, 73].

12.3.2 Integration of Heavy and Light Chain Data, and B-Cell and T-Cell Repertoires

Many of the studies surveyed report data only on immunoglobulin heavy chain sequences, and sometimes only for the CDR-H3 region. However, both the heavy and light chains can contribute to Ab binding properties. Including both chains (heavy and light chains for Abs; α and β chains for most T-cell receptors (TCRs)) is therefore necessary to determine clonal lineages within an Ab or TCR repertoire most accurately. Several labs have now developed the ability to report paired chain sequences for B cells and T cells. The earliest approaches paired heavy and light chain sequences based on the topologies and frequencies of corresponding branches of phylogenies constructed separately from heavy and light chain sequences (e.g., [85]). More recent approaches use emulsion techniques (e.g., [12]) or other techniques that isolate single cells and capture paired heavy and light chain sequences from each B cell [44, 49]. Parallel analyses of Ab/B-cell and T-cell repertoires are starting to reveal common and divergent patterns between the two compartments [3, 47]. In addition, isolation of mRNA from single cells makes it possible to relate full-length paired heavy and light chain Ab sequences to T-cell receptor repertoires in fine detail [7].
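Emulsion and other single-cell methods typically tag all transcripts from one cell with a shared barcode, so pairing reduces to grouping chains by barcode. The sketch below assumes a hypothetical input of (cell_barcode, locus, sequence) tuples; this is not the output format of any particular platform. It keeps only cells with exactly one heavy and one light chain.

```python
from collections import defaultdict

HEAVY_LOCI = {"IGH"}
LIGHT_LOCI = {"IGK", "IGL"}


def pair_by_barcode(contigs):
    """contigs: iterable of (cell_barcode, locus, sequence) tuples (hypothetical format).

    Returns {barcode: (heavy_seq, light_seq)} for cells with exactly one heavy
    and exactly one light chain; all other cells are set aside.
    """
    cells = defaultdict(lambda: {"heavy": [], "light": []})
    for barcode, locus, seq in contigs:
        if locus in HEAVY_LOCI:
            cells[barcode]["heavy"].append(seq)
        elif locus in LIGHT_LOCI:
            cells[barcode]["light"].append(seq)
    pairs = {}
    for barcode, chains in cells.items():
        # Missing chains suggest dropout; extra chains may indicate doublets
        # (two cells sharing one barcode).
        if len(chains["heavy"]) == 1 and len(chains["light"]) == 1:
            pairs[barcode] = (chains["heavy"][0], chains["light"][0])
    return pairs


contigs = [
    ("AAA", "IGH", "H1"), ("AAA", "IGK", "K1"),
    ("CCC", "IGH", "H2"),
    ("GGG", "IGH", "H3"), ("GGG", "IGH", "H4"), ("GGG", "IGL", "L3"),
]
print(pair_by_barcode(contigs))
```

The strict one-heavy/one-light rule is a design choice. It discards cells affected by chain dropout or barcode collisions, but it also discards the minority of genuine B cells that express two light chains.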

12.3.3 Integration of HTS B-Cell Repertoire Data with Other Types of Genomic-Level Data

One of the most important future developments for HTS studies of Ab repertoires will be to integrate these data with other large-scale genomic datasets. For example, Georgiou and colleagues have combined HTS of the Ab repertoire with high-resolution protein mass spectrometry of serum antibodies (Ig-seq) in order to connect soluble Ig in blood and secretions with clonally expanded peripheral B cells [34]. This approach can help identify antigen-specific lineages both functionally and developmentally. Combining HTS profiling of the gut microbiota with Ab repertoire data has also yielded interesting insights [16, 37].
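One step in linking serum proteomics to sequenced lineages is deciding which peptides are informative, i.e., match the CDR-H3 sequences of exactly one lineage. The Python sketch below shows that idea in a simplified form, assuming peptides have already been identified as plain amino acid strings and that each lineage is represented by its CDR-H3 sequences; real Ig-seq workflows search spectra against databases built from the donor's own sequences, which this does not attempt. The only mass-spectrometry detail modeled is that leucine and isoleucine are isobaric, so I and L are treated as equivalent. Function names and data are hypothetical.

```python
def normalize(seq):
    """Upper-case a sequence and merge I into L (leucine and isoleucine have identical mass)."""
    return seq.upper().replace("I", "L")


def lineages_supported(peptides, lineage_cdrh3s):
    """Map each lineage to the peptides found in its CDR-H3s and in no other lineage's."""
    normalized = {
        lineage: [normalize(s) for s in seqs] for lineage, seqs in lineage_cdrh3s.items()
    }
    support = {}
    for peptide in peptides:
        p = normalize(peptide)
        hits = [lin for lin, seqs in normalized.items() if any(p in s for s in seqs)]
        if len(hits) == 1:  # ambiguous peptides cannot attribute serum Ab to one lineage
            support.setdefault(hits[0], []).append(peptide)
    return support


# Hypothetical lineages (CDR-H3 amino acid sequences) and identified peptides.
lineage_cdrh3s = {
    "L1": ["ARDLGYCSGGSCYFDY", "ARDLGYCSGGSCYFDH"],
    "L2": ["AKEGITMIVVVITDY"],
    "L3": ["ARDRGYSSGWYFDY"],
}
peptides = ["GYCSGGSCY", "EGLTMLVVV", "FDY"]
print(lineages_supported(peptides, lineage_cdrh3s))
```

Here the short peptide "FDY" occurs in two lineages and is discarded, while the other two peptides each support a single lineage (the second only after I/L equivalence is applied).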

12.3.4 Importance of Non-targeted, Complete Ab Repertoire Studies

This review shows that most studies of Ab repertoires in HIV-1-infected individuals have taken a “targeted” approach, in that they examine only sequences related to particular target bnAbs. We contrast this with non-targeted, complete repertoire studies that attempt to estimate the number and relative expansion of all of the Ab lineages in an individual’s repertoire. Of course, it is practically impossible to characterize an individual’s Ab repertoire completely, given that an individual has been estimated to have as many as 10¹¹–10¹² lymphocytes [55], many of which express unique Ab/B-cell or T-cell receptors.

The targeted approach can answer some specific questions very well, such as how a particular Ab lineage developed into a specific bnAb. However, many questions about the development of Ab repertoires, including the conditions and broader repertoire context that lead to bnAbs, require a description of the complete Ab “milieu” in which a particular lineage formed (i.e., the number and estimated size of each lineage). The question, then, is: what additional insights do we obtain from unbiased, non-targeted sequencing coupled with attempts to describe the complete Ab repertoire?

One example of a question that demands a complete repertoire analysis, rather than one targeted to specific predefined lineages, is how often certain lineages, such as those related to the important bnAb VRC01, are produced in response to HIV-1. In one such study, Zhu et al. [85] reported 15 clusters of sequences all related to PGT135-137, and these clusters exhibited highly variable levels of neutralization. From this observation, the authors concluded that the VRC01 lineages could be produced by a large proportion of individuals. However, without comparing the frequency of this type of lineage against a broader representation of related Ab lineages, it is difficult to estimate how often the lineage naturally occurs among individuals.
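The point about frequencies can be made concrete: a per-donor frequency needs a denominator (all lineages in that donor), which only a non-targeted repertoire provides. The Python sketch below computes, for each donor, how many lineages match a signature and what fraction of all lineages they represent. As an assumed, simplified signature it uses an IGHV1-2 heavy chain paired with a five-residue CDR-L3, features commonly used to describe VRC01-class antibodies; the data structure, function names, and toy lineages are hypothetical.

```python
def signature_frequencies(donor_lineages, is_match):
    """For each donor, return (matching lineages, total lineages, fraction matching)."""
    result = {}
    for donor, lineages in donor_lineages.items():
        hits = sum(1 for lineage in lineages if is_match(lineage))
        total = len(lineages)
        result[donor] = (hits, total, hits / total if total else 0.0)
    return result


def vrc01_like(lineage):
    """Simplified signature (assumption): IGHV1-2 heavy chain and a 5-residue CDR-L3."""
    return lineage["v_heavy"].startswith("IGHV1-2*") and len(lineage["cdrl3"]) == 5


# Toy lineage tables; in practice these would come from complete, non-targeted repertoires.
donor_lineages = {
    "D1": [
        {"v_heavy": "IGHV1-2*02", "cdrl3": "QQYEF"},
        {"v_heavy": "IGHV3-23*01", "cdrl3": "QQYGSSPLT"},
        {"v_heavy": "IGHV1-2*02", "cdrl3": "QQSYSTPLT"},
        {"v_heavy": "IGHV4-34*01", "cdrl3": "QQYNSYPWT"},
    ],
    "D2": [
        {"v_heavy": "IGHV1-69*01", "cdrl3": "QQYEF"},
        {"v_heavy": "IGHV3-30*18", "cdrl3": "MQALQTPLT"},
    ],
}
freqs = signature_frequencies(donor_lineages, vrc01_like)
donors_with_any = sum(1 for hits, _, _ in freqs.values() if hits > 0)
print(freqs, donors_with_any)
```

A targeted search would report only the matching lineage in D1; the complete tables are what allow it to be expressed as one of four lineages in D1 and zero of two in D2.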

Several studies have also investigated the relationship between the expansion of clones early in HIV-1 infection and the production of bnAbs. Again, one would need a complete description of each individual’s clonal structure to determine which individuals show clonal expansion, even of related clones, and when those clones expanded. Comparisons of V, D, or J gene usage, or of repertoire unevenness, between individuals who do and do not produce bnAbs would likewise be possible only with an unbiased view of the repertoire. Because such signals might discriminate individuals who produce bnAbs from those who do not, we should strive to describe the entire Ab repertoire whenever possible.
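Repertoire unevenness is often summarized from the clone-size distribution. The Python sketch below computes two standard ecological indices, Shannon entropy and Pielou's evenness (entropy divided by its maximum, the log of the number of clones); these are illustrative choices and not necessarily the metrics used in the studies discussed. Clone sizes are assumed to be counts of cells or unique sequences per clone, and returning 0.0 for a repertoire with fewer than two clones is a convention adopted here, since the index is undefined in that case.

```python
import math


def shannon_entropy(clone_sizes):
    """Shannon entropy (natural log) of the clone-size distribution."""
    total = sum(clone_sizes)
    return -sum((n / total) * math.log(n / total) for n in clone_sizes if n > 0)


def pielou_evenness(clone_sizes):
    """Shannon entropy scaled by its maximum, log(number of observed clones)."""
    richness = sum(1 for n in clone_sizes if n > 0)
    if richness < 2:
        return 0.0  # undefined for a single clone; 0.0 is a convention chosen here
    return shannon_entropy(clone_sizes) / math.log(richness)


# Two toy repertoires with the same number of clones and cells.
even_repertoire = [10, 10, 10, 10]
expanded_repertoire = [37, 1, 1, 1]  # one dominant, clonally expanded lineage
print(pielou_evenness(even_repertoire), pielou_evenness(expanded_repertoire))
```

The even repertoire scores 1.0 and the expanded one about 0.25. Because both indices depend on how completely small clones are sampled, they are only comparable across individuals when the underlying repertoires were sequenced in an unbiased way.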

Furthermore, the need to understand the complete Ab repertoire is not restricted to studies of the development of anti-HIV-1 bnAbs. Ab repertoires are likely to differ markedly between individuals depending on many conditions, from infection to autoimmune disease to cancer, and understanding the effects of these conditions will require describing the complete Ab responses of many individuals. “Public” Abs, which are produced by many individuals (including presumably healthy ones) and use commonly expressed V, D, and J genes, would be obvious targets for most vaccine efforts. Targeted approaches to Ab repertoires demand large sequence datasets, but attempts to describe the complete Ab repertoire will demand even larger ones, and answering the above questions will likely involve comparing and sharing these huge datasets across studies, disease states, and institutions. The next section addresses some of the challenges in reporting these data to improve reproducibility, and in sharing them in order to understand what drives similarities and differences in Ab repertoires.

12.3.5 Standardizing Data Generation, Analysis and Sharing Protocols

The use of HTS to study Ab/B-cell and T-cell receptor repertoires has increased dramatically since the technique was first applied to immune receptor repertoires in 2009 [9, 17, 48, 71]. This experimental approach allows us to explore the development of the adaptive immune system in exquisite detail, and holds significant translational promise as a source of diagnostic and prognostic tools. However, as discussed in this review, the present generation and analysis of this type of data have several limitations. Efforts to standardize protocols and facilitate data sharing could help to address many of them.

A primary issue that arises when trying to draw conclusions from analyses across large, complex, shared datasets is reproducibility. Addressing it largely requires developing standards for reporting the key details of how data were produced and analyzed, so that another researcher can reproduce the data, recapitulate the analysis, and ultimately be confident that the data are of sufficient quality to be shared and compared.

Facilitating the comparison and sharing of these datasets within and across labs, disease states, and institutions also demands new protocols for data deposition and sharing in public repositories, as well as bioinformatic solutions for analyzing these complex data. In some cases, these protocols will also need to take into account the protection of intellectual property (IP), data security, and donor confidentiality. The Adaptive Immune Receptor Repertoire (AIRR) Community is a group of immunologists, immunogeneticists, clinicians, bioinformaticians, and experts in the legal, IP, and security aspects of genomic data sharing, formed in 2015 to address these issues for this rapidly expanding area of immunological research. The progress and recommendations of the AIRR Community’s working groups are summarized at airr-community.org; anyone interested in joining the Community can do so at that site.
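Shared, standardized file layouts are what make cross-study comparisons such as V gene usage straightforward. The Python sketch below assumes a tab-separated file laid out like the AIRR Community's Rearrangement format as we understand it, with columns `sequence_id`, `v_call`, `j_call`, `junction_aa`, and `productive` (booleans written as `T`/`F`); it uses only the standard library and is not an official AIRR tool. Keeping only the first of several tied V calls, and collapsing alleles to the gene level, are simplifying assumptions; the function name and toy records are hypothetical.

```python
import csv
import io
from collections import Counter


def v_gene_usage(tsv_text):
    """Fraction of productive rearrangements assigned to each V gene (alleles collapsed)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["productive"] != "T":
            continue  # skip non-productive rearrangements
        first_call = row["v_call"].split(",")[0]  # assumption: keep the first of tied calls
        counts[first_call.split("*")[0]] += 1  # "IGHV1-2*02" -> "IGHV1-2"
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()} if total else {}


# Toy records in the assumed tab-separated layout.
example_tsv = (
    "sequence_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "s1\tIGHV1-2*02\tIGHJ4*02\tCARGGYDFWSGYFDYW\tT\n"
    "s2\tIGHV1-2*04,IGHV1-2*02\tIGHJ4*02\tCARGGYDFWSGYFDHW\tT\n"
    "s3\tIGHV3-23*01\tIGHJ6*02\tCAKDRGYYYYGMDVW\tT\n"
    "s4\tIGHV3-23*01\tIGHJ6*02\tCAK*RGYYYYGMDVW\tF\n"
)
print(v_gene_usage(example_tsv))
```

Because the column names are fixed by the shared format rather than by each lab's pipeline, the same function could in principle be applied unchanged to datasets deposited by different groups.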