Introduction

One of the most important products on the therapeutic recombinant protein market is the monoclonal antibody. More than 54 therapeutic antibodies have been approved for various indications, including cancer and autoimmune diseases.1 Traditionally, therapeutic antibodies have been generated by mouse B-cell hybridoma technology followed by chimerization or humanization.2 In the past few decades, technologies such as transgenic mice encompassing human antibody gene repertoires, and phage display of antibody libraries, have become available, facilitating the rapid flourishing of therapeutic antibodies in the drug discovery field.3

Phage display technology frequently allows the creation of libraries containing up to 1011 different variants, which can be used to screen antibody clones by bio-panning.4 Despite the development of alternative display technologies such as bacterial display, yeast display and ribosome display, phage display remains the most widely used display technology due to the robustness of the filamentous bacteriophage M13.5 Several therapeutic antibodies that are currently either approved or in clinical trials have been developed by phage display technology.6, 7

Recently, next-generation sequencing (NGS) technology has allowed a massive increase in capacity to sequence genomes at relatively low cost and in a short time frame.8 It has revolutionized multiple aspects of biological research5 and is also actively being adopted into antibody phage display technology. Several NGS platforms are currently available, with average read lengths of 75–8500 bp and different error rates.9 The CDR3 sequence of the VH and VL genes has been effectively determined by the MiSeq system;10 a single-domain antibody gene was successfully determined by the MiSeq system using a 2 × 250 paired-end module;11 and the entire VH gene was successfully sequenced using the 454 pyrosequencing system.12 However, sequencing of the entire single-chain variable fragment (scFv) gene, which contains 750–800 bases, could not be achieved using any of these NGS platforms, to the extent of the authors’ knowledge. In one study, to obtain the whole scFv gene sequence, HCDR3 sequences were first determined by the MiSeq system; the entire scFv gene was then generated by two-step linker PCR using primers based on the heavy chain complementarity-determining region 3 (HCDR3) sequences, and its sequence was determined by Sanger sequencing analysis.6 In another similar study, HCDR3/FR4 sequences were determined from Ion Torrent PGM sequence analysis using the 318 chip. Then, the entire scFv gene was retrieved by inverse PCR using primers based on the HCDR3/FR4 sequences.13

Following NGS analysis, the antibody gene is typically cloned and expressed. And the binding reactivity of the antibody to its target as well as its biological activity are tested. However, this may prove to be unproductive when the fraction of positive clones is not high following bio-panning. It has been extensively reported that positive clones tend to be enriched through bio-panning and negative clones show the opposite tendency. Therefore, NGS analysis of clones after each round of bio-panning could provide insights on which clones are more likely to be positive. Furthermore, it is unknown whether there is a difference between scFv clones identified by conventional colony screening methods14 and those obtained from NGS.

In this study, we have attempted to answer these questions. We performed four rounds of bio-panning using three scFv libraries constructed from prostate-specific antigen (PSA)-immunized chickens. We then performed NGS analysis of scFv clones focusing on HCDR3 in the initial scFv library and in four enriched scFv libraries obtained from subsequent rounds of bio-panning. scFv clones were obtained after the last round of bio-panning using the conventional colony screening method from the output titer plate, or from phagemid DNA prepared following a previously reported procedure.6 The reactivity of these scFv clones was measured using a phage enzyme immunoassay. Based on these experiments, the sequences obtained using NGS and the conventional colony screening method were compared. We also classified scFv clones obtained from NGS into 3–4 clusters based on their enrichment or impoverishment patterns, and analyzed these patterns for clues regarding the reactivity of scFv clones.

Materials and methods

Library construction and bio-panning

Three white leghorn chickens were immunized and boosted four times with recombinant human PSA (Fitzgerald, Acton, MA, USA). After the final booster injection, total RNA was extracted from the spleen, bone marrow, and bursa of Fabricius using the TRI Reagent (Invitrogen, Grand Island, NY, USA). First-strand cDNA was synthesized using SuperScript reverse transcriptase with oligo (dT) priming (Invitrogen). Using this cDNA, three phage-displayed libraries of chicken scFvs were constructed using the pComb3XSS phagemid vector, as described previously.14 Four rounds of bio-panning were performed to screen scFv clones from the library following a previously reported procedure.15 For each round of bio-panning, 5 × 106 magnetic beads (Dynabeads M-270 epoxy) (Invitrogen) coated with 1.5 μg recombinant PSA protein were used.

Phage enzyme immunoassay

The scFv-displaying phages were rescued from titer plates after transformation and subjected to phage enzyme immunoassay as described previously.14 The microtiter plates (Corning, NY, USA) were coated overnight at 4 °C with 20 μl recombinant human Fc-tagged PSA (5 μl ml−1) dissolved in phosphate-buffered saline (PBS). After blocking with 3% bovine serum albumin dissolved in PBS (w/v, PBS-B), the plates were then sequentially incubated with scFv-displaying phages in the culture supernatant, horseradish peroxidase (HRP)-conjugated mouse anti-M13 monoclonal antibody (GE Healthcare, Pittsburg, PA, USA) in PBS-B, and then finally with 2,2′-Azinobis [3-ethylbenzothiazoline-6-sulfonic acid]-diammonium salt (ABTS) substrate solutions (Amresco LLC, Solon, OH, USA), with intermittent washing using 0.05% Tween-20 in PBS (PBST). After incubating the plates at 37 °C for 10 min, the optical density was measured at 405 nm using a microtiter plate reader (Labsystems AiG SL, Barcelona, Spain).

Sanger sequencing analysis

Phagemid DNA from selected clones identified by phage enzyme immunoassays was prepared with a small-scale plasmid preparation kit (Qiagen, Hilden, Germany). The OmpSeq primer (5′-AAGACAGCTATCGCGATTGCAG-3′) and HRML-F primer (5′-GGTGGTTCCTCTAGATCTTCC-3′) were used to sequence the VH and VL chains of the antibody.14 Sequence analysis of positive clones (O.D.405nm>0.3) was performed by Macrogen (Seoul, Korea).

Next-generation sequencing analysis

NGS analysis was performed as described previously.16 A total of 15 sets of phagemid DNA including three initial chicken scFv libraries and three libraries obtained after each of four rounds of bio-panning were analyzed using a MiSeq system (Illumina Inc., San Diego, CA, USA). The MiSeq library for DNA sequencing was prepared using Illumina Nextera XT chemistry (Illumina) following the protocol provided by the manufacturer. The genes from the chicken library were amplified using the forward primer (pre-adaptor, 5′-TCGTCGGCAGCGTC-3′; sequencing primer, 5′-AGATGTGTATAAGAGACAG-3′; specific locus primer, 5′-TCAGCCTCGTCTGCAAGG-3′), and reverse primer (pre-adaptor, 5′-GTCTCGTGGGCTCGG-3′; sequencing primer, 5′-AGATGTGTATAAGAGACAG-3′; specific locus primer, 5′-AGTGGAGGAGACGATGACTTC-3′), respectively. The final libraries were normalized by quantification with LightCycler 480 II (Roche Applied Science, Indianapolis, IN, USA) and qualification with Bioanalyzer (Agilent, Palo Alto, CA, USA). The final loading concentration was adjusted to 11 pM following the MiSeq loading protocol. The MiSeq reagent kit v3 (Illumina) was used for long paired-end reads (2 × 300 bp) sequencing reactions. The sequencing data was processed by CLC Genomics Workbench version 5 (CLC Bio, Aarhus, Denmark) software. Low-quality sequencing data were first trimmed depending on quality scores using PHRED with the minimum quality score of 20 and reads with less than 150 bases in length were discarded.17 The cleaned-up sequencing data were processed by merging the paired-end sequence reads using fast length adjustment of short reads to obtain complete sequences of the chicken scFv libraries.18 Sequencing data were further cleaned up using PRINSEQ (San Diego State University, San Diego, CA, USA), setting the minimum quality score at 20 and read length at 150.17 EMBOSS Needle 6.5.0.0 (The European Bioinformatics Institute (EMBL-EBI), UK) was used to map sequence read in the HCDR3 region, with a threshold score of 300.19 Subsequently, a custom Perl script was used to summarize and count sequence reads in 15 sets of phagemid DNA. We merged the read counts across all the panning rounds, but for computational and statistical analysis, we only counted the reads existing in the phagemid DNA after the fourth bio-panning round.

Clustering analysis

An optimized number of clusters in the merged sequence read counts was estimated using the clValid algorithm, to facilitate pattern analysis of NGS data for population shifts in antibody clones throughout the bio-panning process.20 The clValid algorithm validated number of clusters by assessing intra-cluster homogeneity and inter-cluster separation, and the assessment for each and every clustering is represented in the Dunn index.20 A higher Dunn index indicates better clustering. The ‘Internal’ cluster validation metrics were chosen, which consider only the data set and the clustering partition, and the intrinsic properties of the data were used to evaluate the quality of the clustering results in designated clustering algorithms such as hierarchical clustering and k-mean clustering.21 Unsupervised hierarchical clustering analysis was used to cluster HCDR3 sequences according to the number of clusters estimated by clValid. Ward’s method was used to measure distances between sequence reads based on read counts throughout the bio-panning, and a heat map visualizing the sequence read changes in each cluster was generated using Gene Pattern v3.9.2 software.22 Line charts representing the pattern of sequence read changes in each cluster across all the bio-panning rounds were then generated as in a previous study.

Cloning to retrieve scFvs

To rebuild real scFv clones from the virtual HCDR3 sequences in the clusters, we performed two-step linker PCR. In the first PCR step, primers targeting both LFR1-HCDR3 (LFR1_F primer, 5′-GTGGCCCAGGCGGCCCTG-3′) and HCDR3-HFR4 fragments (HFR4_R primer, 5′-CTGGCCGGCCTGGCCACT-3′) were synthesized, based on HCDR3 sequences determined in NGS analysis and phagemid DNA obtained after the 4th round of bio-panning. The second PCR step linked these two gene fragments into a single scFv gene using primers annealing to LFR1 and HFR4 (LFR1_F primer, 5′-GTGGCCCAGGCGGCCCTG-3′; HFR4_R primer, 5′-CTGGCCGGCCTGGCCACT-3′). The scFv gene was ligated into the pComb3XSS phagemid vector and rescued as scFv-displaying phages, as described previously.14 To measure the binding reactivity of these scFv-displaying phages, we rescued more than 15 clones per HCDR3 sequence, and performed phage enzyme immunoassay as described earlier. We regarded the clone providing the highest optical density at 405 nm as the retrieved clone.

Statistical analysis

Statistical analysis was performed with GraphPad Prism 5 software. Specific P-values and statistical methods are provided in the figure legends.

Results

Generation of antibody library and screening for positive clones using the conventional colony screening method

Using mRNA prepared from spleen, bone marrow, and bursa of Fabricius from three PSA-immunized chickens, we generated scFv libraries with complexities of 6.09 × 1010, 3.64 × 1010 and 5.16 × 1010 clones, respectively, referred to as chicken libraries 1, 2 and 3. Next, we performed four rounds of bio-panning, rescued phage clones from the output titer plate of the fourth round, and performed a phage enzyme immunoassay to screen for positive clones. A total of 300 clones (100 clones in each library) exhibiting an optical density of >0.3 at 405 nm were considered to be positive, and their scFv gene sequence was determined by Sanger sequencing analysis. We finally obtained 22 clones with unique HCDR3 sequences (Table 1).

Table 1 HCDR3 amino-acid sequences selected using the conventional colony screening method, and binding reactivity measurement of the antibody clones

Diversity analysis of antibody clones using next-generation sequencing

A total of 15 sets of phagemid DNA (three chicken libraries from bio-panning rounds 0, 1, 2, 3, and 4) were used for NGS analysis. After the NGS experiment, we obtained 60,000–180,000 VH sequences. Raw paired-end nucleotide sequences were merged, filtered, aligned and trimmed by uniformly applying pre-specified criteria to remove low-quality and meaningless short sequences. The numbers of nucleotide sequences remaining after each preprocess are summarized in Table 2; 44–53% of the original sequences were retained after aligning with OmpSeq primer sequence14 by Needle, and were used in subsequent analyses. From the NGS results, the total population of VH fragment nucleotides decreased as the bio-panning rounds proceeded. To analyze HCDR3 diversity and frequency, we used HCDR3 sequences existing only in the fourth bio-panning round. clValid predicted that 2–6 clusters would be the most dependable in the HCDR3 sequence count profile data (Table 3). The sequence reads in chicken library 1 showed the maximum Dunn index (0.1048) with 4–6 clusters, and chicken libraries 2 and 3 had maximum Dunn indices with 2–3 clusters. We clustered HCDR3 sequences into 2–6 clusters using hierarchical clustering, and generated heat maps for each cluster to examine the patterns of HCDR3 sequence enrichment and population shift throughout the bio-panning rounds. The pattern of HCDR3 sequence enrichment and population shift in chicken library 1 showed four clear clusters, and the patterns in chicken libraries 2 and 3 showed three clear clusters (Figure 1).

Table 2 Sequence read counts by preprocessing raw sequencing data
Table 3 Dunn index on hierarchical clustering to estimate optimal number of clusters in scFv nucleotide sequence profile data
Figure 1
figure 1

Heat map representing the population of heavy chain complementarity-determining region 3 (HCDR3) sequences in each cluster through bio-panning rounds. Red and blue denote high and low proportions of the HCDR3 sequence, respectively. (a) scFv library 1, (b) scFv library 2 and (c) scFv library 3.

Population shift in HCDR3 sequences throughout bio-panning rounds

The diversity of the antibody clones is represented by the number of HCDR3 sequences that belong to each cluster (Figure 1). The abundance of the HCDR3 sequences in each cluster is represented by heat map color; high and low populations are indicated in red and blue, respectively. HCDR3 sequences in cluster 1 were highly abundant before bio-panning and up to the second bio-panning round. However, there was a sudden impoverishment in rounds 3 and 4 of bio-panning. In contrast, HCDR3 sequences that belonged to clusters 2 and 3 (including cluster 4 of library 1) showed the opposite pattern. Their populations were very low before bio-panning, remained low after the second round of bio-panning, and started to enrich from the third round of bio-panning. The increase continued in the fourth round of bio-panning. This population shift of HCDR3 sequences throughout bio-panning is represented in Figure 2. All 22 HCDR3 sequences in clones found via the conventional colony screening method existed among the HCDR3 sequences obtained from NGS analysis of phagemid DNA prepared after the fourth round of bio-panning (Table 1). Two out of the 22 unique HCDR3 sequences belonged to cluster 1, and the other 20 HCDR3 sequences belonged to clusters 2, 3 or 4.

Figure 2
figure 2

Line graph representing population shifts in HCDR3 sequences through bio-panning rounds. (a) scFv library 1, (b) scFv library 2 and (c) scFv library 3.

Reactivity of scFv clones identified in NGS analysis

For each cluster, 1–5 HCDR3 sequences newly identified from the fourth round of bio-panning via NGS analysis were selected arbitrarily (Table 4). These selected sequences were used to synthesize the primers to retrieve the whole scFv gene from the phagemid DNA. The scFv gene was prepared in two-step linker PCR using the primers and cloned into a phagemid vector (Figure 3). After transformation of the phagemid vector-encoding scFv gene and rescue with helper phage, scFv-displaying phage was used to test their binding reactivity against PSA (Figure 4). In cluster 1, across the three libraries, 12 out of 14 antibody clones (85.7%) had negligible binding reactivity against PSA (O.D.450nm<0.2; Table 4, blue). In contrast, 21 out of 26 antibody clones (80.8%) in clusters 2~4 across the three libraries had significant binding reactivity (O.D.450nm>0.3; Table 4, red). These results imply that antibody clones with low reactivity tend to be impoverished throughout bio-panning (cluster 1), in contrast to the antibody clones with high reactivity, which showed enrichment throughout bio-panning (clusters 2~4).

Table 4 HCDR3 amino-acid sequences selected in each cluster from NGS and binding reactivity measurement of antibody clones
Figure 3
figure 3

Schematic representation of next-generation sequencing and two-step linker PCR. The structure of scFv gene, CDRs and frameworks of variable regions are indicated by colored boxes. (a) For NGS analysis, most of VH region including HCDR3 was amplified and sequenced using specific primers as described in materials and methods. The sequencing coverage is indicated with dashed lines. (b) To retrieve scFv gene, two-step linker PCR was performed using primers annealing to HCDR3, LFR1 and HFR4. The first step of PCR was performed using LFR1_F and HCDR3_R primers and HCDR3_F and HFR4_R primers. The linker PCR was performed using LFR1_F and HFR4_R primers.

Figure 4
figure 4

Binding reactivity of scFv antibodies retrieved from selected HCDR3 amino-acid sequences in each cluster using NGS. (a) scFv library 1, (b) scFv library 2 and (c) scFv library 3. ANOVA with Turkey’s multiple-comparison test was used to compare cluster 1 with other clusters. In library 3, the P-value was calculated using the Mann–Whitney U-test. *P-value <0.05; **P-value <0.01; ***P-value <0.001. ANOVA, analysis of variance.

Discussion

Phage display technology is commonly employed in the development of therapeutic antibodies.23 One of the major drawbacks of phage display technology lies in the fact that some clones become dominant through the bio-panning process, and frequently, only limited numbers of clones became available at the end of the screening.16 In this study, we showed that NGS analysis provided not only all of the HCDR3 sequences of clones identified using the conventional colony screening method but also new HCDR3 sequences. The proportion of HCDR3 sequences found by the conventional colony screening method varied from 0.01 to 6.49% of the HCDR3 sequences identified by NGS analysis (Table 1). There was no significant correlation between these two proportions.

After we successfully retrieved the entire scFv gene via two-step linker PCR using PCR primers designed based on the HCDR3 sequences from NGS, we measured the binding reactivity of these antibody clones. Of the 40 clones retrieved from the phagemid DNA pool prepared after the fourth round of bio-panning, positive binding reactivity was confirmed in 26 clones. Four clones in library 1 and one clone in library 3 exhibited a proportion of >1% among the HCDR3 sequences obtained after the fourth round. The proportion of two positive clones in library 1 and library 3 were 2.16 and 4.79%, respectively. All these clones successfully formed colonies after the retrieval process. Why these clones with such high proportions were not identified in the conventional colony screening method is not clear; however, it might be caused by either inherent toxicity with phage assembly or interference from bacterial growth.6, 24

Among the clones retrieved, thirteen clones with proportions less than 0.1% showed binding reactivity. Two of the positive clones were present at a proportion of 0.01%. Theoretically, screening for a clone with such low proportions via the conventional colony screening method requires either 1000 or 10000 positive colonies and Sanger sequencing analysis, which would require significant resources of time and cost.

Many scFv clones prove difficult to identify by the conventional colony screening method from the phage pool obtained after bio-panning, and a method for retrieving these scFv clones in a high-throughput way has not yet been developed. Using currently available NGS tools, it is not possible to sequence the entire scFv gene, which is about 750 bp in length, without error. Our study and the other previous studies6, 13 have proved that the scFv gene can be amplified by PCR primers designed based on HCDR3/FR4 sequences. However, there is always the possibility of cross-priming between clones during the PCR process. In this study, we also confirmed that clones retrieved via two-step linker PCR are typically a mixture of both negative and positive clones.

We also proved that when a certain HCDR3 sequence was enriched through the bio-panning process, the clone with the HCDR3 is more likely to be positive; in this study over 80% (21 out of 26 clones) (Table 4). Monitoring the enrichment or impoverishment pattern of HCDR3 during the bio-panning process might increase the efficiency of retrieving clones from NGS analysis. However, this two-step linker PCR and cloning process is very difficult to perform in a high-throughput manner and there is definitely a need for better way to retrieve whole scFv gene.

In summary, NGS analysis of the HCDR3 sequence, and two-step linker PCR using PCR primers based on this sequence, provide an effective way to retrieve antigen-specific scFv clones that are difficult to identify by the conventional colony screening method. Enrichment of the HCDR3 sequence over the bio-panning process is a positive predictive factor in the selection of scFv clones harboring binding reactivity.