1 Introduction: Encoding Molecules with Genetic Tags

Encoded chemical library technologies are based on the covalent linkage of chemically synthesized compounds to synthetic oligonucleotides that serve as barcode identifiers for the compound structure (Fig. 1a). This approach to screening library handling is commonly called phenotype–genotype coupling. It has its biological precedent in the phage-display, ribosome-display, or mRNA-display libraries which are widely used for screening for bioactive peptides and proteins. Like their biological relatives, genetically tagged chemically synthesized libraries can be screened for bioactive compounds on targets as complex mixtures, without the need for the large infrastructure needed to screen discrete libraries of compounds (high-throughput screening/HTS).

Fig. 1

Solution-phase synthesis of DNA-encoded libraries (DEL) by iterative cycles of chemical synthesis and barcode ligation. (a) Schematic presentation of a DNA-encoded molecule. (b) Split-and-pool encoded combinatorial chemistry workflow. (c) Structure and sequence of the “headpiece” DNA oligonucleotide. (d) Generic scheme of the synthesis of a DNA-encoded library initiated with the short hairpin duplex “headpiece” DNA

A seminal publication by Richard A. Lerner and Sydney Brenner in 1992 described for the first time the concept of synthesizing molecular chimeras consisting of a genetic tag covalently connected to a chemically synthesized molecule [1]. In those early days of encoded chemistry, libraries were synthesized on beads that contained two linkers: one allowed for coupling of amino acid building blocks by peptide chemistry, the other for coupling of DNA nucleotides by phosphoramidite nucleic acid chemistry [2, 3]. Coupling of amino acid building blocks and of DNA nucleotide building blocks was performed iteratively and in a combinatorial manner, so that each cycle of synthesis and encoding led to an exponential growth of the library. However, the different chemistries needed for peptide synthesis and for phosphoramidite synthesis posed serious constraints on the initial setup. Since the early days of encoded combinatorial chemistry, the field of encoded library technologies has branched out, and today several barcoding strategies are available for the synthesis of encoded libraries, which will be discussed in this chapter. They can broadly be divided into DNA-recorded techniques, where the genetic tag serves to record the synthetic steps, and techniques that use the DNA barcode for templating chemical reactions [4]. The class of DNA-recorded libraries comprises split-and-pool encoded solution-phase combinatorial synthesis [5, 6, 7], solid phase-initiated DNA-encoded combinatorial chemistry, and DNA-encoded solid-phase chemistry [8], as well as hybrid DNA/PNA-encoded chemistry [9]. Further techniques that are based on recording reactions with DNA are dual-pharmacophore technologies and DNA-encoded dynamic combinatorial chemistry, which enable both fragment screening and screening of complex molecules [10].
DNA-templated approaches comprise DNA-templated DEL synthesis, which was mainly used for the construction of macrocyclic structures, and the yoctoReactor (TM) design, both using DNA templates to direct the oligonucleotide-linked reagents into proximity at the reaction site within complex mixtures [11, 12]. Alternatively, DNA/PNA hybrid duplex structures may be used in templated reaction strategies [9]. Notably, the DNA-routing approach also uses immobilized DNA strands to direct DNA templates to individual reaction vessels [13]. The use of single-stranded designs further enables the pairing with additional modular functionalities which can be exploited for advanced DEL selection strategies [1], such as photocrosslinking [2], affinity maturation of existing ligands [3], or the use of covalent or reversible-covalent warheads [4]. Such strategies may facilitate the identification of valuable compounds that bind to a biological target and serve as a starting point for drug development programs.

2 Split-and-Pool Encoded Solution-Phase Combinatorial Chemistry

Split-and-pool encoded solution-phase combinatorial chemistry, which today is the most used approach to construct DNA-encoded libraries, was first reported in a 1995 publication that described the combinatorial enzymatic ligation of DNA codes (Fig. 1) [14]. Seminal publications from the Neri group (ETH Zurich), and from a team headed by Barry Morgan, then at Praecis, described the first split-and-pool encoded library approaches [15, 16]. Morgan et al. designed a double-stranded DNA to encode compounds [15]. The overall architecture of their barcode is shown in Fig. 1a. It consisted of the encoded compound that was connected via a polyethylene glycol chain to both terminal nucleotides of the duplex DNA, forming a hairpin structure. Terminal primer regions of the DNA construct allowed for amplification of the genetic information by polymerase chain reaction (PCR). Suitable primer sequences have been published by different research groups [15, 16]. Alternatively, validated primers from genomics research, SELEX, or phage-display may be used, if researchers entering the DEL field wish to use other primer sequences. The number of internal degenerate 8–10 mer regions that contain the compound barcode mirrors the number of chemical reaction steps, and the number of barcodes per reaction step mirrors the number of chemical building blocks. Barcode sequences may be designed using freely available statistics software such as R, in order to avoid long homopurine or homopyrimidine sequences that may lead to sequencing artifacts and to define the Hamming distance of barcodes. The Hamming distance of a barcode set is the minimal number of positions at which any two barcodes of a coding region differ. It is advisable to set the Hamming distance to at least two for unambiguous compound identification by sequencing. The design of smaller libraries and/or longer coding regions would allow for a higher Hamming distance. In the coding scheme published by Morgan et al. [16], the barcode sequences were connected by two-base regions that are needed for enzymatic barcode ligation with T4 ligase.
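The two barcode design constraints described above (a minimum pairwise distance, counted as the number of differing positions and commonly called the Hamming distance, and the avoidance of long homopurine or homopyrimidine stretches) can be sketched in a few lines of Python. This is a minimal illustration, not a published design procedure: the greedy search, the function names, and the run-length threshold `max_run=4` are assumptions chosen for the example.

```python
from itertools import product

PURINES = set("AG")  # A and G are purines; C and T are pyrimidines


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def longest_same_class_run(seq: str) -> int:
    """Longest stretch of consecutive purines or consecutive pyrimidines."""
    best = run = 0
    prev = None
    for base in seq:
        is_purine = base in PURINES
        run = run + 1 if is_purine == prev else 1
        prev = is_purine
        best = max(best, run)
    return best


def design_barcodes(length: int, count: int,
                    min_dist: int = 2, max_run: int = 4) -> list[str]:
    """Greedily collect barcodes that satisfy both constraints.

    max_run is an assumed threshold for a "long" homopurine or
    homopyrimidine stretch; the text does not specify a value.
    """
    chosen: list[str] = []
    for candidate in ("".join(p) for p in product("ACGT", repeat=length)):
        if longest_same_class_run(candidate) > max_run:
            continue
        if all(hamming(candidate, c) >= min_dist for c in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                return chosen
    raise ValueError("not enough barcodes satisfy the constraints")
```

A call such as `design_barcodes(8, 100)` returns 100 eight-base barcodes in which any two differ in at least two positions. A greedy search does not maximize the size of the barcode set, so a dedicated code-design tool may be preferable for large split sizes.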

The workflow of encoded library synthesis was initiated with a short hairpin duplex DNA, called the “headpiece”, that contained a polyethylene glycol linker connecting the two arms of the hairpin (Fig. 1c). This linker carries a functional group for attaching the encoded molecule. The hairpin consisted of a duplex DNA sequence of six nucleotides and a two-base overhang that allowed for ligating a set of duplex (ds)DNA oligonucleotides with T4 ligase. This set of dsDNA oligonucleotides contained the forward primer sequence, barcodes for the first set of chemical building blocks, and a two-base overhang for the second sticky-end ligation. Following barcode ligation, chemical building blocks were coupled to the linker part of the DNA constructs. As the linker moiety most often contained a primary amino group as a functional handle, the building blocks needed to carry a carboxylic acid and at least one further functional group for subsequent encoded compound synthesis steps. In order to diversify the reagent space in this first cycle of library synthesis, alternative alkyl linkers were developed that contained a leaving group for substitution by nucleophiles [17]. Furthermore, chloroacetic acid or allyl-protected glycine was coupled to the linker, which allowed for synthesis of secondary amines from abundantly available primary amines and aldehydes, respectively [18, 19]. This process of encoded compound synthesis is called a cycle of encoded chemistry and is concluded with a purification and pooling step (Fig. 1b). The last step is usually carried out by precipitation of the DNA conjugate from ethanolic solution. The cycles of synthesis and encoding are iterated as defined by library design. In the publication by Clark et al. [16], three and four cycles of encoded library synthesis have been described that yielded 7,000,000 and 800,000,000 encoded molecules, respectively (Fig. 1d).
Libraries undergoing more combinatorial cycles have been synthesized to reach higher compound numbers. A library of encoded macrocycles was synthesized through six cycles of encoded peptide chemistry [20]. Today, libraries built in three or even only two cycles [19] are increasingly being designed, likely because higher numbers of synthesis steps tend to give rise to molecules with higher molecular weight and functional group compositions that may prove more challenging to develop into drugs.
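The combinatorial growth described above follows from simple multiplication: the library size is the product of the number of building blocks used in each cycle, and every compound is identified by the concatenation of one barcode per cycle. The following sketch illustrates this bookkeeping. The barcode strings and building block names are hypothetical, the split sizes are illustrative and not those of the cited libraries, and the two-base ligation overhangs and primer regions are omitted for brevity.

```python
from itertools import product
from math import prod


def library_size(building_blocks_per_cycle: list[int]) -> int:
    """Number of encoded compounds after all split-and-pool cycles."""
    return prod(building_blocks_per_cycle)


def enumerate_library(cycles: list[dict[str, str]]):
    """Yield (concatenated barcode, compound description) pairs.

    Each element of `cycles` maps a barcode to the building block it
    encodes in that cycle (hypothetical names, for illustration only).
    """
    for combo in product(*(cycle.items() for cycle in cycles)):
        code = "".join(barcode for barcode, _ in combo)
        compound = "-".join(block for _, block in combo)
        yield code, compound


# Toy two-cycle library: 2 building blocks in cycle 1, 3 in cycle 2.
toy_cycles = [
    {"AAAC": "acid1", "AACA": "acid2"},
    {"CCCA": "amine1", "CCAC": "amine2", "CACC": "amine3"},
]
toy_library = list(enumerate_library(toy_cycles))
```

With these toy inputs the library contains 2 × 3 = 6 members, and three cycles of 100 building blocks each would already give `library_size([100, 100, 100])`, i.e. one million encoded compounds.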

An alternative barcoding strategy developed by the Neri group used a long single-stranded (ss)DNA that contained a 5′-terminal linker for the encoded compound, the forward DNA primer, a compound barcode, and a region that was partially complementary to a set of ssDNA oligonucleotides containing the second barcode [16]. After coupling of compounds to the DNA oligonucleotides, all conjugates were pooled and split for the second cycle of encoded synthesis. Unlike the hairpin strategy that used T4 ligase and sticky-end ligation to concatenate barcode oligonucleotides prior to compound synthesis, the Neri group first performed compound synthesis, and then annealed a set of ssDNA barcode oligonucleotides to the first set of barcode ssDNA for primer elongation by Klenow fill-in (Fig. 2a-1). This strategy can be used to synthesize two-cycle libraries. A third cycle of library synthesis to reach a million-sized encoded library was demonstrated following digestion of the construct with a restriction endonuclease and sticky-end ligation of a third DNA barcode with T4 ligase [21]. A more versatile encoding strategy for concatenating (theoretically) any number of ssDNA barcodes was developed by the Scheuermann and Neri groups and used 20-mer splint oligonucleotides that were partially complementary to barcode oligonucleotides. Here, the internal compound barcodes were accordingly flanked by longer 10-mer sequences that were annealed to the splint ssDNA [22, 23] (Fig. 2a-2).
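The geometry of splint ligation can be made concrete with a short sketch. Assuming that the 20-mer splint bridges the junction by pairing with the last 10 bases of the upstream barcode oligonucleotide and the first 10 bases of the downstream one, its sequence is the reverse complement of the 20 bases spanning the junction. The function names and example sequences below are hypothetical, and all sequences are written 5′ to 3′.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def splint_for(upstream: str, downstream: str, arm: int = 10) -> str:
    """Splint that anneals across the junction of two barcode oligos.

    Assumes the splint pairs with `arm` bases on each side of the
    junction (10 + 10 = 20-mer by default).
    """
    if len(upstream) < arm or len(downstream) < arm:
        raise ValueError("oligonucleotides are shorter than the splint arm")
    junction = upstream[-arm:] + downstream[:arm]
    return reverse_complement(junction)


# Hypothetical oligos: a barcode flanked by constant 10-mer regions.
oligo_1 = "TTTTTTTTTT" + "CATGCATG" + "ACGTACGTAC"
oligo_2 = "GGGGGTTTTT" + "GTCAGTCA" + "CCCCCCCCCC"
splint = splint_for(oligo_1, oligo_2)
```

Because the splint only contacts the constant flanking regions in this sketch, one splint sequence would serve every barcode combination at a given junction, which is consistent with the idea of concatenating any number of barcodes.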

Fig. 2

Alternatives to the “headpiece” DNA-barcoding strategy. (a) Encoding strategies using Klenow fill-in or splint ligation for concatenation of single-stranded DNA barcodes. (b) Barcoding strategies initiated by DNA oligonucleotides coupled to controlled pore glass (CPG) solid phase

All solution-phase encoded library strategies require chemical building blocks and chemical reactions which are adapted to the process. The building blocks need to be functionalized to enable the encoded compound process, and a diverse set of them needs to be available to enable large split sizes, e.g., numbering hundreds of building blocks. The fewer cycles are used for DEL design, the more important is the ready availability of a class of building blocks. Library chemistry must be compatible with aqueous solvents and with the chemical reactivity of DNA. Furthermore, the methodology needs to be uniformly high-yielding and to perform robustly on a broad variety of substrates. The available reactions for library design and DNA damage reactions that need to be avoided are discussed in greater detail in chapter “Advancements in DEL-Compatible Chemical Reactions” of this volume.

In order to expand the scope of DEL design, the first cycles of library synthesis may optionally be carried out on DNA sequences covalently connected to a solid-phase matrix, for instance, on controlled pore glass (CPG) (Fig. 2b). This material is a standard solid phase for automated chemical oligonucleotide synthesis. Features of this approach are the option to perform reactions under strictly dry conditions, DNA nucleobase protection that prevents chemical deamination, and the possibility to remove reagent excess, and especially potentially harmful contaminants such as metal ions, under stringent washing conditions. Following on-DNA compound synthesis, the DNA oligonucleotides are ligated to further DNA barcodes. The DNA oligonucleotides may be ssDNAs that contain a partial primer sequence [24], primer and barcode [25, 26], short headpiece adapters such as the hexathymidine (hexT; Fig. 2b-1) headpiece [27, 28], or barcodes that are optionally chemically stabilized (csDNA; Fig. 2b-2) for greater reaction tolerance [29]. The latter two barcoding strategies used 4-mer overhangs to concatenate barcodes [30]. Both the hexT headpiece and chemically stabilized csDNA that is composed of pyrimidine nucleobases and 7-deazaadenine have been shown to tolerate reaction conditions that are hardly compatible with native DNA, such as low pH and transition metal catalysis under forcing conditions. Thus, they expand the accessible chemical reaction space for solution-phase DEL design.

3 Split-and-Pool Encoded Solid-Phase Combinatorial Chemistry

The Kodadek and Paegel groups developed DNA-barcoding strategies for one-bead-one-compound (OBOC) synthesis (Fig. 3a) [31, 32]. Their DNA-encoded solid-phase synthesis (DESPS) approach employed a functionalized linker on Tentagel macrobeads (Fig. 3b). The bifunctional linker was functionalized with an azide to install a 6-mer dsDNA headpiece “HDNA” whose two strands were connected via an azide-substituted polyethylene glycol chain via copper-catalyzed azide alkyne cycloaddition (CuAAC). Like the headpiece solution-phase DEL strategy, the DESPS headpiece contained an overhang for ligation of the forward primer and first barcode (Fig. 3a). Following enzymatic ligation of the first dsDNA, the solvent could be exchanged by simple filtration, and a first encoded compound synthesis step was performed on the second arm of the bifunctional linker. This arm contained an amine as handle for encoded compound synthesis, and additionally a chromophore and an arginine residue to facilitate compound synthesis quality control by chromatographic and mass spectrometric analysis. Like the headpiece DEL strategy, encoded OBOC libraries were synthesized by iterative split-pool combinatorial cycles of compound synthesis and enzymatic ligation of DNA barcodes, here via three-base overhangs, resulting in oligomer compounds and DNA barcodes flanked by PCR primer sites. The reaction scope for this technology takes advantage of the option to exchange aqueous solvents by organic solvents and encompasses carbonyl chemistry, as well as cross-coupling reactions, aldol reactions, and a suite of aldehyde-diversifying reactions [8]. Encoded OBOC libraries may be screened by affinity-based selection screens [32]. In the future, they may offer the option to perform functional and even phenotypic screens, after cleavage of the encoded molecules from the DNA-encoded beads [33].

Fig. 3

DNA-encoding strategies for the synthesis of encoded one-bead-one compound (OBOC) libraries. (a) Generic workflow of the encoded OBOC library synthesis. (b) Structure of the solid phase for encoded OBOC libraries

4 PNA-Encoded Chemistry

Peptide-nucleic acid (PNA)-encoding strategies have been developed by the Winssinger group as an alternative to barcoding of compound synthesis with DNA [9]. Peptide-nucleic acids (PNAs) are artificially synthesized oligomers, having a backbone composed of repeating N-(2-aminoethyl)glycine units linked by peptide bonds, to which the various purine and pyrimidine bases are attached by a methylene bridge and a carbonyl group. PNA oligonucleotides specifically and strongly hybridize to complementary DNA or RNA sequences, forming duplexes that are resistant to degradation by either nucleases or proteases (Fig. 4a) [34].

Fig. 4

Peptide-Nucleic Acid (PNA)-encoded chemistry. (a) Generic scheme of a PNA-encoded library synthesis. (b) Screening of PNA-encoded libraries on microarrays. (c) Indirect identification of PNA-encoded compounds by microarray analysis

In contrast to unprotected DNA, PNA as the coding oligomer does not easily undergo depurination because it lacks the sugar-phosphate backbone, and it has a much higher thermal and pH stability, enabling the use of a much broader spectrum of chemical methods for library preparation. Molecular diversity in PNA-encoded libraries can be achieved via a broad array of chemistries, including reactions catalyzed by transition metals, transformations yielding heterocyclic scaffolds, cyclization, and protic acid-promoted reactions [35, 36]. To date, a considerable number of peptide- and non-peptide-based PNA-encoded libraries have been reported, proving the broad applicability of the chemistry and the robustness of the encoding strategy [9].

However, the fundamental problem with PNA-encoding is that it does not benefit from the efficiency of enzymatic barcode ligation nor is the code amplifiable by PCR or PCR-like techniques. The simplest format for the identification and quantification of selection outcomes from PNA-encoded libraries generally includes direct hybridization of the selected compounds, bearing a fluorophore on each PNA tag, to a DNA microarray (Fig. 4b) [37, 38]. Nevertheless, the success of this screening method strictly depends on the concentration of PNA tags, which should be sufficient for microarray-based detection [39].

To circumvent this limitation, novel screening strategies have been developed which offer the possibility to perform indirect DNA amplification of PNA tags prior to microarray analysis (Fig. 4c) [40, 41].

A merger of DNA- and PNA-encoding has been achieved with the development of a mating strategy that uses DNA template strands to direct the combinatorial ligation of PNA-encoded peptides. The resulting DNA/PNA-encoded peptides can be deconvoluted via amplification and sequencing of the DNA template strands [42].

5 Dual-Pharmacophore Encoding Strategies

In contrast to the DEL construction schemes described above featuring the display of just one chemical molecule on DNA (single-pharmacophore setup), dual-pharmacophore DELs feature the display of chemical entities (fragments and drug-like molecules) on the extremity of both the 5′ and 3′ end of DNA heteroduplexes. This approach had been pioneered by the Neri/Scheuermann group at ETH Zurich who reported the first such Encoded Self-Assembling Chemical (ESAC) library in 2004 [43]. Two sets of partially complementary sub-libraries, to which chemical compounds were conjugated at the 5′ and the 3′ terminus, respectively, were allowed to combinatorially assemble into stable DNA heteroduplexes (Fig. 5a). These fragment-like sub-libraries when assembled to a DNA heteroduplex would allow for a simultaneous screening of pairs of fragments for target binding, profiting from the chelate effect [44]. Initially, after PCR amplification ESAC libraries were read by hybridization to complementary oligonucleotides on microarrays, yet this read-out was performed individually for both sub-libraries, missing identification of pairs of synergistic binding moieties, initially limiting the potential of ESAC libraries to the lead expansion of known ligands [43]. The full potential of dual-display ESAC technology could only be uncovered using a novel setup that was compatible with the upcoming next-generation DNA sequencing. To this aim, a novel encoding methodology for enabling the transfer of the coding sequence from one strand to the other was proposed, based on the use of abasic DNA segments in one of the two sets of partially complementary oligonucleotides (Fig. 5a) [45]. The coding information of the 3′-strand is transcribed to the shorter 5′-strand by a DNA-polymerase assisted fill-in reaction (Fig. 5a). 
Together with the development of next-generation DNA sequencing, this method expanded the use of ESAC technology beyond affinity maturation and enabled the de novo discovery of pairs of synergistic binders [45]. A library consisting of 111,100 heteroduplexes obtained by the self-assembly of two mutually complementary sub-libraries of 550 × 202 fragments provided a single pair of fragments which was strongly enriched against alpha-1-acid glycoprotein (AGP), while the individual fragments did not show significant enrichment in the selection and did not display a measurable Kd. Furthermore, in ESAC selections against carbonic anhydrase IX, a validated tumor-associated antigen, a highly enriched pair of acetazolamide (a known 20 nM CAIX ligand) and a phenolic compound could be identified with 40-fold improvement in affinity compared with acetazolamide alone, which is currently in clinical development. In order to further promote the applicability of dual-pharmacophore DEL technology, novel encoding methodologies have recently been proposed: “large-encoding design” (LED) (Fig. 5b) features the assembly and encoding of multi-building block sub-libraries, which allows for the construction and screening of DELs of unprecedented sizes and designs. This technology features sub-libraries with a stable hybridization domain and non-hybridizing encoding parts arranged in a Y-shape. After combinatorial assembly and selection on a target of interest, the codes of the selected heteroduplexes can be PCR-amplified using a junction primer and a terminal primer and further ligated to a single PCR product that contains all coding information and can then be interrogated by next-generation DNA sequencing [46].
The selection of dual-pharmacophore DELs is generally performed on purified and immobilized proteins, but recent work on the screening of dual-pharmacophore DELs against carbonic anhydrase IX (CAIX) expressing tumor cells revealed the potential of on-cell selection protocols profiting from a substantial increase of ligand recovery and selectivity through the bivalent display of ligands [47].
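The size of a self-assembled dual-pharmacophore library and the read-out of fragment pairs can be illustrated with a small sketch. The library size is the product of the two sub-library sizes (550 × 202 in the example above). The enrichment measure used here, the read count of a pair divided by the mean count expected if reads were spread evenly over all possible pairs, is a simplifying assumption for illustration and not the statistical treatment used in the cited studies; the read counts are invented.

```python
from collections import Counter


def esac_library_size(n_sublibrary_a: int, n_sublibrary_b: int) -> int:
    """Number of heteroduplexes formed by combinatorial self-assembly."""
    return n_sublibrary_a * n_sublibrary_b


def pair_enrichment(counts: Counter, n_a: int, n_b: int) -> dict:
    """Fold enrichment of each observed pair over a uniform expectation.

    `counts` maps (fragment_a, fragment_b) index pairs to sequencing
    read counts. The uniform-expectation baseline is an assumption.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    mean_per_pair = total / esac_library_size(n_a, n_b)
    return {pair: c / mean_per_pair for pair, c in counts.items()}


# Invented read counts for three fragment pairs.
reads = Counter({(17, 42): 500, (17, 43): 2, (300, 42): 3})
enrichment = pair_enrichment(reads, 550, 202)
top_pair = max(enrichment, key=enrichment.get)
```

Reading both codes from a single sequencing read is what makes such a pairwise table possible; with separate read-outs per sub-library only the row and column totals would be available.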

Fig. 5

Encoding strategies for dual-display DELs. (a) In encoded self-assembling chemical libraries (ESACs), two sub-libraries encoded with partially complementary DNA strands are constructed separately. Despite the variable sequences coding for the first fragment, hybridization of the two sub-libraries is possible owing to the introduction of an abasic spacer region. The extension of the shorter strand by a DNA-polymerase transfers the second code onto the first DNA strand. (b) A large-encoding design enables the construction of ESACs displaying more than three fragments. Two individually synthesized sub-libraries are stably annealed through a short hybridization region. After a fill-in reaction aided by annealing of a junction primer (JP) and terminal primer (T3P) and ligation, a single strand containing all coding sequences is generated. (c) A circular encoding construct can be generated through stable hybridization of two sub-libraries. Two relay primers assist the fill-in reaction followed by ligation, yielding a single strand containing all coding sequences

A formidable challenge associated with the discovery of bidentate ligands from dual-display DELs lies in the identification of an optimal linker to connect the selected fragments/molecules. In most cases, several linkers differing in length, geometry, and rigidity may have to be tested to identify the optimal solution for the scaffolding of synergistic pharmacophores. The choice of optimal linkers for the coupling of synergistic pharmacophores may be incorporated into future DNA-encoded library design. Recently, a strategy for “dimerizing” non-related single-stranded split-and-pool DELs into dual-pharmacophore DELs, which allows for a modulation of fragment spacing and orientation, has been devised as a circular construct (Fig. 5c), which may further broaden the scope of dual-pharmacophore DEL technology [10].

6 Conversion from Double-Stranded to Single-Stranded Library Format

A single-stranded library format is required for many of the recently reported advanced selection strategies. Apart from deliberate single-stranded DEL synthesis, two approaches have recently been proposed to convert double-stranded DELs into a single-stranded format: selective digestion of one strand by an exonuclease [48], or the use of a reversible-covalent DNA headpiece designed to enable reversible interstrand photocrosslinking of a 3-cyanovinyl carbazole nucleoside on one strand with a thymine base on the opposite strand. Irradiation with either UV-A or UV-B light results in reversible conversion between the double- and the single-stranded DEL format [49]. Both methods hold the promise to enable transitioning between the robust double-stranded format and the versatile single-stranded DEL format.

7 Dynamic Dual-Pharmacophore DEL Strategies

Dynamic combinatorial chemistry enables the identification of compounds binding to target structures with high affinity by connection of smaller fragments in the binding site [50]. In the context of DNA-encoded chemical libraries, dynamic approaches rely on incubation with the target to guide fragment assembly. In contrast to static dual-pharmacophore DNA-encoded libraries, where a stable DNA heteroduplex is preformed, dynamic approaches feature a transiently stable DNA duplex upon incubation of the DNA-tagged fragment sub-libraries with the target. Tagging of fragments with short complementary DNA sequences allows for continuous shuffling of such tagged fragments until pairs of fragments bind to a target structure. After equilibrium is reached, fragment pairs are locked to retain the information on binding pairs for decoding. A first pilot study was reported by Hamilton et al. in 2005, where a stable heteroduplex was heated above the melting temperature, thus permitting the reshuffling of fragment pairs, which eventually led to the enrichment of duplexes displaying the binding fragments [51]. About a decade later, the dynamic concept in dual-pharmacophore DEL technology was elaborated independently by the groups of Yixin Zhang and Xiaoyu Li. A first dynamic proof-of-principle library was constructed and subjected to heating above the duplex melting temperature after the first round of selections. The resulting exchange of strands was found to lead to an enrichment of high-affinity duplexes [52]. A Y-shaped encoding setup with a very short hybridization region was proposed (Fig. 6a), where the locking of the equilibrium was accomplished by enzymatic ligation [53]. Further methods were established in order to lock the equilibrium state, e.g., by photocrosslinking of the two DNA strands [54], or by using short, complementary anchor DNA sequences [54] (Fig. 6b).
More recently, chemical crosslinking using p-stilbazole- or psoralen-modified DNA strands was proposed and tested [55, 56]. An interesting approach to facilitate ligand development after the identification of binding small-molecule pairs was recently devised: the small molecules themselves are linked, rather than the two DNA strands. Fragment conjugation was performed by imine formation between a single-stranded DEL containing a primary amine and a known ligand containing an aldehyde group, resulting in a dynamic imine library. After incubation with the target, the equilibrium could be locked by reduction of the imine [57]. Recently, a PNA-based dynamic combinatorial supramolecular approach has been reported which is based on the use of a constant short PNA tag to direct the combinatorial pairing of fragments [58]. While the concept of dynamic DELs is very promising, it must be mentioned that, to date, no large dynamic DEL numbering hundreds of thousands, if not millions, of compounds has been reported in the literature.

Fig. 6

In DNA-encoded dynamic combinatorial chemical libraries, the duplexes are unstable and duplexes binding with high affinity are stabilized by incubation with the target. (a) For Y-EDCCL, the equilibrium reached after incubation with the target was locked by ligation of both strands. (b) After incubation with the target, the equilibrium can be locked by crosslinking prompted by UV irradiation. A relay-primer-bypass strategy relying on a terminal primer and a relay primer and subsequent ligation affords a single DNA strand comprising all coding sequences

8 DNA-Templated Synthesis

An alternative strategy for the assembly of DNA-encoded chemical libraries features the use of DNA-templated synthesis (DTS) [24, 59, 60] (Fig. 7a). In this case, pre-defined DNA templates are employed both for library encoding and for directing the library construction. This setup was pioneered by David R. Liu et al. in 2001 when they demonstrated that chemical reactions are promoted by bringing DNA-linked reagents into close proximity through Watson–Crick base-pairing [24]. The effective molarity of such reactions may be very high, thus allowing to conduct reactions which are otherwise considered difficult or impossible to implement with conventional chemistry. The concept was used not only for reaction discovery [61] but also for the construction of DNA-encoded, macrocyclic libraries. Small organic compounds chemically linked to short single-stranded biotinylated oligonucleotides as donors are transferred upon hybridization to suitable complementary DNA sequences on the template, followed by cleavage and avidin-assisted “donor” strand removal [24]. In a stepwise fashion the template library is annealed with a first set of code-complementary oligonucleotides bearing each a different reactant, which is then chemically attached to the template (Fig. 7a). The linker between the short oligonucleotide and the reactant is subsequently cleaved and this process could be repeated two more times leading to a trimeric linear library. Finally, the library may be cyclized and used for selection experiments against target proteins. The first reported DTS-DEL comprised 65 library members made from three sets of four amino acids, followed by cyclization through Wittig olefination (see Fig. 4).

Fig. 7

DNA-templated synthesis uses pre-defined DNA template strands to direct library construction. (a) The hybridization of a library of DNA templates with a library of DNA-linked reagents brings reactants into proximity. Once the coupling of the first building block is complete, the hybridization and reaction steps are sequentially repeated for the remaining reagent libraries. (b) A universal template featuring tri-deoxyinosine anticodons enables hybridization with encoded reagents irrespective of their coding DNA sequence. After hybridization, the reagent DNA is ligated to the universal template and the chemical reaction takes place. This process is repeated for the remaining reagent libraries

This proof-of-principle study was successful in selections against carbonic anhydrase. After technical optimization, a similar, yet larger macrocyclic library of 13,824 DNA-encoded compounds was constructed and reported in 2008 [24, 62], yielding hits for Src kinase and VEGFR2 [63], and six macrocycles against insulin degrading enzyme (IDE), which were also co-crystallized [64, 65].

In 2018, Usanov, Liu, and coworkers published a second-generation DNA-templated library of 256,000 macrocyclic members [66]. To this end, the DNA templates were computationally optimized regarding the orthogonality of each hybridization sequence, and template assembly was improved through a polymerase-mediated strategy. In addition, macrocyclic Kihlberg rules [67] were applied to the design. With this DEL, selections were performed against IDE. One macrocyclic hit showed high potency with an IC50 of 40 nM [66]. The company Ensemble Therapeutics, co-founded by David R. Liu, used DTS for the construction of macrocyclic DELs in an industrial setting. In 2015, the company reported a DNA-templated macrocyclic DEL in collaboration with Bristol-Myers Squibb, consisting of five sets of building blocks which were eventually cyclized by a copper-catalyzed azide/alkyne cycloaddition (CuAAC) reaction [67]. Four libraries of 40,000 members were screened against XIAP BIR2 and BIR3 domains [68], and inhibitors with the ability to displace bound pro-apoptotic caspases were identified [69]. The positive aspects of DTS for DEL construction certainly encompass high chemical yields, access to unusual chemical reactions, one-pot reactions with reagent oligonucleotide conjugates, as well as specific purification methods [59]. On the other hand, template library generation, code specificity for a larger set of building blocks, as well as the creation of larger sets of reagent oligonucleotide conjugates to reach split sizes of, e.g., a few hundred building blocks can prove challenging. In 2013, Li and coworkers reported a smart solution for omitting the tedious template library generation by proposing a “universal template” consisting of a DNA hairpin containing tri-deoxyinosine anticodons which allow for the indiscriminate base-pairing and subsequent ligation with 3-base encoded reagent oligonucleotide conjugates (Fig. 7b).
The feasibility of this DTS approach could be demonstrated with proof-of-principle DELs of 64*64*28 3-base codons [70].
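The theoretical size of such a library follows directly from the number of codons used per encoded position. The short Python sketch below only illustrates that arithmetic. The function names are hypothetical, and reading "64*64*28" as three encoded positions that use 64, 64, and 28 codons, respectively, is an assumption about the notation rather than a detail confirmed here.

```python
from math import prod
from typing import Sequence

BASES = "ACGT"


def codon_capacity(codon_length: int) -> int:
    """Maximum number of distinct codons of a given length over A, C, G, T."""
    return len(BASES) ** codon_length


def library_size(codons_per_position: Sequence[int]) -> int:
    """Number of codon combinations, i.e. the theoretical number of encoded compounds."""
    return prod(codons_per_position)


# Assumed reading of "64*64*28": codons used at three encoded positions.
positions = [64, 64, 28]
capacity = codon_capacity(3)  # a 3-base codon cannot encode more than this

# No position may use more codons than a 3-base code can provide.
assert all(n <= capacity for n in positions)

print("codons available per 3-base position:", capacity)
print("theoretical library size:", library_size(positions))
```

Under these assumptions, a 3-base codon caps each position at 64 building blocks, which makes the limit on split size per cycle explicit.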

9 DNA-Routing

The group of P. Harbury described “DNA-routing” [71] as a different DNA-template-based strategy for library construction. This approach enables combinatorial chemistry by iterative sequence-specific immobilization–reaction and purification–elution steps. A library of single-stranded DNA templates comprising the codes for the routing is applied to resin columns carrying the immobilized complementary DNA code sequences and is subsequently chemically modified (Fig. 8). After repooling, this split-and-pool step is repeated for all diversity elements. In 2004, Halpin and Harbury described the construction of a combinatorial peptide library using a 340-mer oligonucleotide combinatorial template library consisting of alternating 20-base coding and 20-base constant regions obtained by PCR assembly of overlapping complementary 40-mer oligonucleotides [72]. To serve the purpose of routing on complementary DNA resin, the double-stranded template library was then converted into single-stranded DNA by reverse transcription and base-mediated hydrolysis of the RNA strand.
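The routing logic described above can be sketched as a toy simulation: at each coding position, templates are split according to their codon, every sub-pool receives the building block assigned to that codon, and all sub-pools are then recombined. This is a minimal illustration under stated assumptions, not the published protocol. The codon labels, building-block names, and function names are hypothetical, and hybridization and chemistry are treated as perfectly efficient.

```python
from collections import defaultdict
from itertools import product


def route(pool, position):
    """Split a pool of (template, compound) pairs by the codon at one coding position.

    This mimics hybridization to anticodon columns: each codon gets its own sub-pool.
    """
    columns = defaultdict(list)
    for template, compound in pool:
        columns[template[position]].append((template, compound))
    return columns


def routing_cycle(pool, position, codon_to_block):
    """One split (route), react, and repool step for a single coding position."""
    repooled = []
    for codon, members in route(pool, position).items():
        block = codon_to_block[codon]  # building block applied on this column
        for template, compound in members:
            repooled.append((template, compound + (block,)))
    return repooled


# Hypothetical codon labels and building blocks, for illustration only.
codon_to_block = {"c1": "Gly", "c2": "Leu", "c3": "Phe"}
templates = list(product(codon_to_block, repeat=2))  # two coding positions
pool = [(template, ()) for template in templates]     # no chemistry performed yet

for position in range(2):
    pool = routing_cycle(pool, position, codon_to_block)

for template, compound in pool:
    print(template, "->", compound)
```

In this idealized model, each template ends up carrying exactly the compound spelled out by its codons, which is the property that lets the DNA sequence serve as a readable record of the synthesis.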

Fig. 8

A library of single-stranded DNA templates serves as the starting point for the construction of DELs by DNA-routing. DNA templates are separated by hybridization with anticodons immobilized on resin columns before being chemically modified. After pooling of all eluates, further rounds of sequence-specific separation and reaction can be carried out

Routing on the anticodon resin is followed by a chemistry step performed on weak anion-exchange columns under varying high- and low-salt conditions [73]. The combined eluates after the reaction step can subsequently undergo further split-and-pool cycles. The routing can be performed sequence-specifically and efficiently, with an overall yield of 0.85 per hybridization round. Using this DNA-routing strategy, an N-acylated pentapeptide library with a complexity of 10⁶ was constructed [71], using 10 different amino acid building blocks for the first positions and nine carboxylic acids for the N-acylation step. The library included acylated leucine-enkephalin pentapeptides as a control and was subjected to affinity-based selections against the monoclonal antibody 3-E7 [74], a leucine-enkephalin-binding antibody with 7.1 nM affinity.
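To see what these figures imply, the sketch below compounds the per-round hybridization yield over several routing rounds and multiplies out the building-block counts. It assumes that the yield of 0.85 applies independently to every round and that the ten amino acids are used at each of the five peptide positions. Both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
def cumulative_yield(per_round_yield: float, rounds: int) -> float:
    """Fraction of template recovered after several independent routing rounds."""
    return per_round_yield ** rounds


def acylated_peptide_library_size(n_amino_acids: int, peptide_length: int, n_acids: int) -> int:
    """Theoretical number of N-acylated peptides from a full combinatorial design."""
    return n_amino_acids ** peptide_length * n_acids


# Recovery after one to six routing rounds at 0.85 per round (assumed independent).
for rounds in range(1, 7):
    print(rounds, "rounds:", round(cumulative_yield(0.85, rounds), 3))

# Ten amino acids at five positions (assumption), capped with one of nine acids.
print("library size:", acylated_peptide_library_size(10, 5, 9))
```

With these inputs the design gives 900,000 combinations, i.e., on the order of one million compounds, and the compounded yield shows why a high per-round recovery matters when routing is repeated for every diversity element.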

The routing system was further optimized for peptide coupling, yielding >90% conversion per reaction step, and an 8-mer library of 100 million compounds was created and selected against the N-terminal SH3 domain of the proto-oncogene Crk [13]. Evolving over six generations, the hits converged to a small number of peptides with micromolar affinity. Further refinement allowed for highly parallel operation in 384-well format [75], and the routing system, acting in analogy to an expanded genetic code, was tested for chemical evolution on protein kinase A (PKA) [76]. While the sole use of peptide bond chemistry and the sophisticated code design may pose constraints, the routing system allows for the generation of very large DNA-encoded libraries, numbering millions of compounds.

10 DNA-Directed Chemistry: The “yoctoReactor”

The final DNA-directed approach to be discussed in this chapter is the “yoctoReactor” or “yR” technology. The hallmark of the yR is a DNA sequence design that leads to self-assembly of two to four oligonucleotides, which form a cavity with the name-giving volume of a yoctoliter (10⁻²⁴ L), “yocto” having been the smallest prefix in the SI system when the technology was named [12, 77]. Each of these DNA oligomers carries a functionalized chemical building block via a linker positioned within the sequence and is composed of a partial sequence for annealing to give the three- or four-way junction (the yR) and a partial sequence that encodes its building block (Fig. 9). The first set of encoded building blocks, i.e., the equivalent of the cycle 1 building blocks in encoded combinatorial chemistry, is coupled to its DNA oligomers via a stable linker, while the cycle 2 and 3 building blocks are coupled to their barcodes via a scissile linker. Self-assembly of the DNA oligomers to the yR directs the chemical building blocks into proximity for a chemical reaction that couples them to the target molecule. Compound synthesis thus leads in effect to covalent coupling of the DNA barcodes. These constructs can be purified from any non-reacted starting materials by denaturing polyacrylamide gel electrophoresis (PAGE). Thus, the purification step of the anneal-react-purify sequence provides high fidelity in library synthesis, as only the successfully synthesized molecules will enter the encoded library. This fidelity constitutes a major advantage over DNA-recorded chemistry. On the other hand, the rules for DNA-compatible chemistry apply to the yR, too; thus, carbonyl chemistry and nucleophilic substitution are prevalent in library design. Moreover, the choice of building blocks for DEL design is limited to those that contain at least two functional groups, precluding tapping into the vast pool of monofunctionalized chemicals, and an additional chemical step is needed to couple building blocks to the barcodes. 
Following PAGE, the barcodes of the purified products are ligated to concatenate the genetic information of the compounds. Then, the scissile linker is cleaved to provide the final encoded molecules of a reaction step. As a last step, prior to screening, the complementary strand of the barcodes is formed by primer extension. As the yR can be constructed by annealing the DNA oligomers in combinatorial cycles, this technology furnished millions of encoded compounds, too [78].
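A back-of-the-envelope calculation helps to show why confinement in a yoctoliter-scale cavity favors reaction between the building blocks. The sketch below converts "one molecule in a given volume" into a molar concentration. It assumes an idealized cavity of exactly one yoctoliter, which is a simplification and not a measured reaction volume, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact by SI definition)
YOCTOLITER = 1e-24        # liters


def effective_molarity(n_molecules: int, volume_liters: float) -> float:
    """Molar concentration of n molecules confined to the given volume."""
    return n_molecules / (AVOGADRO * volume_liters)


# One reactive building block confined to an idealized 1 yL cavity (assumption).
concentration = effective_molarity(1, YOCTOLITER)
print(f"effective concentration: {concentration:.2f} mol/L")
```

Under this assumption, a single molecule in one yoctoliter corresponds to a concentration of roughly 1.66 mol/L, far above the concentrations at which DNA conjugates are usually handled in bulk solution.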

Fig. 9

Principles of the “yoctoReactor” (yR) technology. (a) Stepwise library assembly using the three-way form of the yR; (b) Sequences of the oligonucleotides; (c) Exemplary round of selection, amplification, and translation

11 Conclusion

Taken together, the various encoding strategies for DELs have enabled the broad field of DEL technology of today. While DNA-templated synthesis and DNA-routing have led to various DELs, split-and-pool-based methods are nowadays the ones mostly used in practice (Table 1).

Table 1 Comparison of barcoding strategies

While for DNA-recorded DELs the greatest care is generally devoted to optimization of the “chemical” side, i.e., to employing high-yielding DNA-compatible reactions to achieve diversity, the quality of the encoding also critically matters for the performance of DELs in selections.

Encoding in single-stranded DNA format may offer important advantages for DEL selection protocols, since such DELs can be paired either with further single-stranded libraries to form dual-display DELs or with known leads, covalent warheads, or photoreactive moieties displayed on the complementary strand. Strategies have therefore been devised both for creating single-stranded DELs by ligation and for converting a double-stranded DEL into a single-stranded one [1, 11, 12]. Sophisticated selection protocols will expand the scope of DEL technology for the identification of small bioactive compounds.