1 Introduction

Identification of biologically active small molecules is a multidisciplinary task at the heart of chemical biology and drug discovery (Bleicher et al. 2003). The cross-connectivity between hit generation, lead identification, optimization, and the evaluation of compounds in a biologically relevant environment renders this a challenging task. Modern drug discovery approaches have faced a decline in therapeutic output over the last decade (Carney 2005; Walters et al. 2011). While genomics and proteomics have identified many potential drug targets, the identification of innovative therapeutic agents has stalled (Carney 2005). The stark contrast between the 300–500 pharmaceutically targeted proteins and estimated 30,000 genes in the human genome underscores this discrepancy (Bleicher et al. 2003). Therefore, new avenues have to be taken, integrating efforts from drug companies, chemistry, and software development.

In this race for new drug leads, high-throughput synthesis is of fundamental importance as it fuels the pipeline with new chemical entities at an early stage of research. Moreover, rapid generation of compounds contributes significantly to lead optimization processes further along the pipeline. Given the complexity of the development pipeline, synthesis is at the heart of the discovery process. The initial strategic planning of a screening library has changed, since it became clear during the last 20 years that the size of a library is not as important a determinant as initially thought. During the mid-1990s, large compound collections dominated the field. Libraries included many chemically feasible or almost all commercially available molecules. However, low hit rates and difficulties at later stages of the discovery process, such as unwanted ADMET properties (Hodgson 2001) and failed optimization schemes in medicinal chemistry (Walters et al. 2011), made many laboratories rethink their screening strategies and focus on the selection of compounds for their screening libraries (Schreiber 2009). Consequently, new high-throughput synthesis schemes were developed based on these experiences.

Diversity has emerged as a common denominator in modern library design (Fig. 1). Without any prior knowledge of the drug target, chemical libraries have to be diverse in order to probe any given target with an increased likelihood of finding biologically active molecules (Tan 2005). Even with information on privileged scaffolds or known ligands, screening diverse libraries offers an excellent opportunity for scaffold hopping toward novel drug candidates (Zhao 2007). Unfortunately, the definitions found for diversity in the context of small molecules are diverse themselves (Stumpfe and Bajorath 2011). For instance, a broad range of biological outcomes can define the functional diversity of compound selections. Alternatively, structural diversity can be described by the variety of 3D arrangements of functional groups around diverse chemical scaffolds. Nevertheless, despite the fuzzy definition of diversity, one has to realize that identifying a new chemical agent that modulates a defined target and elicits the desired therapeutic outcome requires that, at some point, all molecules be synthesized.

Fig. 1

Schematic presentation of the diversity of a screening library. Clusters of closely related entities determine the composition of a typical compound library (left). These clusters originate from the similarity of the underlying synthetic chemistry (Nadin et al. 2012). Thereby, the chances to actually occupy the activity island of a drug target (gray shaded area) are significantly decreased. On the other hand, a chemically diverse set of compounds occupies a larger fraction of the chemical space (right), promoting lead identification (Patterson et al. 1996). Additionally, valuable information for progressing hits into leads emerges from screening a diverse library
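The contrast drawn in Fig. 1 between a clustered and a diverse library can be made concrete with a similarity measure. The following is a minimal sketch, assuming that each compound is represented by a set of structural feature indices and that diversity is summarized as the mean pairwise Tanimoto distance. The fingerprints, the function names, and the choice of metric are illustrative assumptions, not taken from the works cited above.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_distance(library) -> float:
    """Average (1 - Tanimoto similarity) over all compound pairs."""
    pairs = list(combinations(library, 2))
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints: each compound is a set of feature indices.
# The "clustered" library shares most features (similar underlying chemistry).
clustered = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({1, 2, 4, 5}),
    frozenset({2, 3, 4, 5}),
]
# The "diverse" library spreads its members over many different features.
diverse = [
    frozenset({1, 2, 3, 4}),
    frozenset({5, 6, 7, 8}),
    frozenset({1, 5, 9, 10}),
    frozenset({11, 12, 13, 14}),
]

print(f"clustered: {mean_pairwise_distance(clustered):.2f}")
print(f"diverse:   {mean_pairwise_distance(diverse):.2f}")
```

A higher mean distance indicates that the library covers a larger fraction of the (toy) feature space. In practice, fingerprints would come from a cheminformatics toolkit rather than hand-written sets.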

With progress in instrumentation, screening methodologies, and theoretical library design strategies, the demands on chemical synthesis have risen tremendously. Notably, measures to assess the quality of a synthetic effort are plentiful and are not limited to the final library itself. This final product has to be diverse but still compatible with the activity assay. Above all, high rates of false positives, for example from aggregators, should be avoided (McGovern et al. 2002). Moreover, fluorescent molecules should not be included when the detection during screening is fluorescence-based. Another important aspect of library design is the optimization of the hits resulting from the screening. Sufficient synthetic room for medicinal chemistry should be provided already at the stage of library construction (Kodadek 2010). Finally, conflicts with intellectual property rights, the cost of scaling up the synthesis, and compliance with ADMET requirements should be considered before a compound collection is established.

In the following, we summarize a selection of past and recent developments in high-throughput synthesis in order to give an overview, knowing that a comprehensive description of such a vibrant field is impossible. From past experience, many philosophies have evolved and influenced high-throughput synthesis. Many chemical principles arose to fulfill the demands outlined above. In the end, chemists have to pick their tools from a diverse set of approaches to achieve the specific goal in mind.

2 Principles of High-Throughput Synthesis in Lead Generation and Optimization

High-throughput synthesis impacts drug development by accelerating the transition from target identification to lead development. The use of simple and robust chemistry is one fundamental requirement for the strategic planning of a synthesis scheme with sufficient throughput. Hence, progress in organic chemistry has always driven progress in high-throughput synthesis and provides grounds for technological advances (Galloway et al. 2010; Kodadek 2010; Schreiber 2009). Following a series of synthetic steps, a screening library has to be generated in quantities and purities that allow characterization, screening, and follow-up studies. Besides sufficient diversity in compound selection, high-throughput chemistry also influences later stages of the lead discovery and optimization process. In this respect, high-throughput synthesis is no different from other elements of the drug discovery pipeline and has to become part of an integrated thought process.

Three basic choices determine the strategy for high-throughput synthesis: (1) the synthetic platform, (2) the starting scaffold, and (3) the diversification chemistry. Firstly, as the most fundamental choice, the chemistry is performed either in solution or on a solid support. Although many high-throughput chemistry campaigns nowadays favor solution-phase methods, solid-phase chemistry was the origin of combinatorial chemistry and has been the platform of choice for high-throughput synthesis, in particular for polymer-based compounds. Therefore, this review mainly focuses on this type of chemistry. The major advantages of chemistry on a solid support are the ease of driving reactions to completion, the ability to parallelize synthetic efforts, and the simple purification of products from excess reagents and solvents. Consequently, the choice of solid support material and linker chemistry is crucial. Many different solid supports have been developed over the past two decades, which vary in their reactivity, solubility, stability, swelling, and surface chemistry. Even though polystyrene resins are commercially available and have found application in many research labs, the choice of solid support still depends on the desired synthetic route and may require case-by-case adjustment. The same is true for the choice of linker chemistry. In general, the linker should be chemically resistant but specifically cleavable to release the product from the resin. Moreover, the cleavage should not leave any artifacts on the scaffold. Many UV-cleavable linkers have been developed that fulfill these requirements.

Secondly, a starting scaffold has to be attached to the linker on the solid support. This step determines the ability to optimize and scale up the generation of analogs once lead structures are identified. Here, the choice of scaffold largely determines the diversity of the library. It is important to emphasize that the definition of diversity already varies from approach to approach at this particular stage of the synthesis. Diversity can result from similar scaffolds presenting diverse appendages or from diverse scaffolds presenting similar functional groups in different spatial arrangements. This in turn determines how much synthetic effort has to be invested at an early stage of the high-throughput synthesis pipeline. Current approaches vary greatly on this point, ranging from no scaffold diversity to high structural complexity of the starting skeletons. Finally, the third aspect of strategic planning of a high-throughput synthesis comprises the further diversification chemistry applied to the scaffold, again constrained by ADMET properties, cost, ease, and robustness of the chemistry.

3 Combinatorial Peptide Synthesis

A powerful combinatorial approach arose with the availability of solid-phase peptide synthesis developed by Merrifield in 1963 (Merrifield 1963). Geysen and Houghten opened a new era with their pioneering work in combinatorial peptide synthesis (Geysen et al. 1984; Houghten 1985). Their methods, such as “divide, couple, and recombine” (DCR) (Houghten et al. 1991), were initially designed for the synthesis of very large libraries, and many other studies followed that explored the newly developing field of combinatorial chemistry. Even though the diversity of these libraries was restricted to peptide chemistry, significant contributions in the field of lead discovery were already made in these early days (Houghten et al. 1991). Using mesh packets (so-called tea bags) or polyethylene rods (pins) as solid supports, many important questions in combinatorial chemistry were addressed (Houghten et al. 1991; Weiner et al. 1992). The idea of systematically analyzing peptide binding targets using combinatorial libraries arose, and these studies directly contributed to biology and immunology (Pinilla et al. 1992).

In a seminal positional-scanning synthetic-peptide combinatorial library (PS-SPCL) approach, peptide mixtures of approximately 3 × 10^11 components were screened for their ability to be recognized as T cell epitopes, adding up to a total of 6.4 × 10^12 decapeptides analyzed (Hemmer et al. 1998). In another example, more than four trillion decapeptide sequences were screened against a monoclonal antibody to identify candidates with increased affinity (Pinilla et al. 1994). The approach was successful, and peptides with affinities ten times higher than that of the natural ligand were identified using competitive ELISA.
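The library sizes quoted for the PS-SPCL approach follow from simple combinatorics. The sketch below shows one reading that reproduces the quoted figures, under the assumption (not stated in the text above) that the defined position is chosen from 20 amino acids while each of the nine mixture positions is drawn from 19 amino acids. The function name and parameters are illustrative.

```python
def positional_scanning_counts(length: int, defined_choices: int, mixture_choices: int):
    """Sizes for a positional-scanning library with one defined position per mixture.

    Returns (components per mixture, number of mixtures,
             distinct peptides covered by the mixtures of one position).
    """
    per_mixture = mixture_choices ** (length - 1)
    n_mixtures = length * defined_choices
    per_position = defined_choices * per_mixture
    return per_mixture, n_mixtures, per_position


# Assumed composition: decapeptides, 20 residues at the defined position,
# 19 residues at each of the nine randomized positions.
per_mixture, n_mixtures, per_position = positional_scanning_counts(10, 20, 19)

print(f"components per mixture: {per_mixture:.2e}")   # 3.23e+11
print(f"number of mixtures:     {n_mixtures}")        # 200
print(f"decapeptides covered:   {per_position:.2e}")  # 6.45e+12
```

Under these assumptions, only 200 mixtures need to be assayed to interrogate on the order of 10^12 sequences, which illustrates why mixture-based formats were attractive for very large peptide libraries.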

While alternative biological approaches toward peptide libraries arose with phage display techniques (Smith 1985), the real strength of the synthetic approach to exploring the chemical space of peptide libraries emerged when chemical transformations of the peptide backbone were introduced into the synthetic schemes. Small peptides are not very well suited as drugs because of their low stability and poor oral availability. As a result, the “Libraries from Libraries” approach was developed, which relies on chemical modification of sublibraries (Houghten 2000; Nefzi et al. 2004; Ostresh et al. 1994). One of the first examples, permethylation of a hexapeptide library, afforded around 40 million compounds as mixtures in solution (Ostresh et al. 1994). Soon afterward, the method included heterocycle synthesis, inspired by solid-phase peptide chemistry (Fig. 2). First performed by Leznoff and Rapoport, solid-phase heterocyclic chemistry really gained momentum when applied to the synthesis of benzodiazepine analogs some 20 years later by Bunin and Ellman (Bunin and Ellman 1992; Crowley and Rapoport 1976; Leznoff and Wong 1973; Pinilla et al. 2003; Wong and Leznoff 1973). The development of a sophisticated toolbox of chemical transformations such as acylations, alkylations, and reductions accelerated the diversification of the resulting compound mixtures.

Fig. 2

Synthesis of heterocyclic compounds on solid support from dipeptides and acylated dipeptides as starting materials. Figure adapted from Houghten (2000)

The “Libraries from Libraries” approach results in large compound collections in a mixture-based format. Identifying hits from these mixtures requires complex deconvolution schemes in which successively smaller subsets are screened. Therefore, fine-tuning of the mixtures must guide the combinatorial synthesis scheme, taking into account expected hit rates and thereby minimizing later deconvolution procedures (Barnes and Balasubramanian 2000). Employing mixtures instead of defined sets of compounds (vide infra) is expected to significantly decrease the time between synthesis and screening. For the process of lead development, this rather pragmatic and straightforward collection of information from screening mixtures is proposed to be advantageous. Given the massive size of these screening libraries, other techniques might not be capable of handling them.
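How a hit can be recovered from mixtures without testing every compound individually can be illustrated with a toy positional-scanning deconvolution. The following sketch rests on strong simplifying assumptions: a four-letter alphabet, tripeptides, an assay readout equal to the mean activity of a mixture, and a purely additive activity model. All names and values are hypothetical, and real structure-activity relationships are generally not additive.

```python
from itertools import product
from statistics import mean

ALPHABET = "AGLK"    # hypothetical four-residue alphabet
LENGTH = 3           # tripeptides keep the example small (4**3 = 64 compounds)
HIDDEN_BEST = "KGL"  # hypothetical best binder, unknown to the screen


def activity(peptide: str) -> int:
    """Toy additive activity: one point per position matching the best binder."""
    return sum(p == h for p, h in zip(peptide, HIDDEN_BEST))


def mixture(position: int, residue: str) -> list:
    """All peptides with `residue` fixed at `position`; other positions randomized."""
    return ["".join(p) for p in product(ALPHABET, repeat=LENGTH)
            if p[position] == residue]


def screen(members: list) -> float:
    """Assay readout of a mixture, modeled as the mean activity of its members."""
    return mean(activity(m) for m in members)


def deconvolute() -> str:
    """Pick, for each position, the defined residue whose mixture scores highest."""
    best = []
    for pos in range(LENGTH):
        scores = {res: screen(mixture(pos, res)) for res in ALPHABET}
        best.append(max(scores, key=scores.get))
    return "".join(best)


print(deconvolute())                                   # recovered sequence
print(LENGTH * len(ALPHABET), "mixtures screened for", len(ALPHABET) ** LENGTH, "compounds")
```

In this toy model, 12 mixture assays suffice to identify the best of 64 compounds. With realistic alphabets and lengths the saving grows dramatically, but so does the risk that non-additive effects or low hit concentrations obscure the signal, which is why the composition of the mixtures needs the fine-tuning described above.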

A major advancement in the field of combinatorial synthesis of peptide-like structures came with the solid-supported synthesis of libraries built directly on a peptide-like backbone, without further chemical manipulation, by Kodadek and coworkers (2009). Using N-substituted glycine units (peptoids), the diversity of side chain chemistries presented on a peptide-like backbone was explored. The side chain extends from the main chain nitrogen rather than the alpha carbon, which preserves favorable peptide conformations while rendering peptoids protease resistant. Peptoids were first described in 1992 (Simon et al. 1992), but early attempts failed due to limited monomer supply and the need to resynthesize smaller pools during the deconvolution process (Zuckermann and Kodadek 2009). Recently, advances in oligo-N-alkylglycine chemistry and screening technology enabled inexpensive screening of large peptoid libraries. One success story is the discovery of high-affinity binders of the vascular endothelial growth factor receptor 2 (VEGFR2) (Udugamasooriya et al. 2008) (Fig. 3). Peptoids coupled to fluorescent beads led to the development of a two-color cell-binding assay. A total of 250,000 compounds were screened in a one-bead one-compound fashion, and beads that selectively bound to VEGFR2-expressing cells were collected using microscopy. The chemistry used to establish the library then allowed the peptoid ligands on the collected beads to be identified by Edman degradation. Subsequent dimerization of these 2 μM hits led to the development of inhibitors with an apparent dissociation constant of 30 nM. Imaging studies employing the resulting peptoid structures as targeting devices showed promising results (De Leon-Rodriguez et al. 2010).
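The affinity gain achieved by dimerization can be put into numbers. The sketch below is a back-of-the-envelope calculation that treats both quoted values (2 μM for the monomeric hits and 30 nM for the dimer) as dissociation constants and uses the standard relation ΔG = RT ln(Kd) at an assumed temperature of 298.15 K and a 1 M standard state. These assumptions are ours and are not taken from the original study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from a dissociation constant (1 M reference)."""
    return R_KCAL * temperature_k * math.log(kd_molar)


kd_monomer = 2e-6  # 2 uM hit, treated here as a dissociation constant
kd_dimer = 30e-9   # 30 nM apparent dissociation constant of the dimer

fold = kd_monomer / kd_dimer
ddg = binding_free_energy(kd_dimer) - binding_free_energy(kd_monomer)

print(f"affinity gain: about {fold:.0f}-fold")
print(f"change in binding free energy: {ddg:.1f} kcal/mol")
```

Under these assumptions, dimerization corresponds to a roughly 67-fold tighter interaction, or about 2.5 kcal/mol of additional binding free energy.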

Fig. 3

Schematic representation of a peptoid library. General structure of a compound from the 250,000-member peptoid library used by Udugamasooriya et al. (2008). The three C-terminal residues were fixed, while diversification was applied to the six N-terminal residues (drawn in blue). The main chain nitrogen is substituted with a combination of R-groups as depicted in the box (blue nitrogens represent the respective main chain nitrogen in the peptoids). Reprinted with permission from Udugamasooriya et al. (2008). Copyright © 2008 American Chemical Society

4 Automated Carbohydrate Synthesis

Glycosylation is the most abundant posttranslational modification. Many mammalian lectins have evolved to recognize these structures, orchestrating key aspects of health and disease such as development, immune response, and host-pathogen recognition. Thus, the identification and characterization of glycan-binding proteins has attracted much attention. The resulting insights into glycobiology gave rise to novel pharmaceutical targets. Such targets include C-type lectins (Geijtenbeek et al. 2000; Lasky 1992), Siglecs (O’Reilly and Paulson 2009), bacterial adhesion factors (Pieters 2007), and galectins (Liu and Rabinovich 2005; Sorme et al. 2005). Moreover, viral cell entry and release are classical examples of pharmaceutical intervention utilizing carbohydrate-based drugs. The most prominent drugs are zanamivir and oseltamivir, which target influenza neuraminidase (Kim et al. 1997; von Itzstein et al. 1993). Interestingly, more than 40 viruses include sialic acid recognition as part of their life cycle (Angata and Varki 2002), and many represent suitable targets for intervention, e.g., adenovirus causing epidemic keratoconjunctivitis (Nilsson et al. 2011). Recent progress in carbohydrate chemistry has fostered many chemical biology approaches in glycobiology (Lepenies et al. 2010), leading to the identification and characterization of novel carbohydrate-binding proteins (Ernst and Magnani 2009). Hence, carbohydrates as scaffolds for medicinal chemistry can be considered “privileged structures.” Still, carbohydrate-based vaccines are the most successful examples that arose from the progress in carbohydrate chemistry (Seeberger and Werz 2005), while targeting glycan-binding proteins is still rather limited. Few drugs are available on the market or are in development (Ernst and Magnani 2009; Seeberger and Rademacher 2014).

The reluctance toward using carbohydrates as scaffolds for drug design stems from pharmacokinetic considerations: they are readily excreted and lack oral bioavailability owing to their high hydrophilicity. Furthermore, mammalian lectins naturally have low affinities for their substrates (Gagneux and Varki 1999; Varki 2006). Many lectins have shallow binding sites, which has led to these proteins being classified as undruggable (Hopkins and Groom 2002). However, recent examples show that advances in drug design provide grounds for reestablishing lectins as drug targets (Aretz et al. 2014; Shelke et al. 2010).

For many years, progress in the field of glycomimetics was hampered by the poor synthetic accessibility of carbohydrates compared to other biopolymers. But since the advent of automated carbohydrate synthesis, these structures have come within reach (Plante et al. 2001). Limitations such as the low availability of these structures in sufficient quantities and the high synthetic effort required have been overcome (Seeberger and Werz 2005). During automated carbohydrate synthesis, the reducing end sugar is coupled to a solid support, while mono- and disaccharide building blocks are employed in coupling/deprotection cycles using sophisticated protecting group chemistry (Fig. 4). Here, similar to peptide and oligonucleotide chemistry, UV-active protecting groups are used to monitor the progress of the reaction. For the automated synthesis of carbohydrates, phosphates, thiols, and trichloroacetimidates have proven to be useful leaving groups (Plante et al. 1999; Routenberg Love and Seeberger 2004; Seeberger 2008).

Fig. 4

Automated carbohydrate synthesis. A coupling cycle for automated carbohydrate synthesis (top) exemplifies the key steps. Two representative building blocks are shown (bottom) (Seeberger 2008)

For vaccines and carbohydrate-based diagnostics, automation of carbohydrate synthesis has large potential (Lepenies et al. 2010; Seeberger and Werz 2005). In additional applications, glycans may prove superior to any other biopolymer as starting scaffolds due to their intrinsic building block diversity. Moreover, glycans are branched and thereby enable higher structural complexity than linear biopolymers (Laine 1994). They are stereochemically rich, and some structures provide rigid scaffolds that have up to five hydroxyl groups for substitution.

Besides their function as a starting scaffold in combinatorial chemistry, carbohydrates can also be used to access pharmacologically highly useful structures through chemical transformation. Using a Pd-catalyzed domino reaction under microwave irradiation, Werz and coworkers prepared a library of chromans and isochromans from bromoglycal as a starting material (Leibeling et al. 2010). This study opens doors for the generation of highly functionalized scaffolds for libraries from carbohydrates in high-throughput synthesis.

5 Biology-Oriented Synthesis

Why are peptides and carbohydrates successful starting scaffolds in the quest for new biologically active molecules? Waldmann and coworkers addressed this question in more general terms (Wetzel et al. 2011), extending it to all natural products by asking: Which areas of the chemical space are enriched with potentially active compounds? Structural biology efforts from many research laboratories and structural genomics initiatives fostered fundamental insights into protein structure and function, highlighting that only a limited number of protein folds exist (Redfern et al. 2008; Worth et al. 2009). Estimates for the number of polypeptide folds range around 1,000 (Koonin et al. 2002). Therefore, it is reasonable to assume that only a certain fraction of the chemical space is able to interact with proteins in general. Evolution might already have picked the most promising small-molecule scaffolds, namely, natural products. Hence, inspiration from the coevolution of natural products and the machinery that makes them led to the “biology-oriented synthesis” (BIOS) approach (Wetzel et al. 2011). Building on this fundamental insight, BIOS can address the central question of high-throughput synthesis, namely, selecting the correct molecular skeleton. Diversity is then introduced by chemical transformation of the scaffold with a broad variety of substituents. Hence, major initial investment in exploratory organic synthesis of natural products is highly important for this approach (Young 2010). The key idea is that a few naturally occurring protein folds, decorated with amino acid side chains, interact with a few naturally occurring natural product scaffolds, decorated with a diverse set of appendages. It also became evident that nature is not the only source of inspiration for scaffolds potentially harboring biological activity. Existing drugs were taken into account as well, emphasizing that the key criterion for selection is biological relevance, not biological existence (Wetzel et al. 2011).

In contrast to previously described approaches, in particular peptides and peptoids, BIOS explicitly does not result in extremely large libraries. Rather, the rational selection of a few “privileged scaffolds” combined with sophisticated diversification schemes enables screening for lead structures. As many natural products are not accessible by synthesis, in particular by high-throughput synthesis, natural product-like structures are employed. While providing close structural similarity to natural products, these molecules are more accessible to de novo synthesis. Consequently, biologically active compounds are prepared from focused libraries of core skeletons that are obtained by solid-phase asymmetric synthesis and readily diversified by chemical transformations. The upfront investment into sophisticated chemistry is higher but is rewarded with high hit rates of synthetically approachable skeletons.

In a solid-phase total synthesis of jasplakinolide and chondramide C (Fig. 5), Waldmann and coworkers presented a series of simplified biologically active compounds for selective perturbation of actin-mediated cellular processes (Tannert et al. 2010). A small focused library was prepared by forming the macrocycle from a linear peptide-based chain in a ruthenium-catalyzed ring-closing metathesis. It was found that the stereochemical configuration of the polyketide region significantly influenced the enantioselectivity. A cell-based assay was then implemented to identify chemical modulators of the actin cytoskeleton. This is a good example of a high upfront investment into sophisticated chemistry that finally led to high hit rates and novel compounds.

Fig. 5

Retrosynthesis of chondramide C and jasplakinolide analogs. Reprinted with permission from Tannert et al. (2010). Copyright © 2010 American Chemical Society

6 Diversity-Oriented Synthesis

The strategic design of a structurally diverse library with minimal cost has many levels of complexity, and previously described approaches have favored a diversification of the appendages, while limited scaffold diversity was incorporated. In contrast, diversity-oriented synthesis (DOS) aims to diversify at three levels: appendages, stereochemistry, and the starting scaffold. Inspiration from natural products in DOS comes from the awareness that molecular complexity correlates well with biological activity (Burke and Schreiber 2004). Here, complexity of a molecule is defined as the number and variety of rigidifying elements. In this respect, natural products or scaffolds inspired by them are more spherical and are therefore able to extend into pockets of a potential target that remain untouched by flat scaffolds. The scaffold then determines the molecular topology of the members of a screening library. Diversifying the scaffold can consequently maximize the spatial presentation of pharmacophores (Sauer and Schwarz 2003). At the same time, more room is provided, physically and synthetically, to optimize these structures compared with flat skeletons from traditional medicinal and combinatorial chemistry.

The limited number of available protein folds, and consequently the limited fraction of all molecules able to interact with this set of general protein scaffolds, gave rise to BIOS. Picking up the idea of protein folding from a rather different perspective, Schreiber and coworkers came up with another solution in their quest for structurally diverse scaffolds (Burke and Schreiber 2004). The lesson they learned from protein architecture was that folding of a polypeptide chain is determined by pre-encoded information that leads to the adoption of a specific fold. Translating this principle back into organic synthesis, so-called σ-elements in the precursor building blocks are employed, encoding the later scaffold formation. In a clever combination with other preforming σ-elements, the library can then be directed toward the formation of distinct scaffolds. At the other extreme, the same σ-elements are employed, but variation of the reaction conditions drives the generation of diverse scaffolds (Burke and Schreiber 2004). As a general principle, folding-type DOS uses ring-closing metathesis of simple building blocks to explore complex molecular scaffolds. As these approaches result from fundamental insight into the diastereoselectivity of the reagents and their transformations, sophisticated organic synthesis is the key to a successful DOS route.

The final level of complexity comes with appendage diversification. This can either be employed on a diverse set of scaffolds (vide supra) or simply serve as diversification of a single scaffold as found in lead optimization procedures. In the latter case, common reagent-based routes are followed to explore the chemical space by transforming densely functionalized compounds. Diversification can then be implemented either by allowing a limited repertoire of reactions to decorate a pluripotent scaffold or by altering reaction conditions on a simple skeleton (Burke and Schreiber 2004). Finally, any route taken during DOS results in a compound selection that is smaller than traditional libraries but structurally more complex, in particular with respect to higher stereochemical variability. An attractive approach by Tan and coworkers, using DOS in the synthesis of a library of macrocycles via oxidative ring expansion, highlights many aspects of this chapter (Kopp et al. 2012).

7 Target-Oriented Synthesis and Lead Optimization

While most high-throughput synthesis approaches presented thus far were predominantly focused on driving the library to maximal diversity, target-oriented synthesis (TOS) narrows down the diversity with respect to a defined target protein structure (Schreiber 2000). Thereby, TOS resembles more traditional synthesis efforts in the pharmaceutical industry. There are many excellent examples that utilize a targeted approach to explore the binding site of a defined target while at the same time implementing sophisticated high-throughput chemistry. One such example comes from Morgan and coworkers and utilizes an affinity-based screening technique. A library of 800 million DNA-encoded small molecules, built on previously explored pharmacophores of Aurora A kinase and p38 MAPK, was screened (Clark et al. 2009). This is the first report of DNA-encoded libraries (DEL) being utilized for the identification of enzyme inhibitors (Fig. 6).

Fig. 6

Target-oriented high-throughput synthesis of two DNA-encoded small-molecule libraries DEL-A and DEL-B. A DNA duplex headpiece is conjugated with a 4,7,10,13-tetraoxapentadecanoic acid (AOP) as a spacer, and following deprotection, the resulting amine allows further diversification. This diversification is then encoded in the DNA utilizing the non-linked two-base 3′ overhang as attachment site for a 7-base variable stretch. DEL-B incorporates an additional p38 kinase pharmacophore, the 3-amino-4-methyl-N-methoxybenzamide (AMMB) fragment. Adapted by permission from Clark et al. (2009). Copyright © 2009 Macmillan Publishers Ltd: Nature Chemical Biology
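The principle of recording each diversification step in a short DNA tag can be sketched in a few lines. The example below assumes a plain base-4 mapping of building-block indices onto the 7-base variable stretch mentioned in the caption and simply concatenates one tag per synthesis cycle. The mapping, the function names, and the omission of overhangs and ligation chemistry are illustrative simplifications. Real libraries use curated tag sets, so this is not the encoding scheme of Clark et al. (2009).

```python
BASES = "ACGT"
TAG_LENGTH = 7  # length of the variable stretch quoted in the caption


def encode_tag(index: int) -> str:
    """Map a building-block index to a 7-base tag (hypothetical base-4 scheme)."""
    if not 0 <= index < len(BASES) ** TAG_LENGTH:
        raise ValueError("index does not fit into one tag")
    digits = []
    for _ in range(TAG_LENGTH):
        index, digit = divmod(index, len(BASES))
        digits.append(BASES[digit])
    return "".join(reversed(digits))


def decode_tag(tag: str) -> int:
    """Inverse of encode_tag."""
    value = 0
    for base in tag:
        value = value * len(BASES) + BASES.index(base)
    return value


def encode_compound(block_indices: list) -> str:
    """Concatenate one tag per synthesis cycle into a barcode."""
    return "".join(encode_tag(i) for i in block_indices)


def decode_compound(barcode: str) -> list:
    """Split a barcode into tags and recover the building blocks used per cycle."""
    return [decode_tag(barcode[i:i + TAG_LENGTH])
            for i in range(0, len(barcode), TAG_LENGTH)]


barcode = encode_compound([3, 191, 16383])
print(barcode)
print(decode_compound(barcode))
print(len(BASES) ** TAG_LENGTH, "distinct tags per cycle")
```

With seven variable bases, up to 4^7 = 16,384 building blocks could in principle be distinguished per cycle, so sequencing the barcode of an affinity-selected molecule reveals its synthetic history.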

8 Conclusion

There is an emerging discrepancy between the growing number of identified potential drug targets and the lack of novel chemical entities that succeed in drug design campaigns. High-throughput chemistry is a major driver for closing this gap by opening new doors to innovative libraries and lead optimization schemes. In this respect, a paradigm change has occurred, establishing new measures for the success of a high-throughput synthesis. The sheer number of compounds is no longer applicable as the sole measure of potential success. More than ever, innovative organic synthesis is the key feature that opens new avenues in drug discovery. Many approaches have diverged from the original idea of combinatorial chemistry or entered the field from various facets of chemistry, and none has proven to give the simple answer the field is looking for. Therefore, current strategies have to prove useful or will remain mere academic exercises.