Introduction

In recent years, advances in the field of glycomics have highlighted the intricate and subtle role of the ‘sugar code’ in nature [18]. It is now clear that oligosaccharides are crucial to the mediation of a diverse range of biological processes including fertilization [15], neuronal development [26], hormonal activities [7], tumour metastasis [8], immune surveillance [9] and inflammatory responses [10–13]. When the potential for complexity in even simple oligosaccharides is considered, it is not surprising that biology utilises carbohydrates as more than just structural building blocks or sources of biochemical fuel. For example, the monosaccharide d-glucose (1) can be substituted at any of the hydroxyl groups located on carbons C2, C3, C4, C6 and at either of the two anomeric positions. Both linear and branched structures are possible. By contrast, nucleotides (e.g. deoxyadenosine monophosphate 2) and amino acids (e.g. serine 3) form linear polymers with just one mode of connection in each case (Fig. 1a). There is, moreover, no shortage of monosaccharide structures for incorporation in oligomers (Fig. 1b). As a result, even relatively short oligosaccharides have a far greater capacity for structural diversity than either peptides or oligonucleotides with a similar molecular weight [14, 15]. Indeed, it has been calculated that six carbohydrate monomers can yield >10^12 oligomeric structures (compared to 4,096 for nucleotides and 6 × 10^7 for peptides) [16].
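The figures quoted for the linear polymers follow from simple counting: with a fixed alphabet and a single mode of connection, a hexamer has (alphabet size)^6 possible sequences. The sketch below is a minimal illustration, assuming the standard alphabets of 4 nucleotides and 20 proteinogenic amino acids. The carbohydrate figure from [16] depends on linkage position, anomeric configuration and branching, and is not reproduced here.

```python
def linear_sequences(alphabet_size: int, length: int) -> int:
    """Count linear oligomers when every position is chosen independently
    and only one mode of connection between monomers is possible."""
    return alphabet_size ** length


# Assumed alphabets: 4 nucleotides, 20 proteinogenic amino acids.
hexanucleotides = linear_sequences(4, 6)
hexapeptides = linear_sequences(20, 6)

print(hexanucleotides)        # 4096
print(f"{hexapeptides:.1e}")  # 6.4e+07
```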

Fig. 1

a Representative biological monomers (glucose 1, deoxyadenosine monophosphate 2 and l-serine 3), highlighting the potential for connectivity. Functional groups available for substitution are shown in bold. The carbohydrate 1 has 5 linkage points, while the others have only 2 each. b Common monosaccharide units, shown as pyranose isomers

However, the vast wealth of information that can be stored in oligosaccharides creates difficulties for glycobiological studies. Structural determination is far more challenging than for the linear peptides and nucleic acids; not only must one establish which saccharides are connected to each other, but also how they are connected. Synthesis presents a related problem, as methods for each type of connection must be established. There are further difficulties related to the physicochemical nature of oligosaccharides. Carbohydrate structures are dominated by hydroxyl groups, and hydroxyl groups are very similar to water. All receptors must discriminate between substrate and water, and for carbohydrate substrates this is intrinsically challenging. Protein–carbohydrate interactions, therefore, tend to be weaker than other biomolecular associations; for example, binding constants to monosaccharides are often in the range of 10^3–10^4 M−1 [17]. (Note that association constants Ka are used throughout this article, as opposed to the dissociation constants Kd, the reciprocal of Ka, which are often used by biochemists.) Experimentally, these low affinities are unhelpful. There are also theoretical issues which are not fully resolved. Biomolecular recognition is driven partly by direct interactions, and partly by the hydrophobic effect. For carbohydrate substrates, the interplay between these forces is obscure and somewhat controversial [18–21].
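To make the Ka/Kd convention concrete, and to show why affinities in this range are experimentally awkward, the following sketch converts between the two constants and estimates receptor occupancy for simple 1:1 binding. The function names and the free-sugar concentration of 100 µM are illustrative assumptions, not values from the article.

```python
def kd_from_ka(ka_per_molar: float) -> float:
    """Dissociation constant (M) as the reciprocal of the association constant (M^-1)."""
    return 1.0 / ka_per_molar


def occupancy(ka_per_molar: float, free_ligand_molar: float) -> float:
    """Fraction of receptor occupied at a given free ligand concentration (1:1 binding)."""
    x = ka_per_molar * free_ligand_molar
    return x / (1.0 + x)


FREE_SUGAR = 1e-4  # hypothetical free sugar concentration, 100 micromolar

for ka in (1e3, 1e4):
    kd_mm = kd_from_ka(ka) * 1e3
    occ = occupancy(ka, FREE_SUGAR)
    print(f"Ka = {ka:.0e} M^-1  Kd = {kd_mm:.1f} mM  occupancy = {occ:.0%}")
```

Under these assumptions a receptor with Ka = 10^3 M−1 is mostly empty at 100 µM sugar, and half occupancy is only reached when the free ligand concentration equals Kd.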

Information-carrying oligosaccharides generally occur in glycoconjugates, attached to protein or lipid anchors. They are especially prevalent on cell surfaces (Fig. 2a), where the carbohydrate layer (glycocalyx) may be up to 140-nm thick [22]. The code embodied by these oligosaccharides is read by carbohydrate-binding proteins known as lectins [18, 23–27] (the term is generally used for all saccharide-binding proteins apart from enzymes and antibodies). Lectin–oligosaccharide interactions are largely responsible for cell–cell recognition, and hence for many of the biological processes referred to above. The importance and ubiquity of lectins have fuelled strong interest in mimicking their action. Synthetic lectins have potential (1) as models for natural lectins, in mechanistic and other fundamental studies, (2) as complementary alternatives to natural lectins in glycobiological research, (3) as diagnostic tools in medicine, and (4) as pharmaceuticals, based on the disruption of natural carbohydrate recognition. A particular medium-term objective is the monitoring of glucose levels in diabetics [28–33]. Although current enzyme-based methods are inexpensive and effective, they are not well-adapted to continuous operation over long periods (as required, for example, in an “artificial pancreas”). Coupled to a system for transduction of binding into a readable signal, a glucose-selective synthetic lectin could provide a practical solution to this problem.

Fig. 2

a Carbohydrate presentation at cell surfaces for cell–cell recognition (reprinted with permission from [27]). b A glucose molecule in the active site of the E. coli galactose chemoreceptor protein, as revealed by X-ray crystallography. The substrate makes contact with two apolar residues (phenylalanine and tryptophan, shown in light green), a water molecule (dark blue), and eight polar amino acid residues (3× aspartate, red; 3× asparagine, pink; 1× histidine, yellow; and 1× arginine, light blue)

The purpose of this article is to provide an overview of the progress that has been made in artificial carbohydrate recognition, illustrating how rational molecular design principles have been coupled with lessons from nature in order to realise functional synthetic lectins. There has been much research on this topic, including extensive work on binding carbohydrates in organic solvents and also on the use of reversible covalent bond formation in water (specifically, boron–oxygen bonds [34–36]). However, the focus here will be on truly biomimetic receptors, defined as systems which operate successfully in water using non-covalent intermolecular interactions.

Synthetic lectin design principles

A distinctive feature of the lectin family of carbohydrate binding proteins is that they complex saccharides with relatively high specificity but display no catalytic activity. As a result, the substrate binding cleft is preorganised for recognition of a ground-state carbohydrate molecule rather than a reaction transition state (as in the active site of an enzyme). An example of a carbohydrate encapsulated within a protein binding site, that of the Escherichia coli galactose chemoreceptor protein, is shown in Fig. 2b [37, 38]. In this case, a single glucose molecule is held through a combination of 13 hydrogen bonds (H-bonds) and close contact between two apolar residues (phenylalanine and tryptophan) that effectively “sandwich” the monosaccharide via non-polar interactions. Hydrogen bonds are much stronger than the van der Waals interactions between non-polar surfaces, so, at first glance, it might appear that the preorganised polar groups are the most important features of this system. However, in aqueous media, the driving force for sugar recognition is far from clear. It is reasonable to assume that in water the lectin’s carbohydrate binding cavity is likely to be fully solvated in the absence of substrate. Binding therefore involves the replacement of, for example, NH···OH2 with NH···OHR. In the general case, the net energy change accompanying this process should be very small. One might therefore conclude that the energetic impetus for carbohydrate recognition in lectins should be dominated by hydrophobic interactions, i.e. the displacement of high energy water from the binding site on complex formation [21]. According to traditional views of the hydrophobic effect, binding should in this case be entropy driven, yet it turns out that enthalpy in fact dominates [17]. However, recently, evidence has come to light for an enthalpically driven ‘non-classical’ hydrophobic effect and this may play a role in lectin–saccharide binding events [39, 40]. Moreover, a variety of model studies have indicated roles for CH–π [41–45] and other apolar interactions [46, 47] in carbohydrate recognition. Having said all this, the pattern of polar contacts is surely also important, and it is safest to assume that polar and apolar interactions work in concert to achieve biological carbohydrate recognition.

Supramolecular chemists attempting to emulate lectin–saccharide complexation are faced with a considerable challenge. If both polar and apolar interactions are important, then both types of unit must be incorporated, properly positioned, in receptor designs. Furthermore, saccharides are relatively large substrates, and a receptor should span or (ideally) enclose its target. Indeed, artificial carbohydrate receptors tend to be amongst the largest biomimetic constructs assembled by host–guest chemists. Conformational control is particularly important, partly to maintain the appropriate cavity shape and partly to prevent contact between self complementary H-bonding groups (both donors and acceptors are likely to be necessary). Requirements for biomimetic carbohydrate receptors are thus (1) sufficient size to fully encapsulate the substrate, (2) an array of both polar and apolar functional groups that match the surface potential of a carbohydrate, and (3) sufficient rigidity to prevent intramolecular recognition or self-association. These criteria must not just be met, but be met well. Carbohydrates are perfectly “happy” in water, and need strong persuasion to enter a binding site (as evidenced by the low affinities of lectins for saccharides—see earlier). A final point is that, to operate in water, a biomimetic carbohydrate receptor must of course be water soluble. Again, this requirement should be met well. For the full characterisation of binding properties, it is necessary to employ a range of techniques of which NMR is the most reliable and informative. Awkwardly, NMR requires fairly high concentrations (ideally ~1 mM) and also that the receptor should not form aggregates in solution (in which case, slow tumbling leads to broad spectra). As illustrated later, this is not so readily achieved and presents a non-trivial challenge for practitioners in the area.

Given these difficulties, there are some advantages to investigating carbohydrate recognition in less competitive media. A carbohydrate in a non-polar solvent is intrinsically quite easy to bind through polar interactions such as H-bonding. Architectural ideas based on these forces can thus be tested without facing the full challenge of binding in water. Managing solubility in organic solvents is also easier. Moreover, carbohydrate recognition in organic media can be biomimetic in the sense of mimicking recognition at the membrane–cytosol interface, especially if extraction from water can be demonstrated. Accordingly, there has been considerable work in this area over the past 20 years, as recorded in a number of reviews [24, 36, 48, 49] and recent articles [50–59]. For reasons of space, we will not attempt a further discussion of this work but will proceed to our major topic of carbohydrate recognition in water.

Investigating carbohydrate recognition in water

In contrast to the success achieved by many groups in organic solvents, progress in aqueous media has been slow. There are still relatively few examples of synthetic receptors, operating through non-covalent interactions, that have been clearly proven to bind carbohydrates in water. In addition to meeting the demanding structural criteria described in the previous section, complete and reliable characterisation of binding phenomena can be more difficult in water. The investigation of host–guest interactions in organic solvents is often carried out using 1H NMR spectroscopy, as spectra are easily interpreted and binding constants readily calculated from signal movements. Intermolecular H-bonding can be directly observed through downfield shifts in proton resonances, and shielding/deshielding effects from aromatic units can also give clear evidence of recognition. Intermolecular nuclear Overhauser effects (nOes) can also yield proof of close contact between host and guest. This wealth of information can be integrated effectively to put together a convincing case for recognition between substrate and host. In favourable cases, multipoint intermolecular nOes can be used to construct a three-dimensional model of the complex. The only real disadvantage to using NMR spectroscopy to determine binding constants is that solutions cannot be very dilute, so that extremely strong binding constants cannot be accurately calculated. However, intermolecular interactions and host–guest stoichiometries can still be inferred in strong complexes and verified using alternative techniques such as fluorescence spectroscopy and isothermal titration calorimetry (ITC).
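The calculation of a binding constant from NMR signal movements can be sketched as follows, assuming 1:1 binding in fast exchange so that the observed shift change is the fraction of host bound multiplied by a limiting shift change. This is an illustrative grid-search fit on synthetic, noise-free data. The concentrations, Ka and limiting shift below are hypothetical and are not taken from any study discussed here. Practical analyses would normally use a non-linear least-squares routine and real titration data.

```python
import math


def fraction_bound(host_total: float, guest_total: float, ka: float) -> float:
    """Fraction of host in the 1:1 complex (concentrations in M, ka in M^-1),
    from the exact quadratic solution of H + G <=> HG."""
    b = host_total + guest_total + 1.0 / ka
    complex_conc = (b - math.sqrt(b * b - 4.0 * host_total * guest_total)) / 2.0
    return complex_conc / host_total


def fit_ka(host_total, guest_totals, shift_changes,
           log10_lo=0.0, log10_hi=5.0, steps=2000):
    """Grid search over Ka on a log scale. For each trial Ka the limiting
    shift change is obtained in closed form by linear least squares.
    Returns (ka, limiting_shift_change)."""
    best = None
    for i in range(steps + 1):
        ka = 10.0 ** (log10_lo + (log10_hi - log10_lo) * i / steps)
        f = [fraction_bound(host_total, g, ka) for g in guest_totals]
        d_max = sum(fi * yi for fi, yi in zip(f, shift_changes)) / sum(fi * fi for fi in f)
        sse = sum((yi - d_max * fi) ** 2 for fi, yi in zip(f, shift_changes))
        if best is None or sse < best[0]:
            best = (sse, ka, d_max)
    return best[1], best[2]


# Synthetic titration: all values hypothetical.
HOST = 1e-3                                      # 1 mM receptor
GUESTS = [0.005, 0.01, 0.02, 0.04, 0.08, 0.16]   # total sugar, M
TRUE_KA, TRUE_DMAX = 30.0, 0.50                  # M^-1, ppm
SHIFTS = [TRUE_DMAX * fraction_bound(HOST, g, TRUE_KA) for g in GUESTS]

ka_fit, dmax_fit = fit_ka(HOST, GUESTS, SHIFTS)
print(f"Ka ~ {ka_fit:.1f} M^-1, limiting shift change ~ {dmax_fit:.2f} ppm")
```

The same isotherm shows why very strong binding is hard to quantify by NMR: when Ka multiplied by the host concentration is large, the curve becomes almost linear up to saturation and carries little information about Ka.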

Despite all the advantages of NMR spectroscopy as a tool for investigating carbohydrate recognition, it is less frequently utilised for studies in water. This may be due to the fact that many of the organic constructs designed to bind sugars simply are not soluble enough to be investigated by 1H NMR spectroscopy. Additional issues include the solvation of H-bonding groups inside carbohydrate receptors and exchange of more acidic protons with the solvent medium. Both of these effects can lead to misleading results or even preclude any observation of substrate recognition. As a result, many studies rely heavily on calorimetry or UV-visible spectroscopy to demonstrate carbohydrate complexation. Unfortunately, these techniques usually give little direct information about the structural interactions between host and guest. As a consequence, results generated in this way are open to misinterpretation. In short, bearing in mind the considerable difficulties associated with saccharide binding in water, it is prudent to verify any association constants measured using more than one experimental method.

Calixarenes and other oligoaromatic hosts

Calixarenes are bowl-shaped cyclic aromatic oligomers that feature curved internal cavities well suited to the encapsulation of small molecules or ions. Their potential utility as selective hosts was recognised early on by supramolecular chemists, and formative work in carbohydrate recognition was focused on these molecules [60]. In 1992, the Aoyama group described variants of these receptors that could operate in aqueous conditions [61]. In this study, the anionic calixarenes 18a–c (Fig. 3) were shown to bind to a few less hydrophilic saccharides such as fucose 11, although no recognition of the more common hexoses (e.g. glucose 1) was observed. Association constants were low (Table 1), but it was demonstrated that increasing the electron density available to the aromatic system through either structural modification (18a–c) [62] or deprotonation (18a – 2H+/4H+) [63] considerably improved saccharide binding, most likely through enhanced CH–π stacking interactions.

Fig. 3

Water soluble calixarenes 18, related oligoaromatic hosts 19 and 20, and heptaose substrate 21

Table 1 Selected association constants (Ka, M−1) for binding of monosaccharides to water-soluble calixarenes 18a–c

A subsequent study by Král and co-workers employed binaphthyl-substituted calixarene 18d as a saccharide host in water-methanol 99:1 [64]. In this case, the association between host and guest was inferred using a competitive binding experiment where 18d was combined with methyl red and titrated against selected carbohydrates. Significant changes in the UV-visible methyl red absorption spectra were used to calculate remarkably high binding constants (e.g. Ka = 1,100 M−1 for glucose), although confirmation was not obtained by any other technique. A related system, chromotropylene 19, wherein the cavity size of the calixarene is increased, was shown to bind only the more hydrophobic methyl β-d-glycosides, with Ka up to 75 M−1 [65]. Recently, the oligoresorcinol 9-mer 20 was shown to recognise oligosaccharides in water via duplex formation [66]. In this experiment, circular dichroism (CD) silent homo-double helices of 20 were shown to produce optically active complexes with α-1,6-d-isomaltooligosaccharides. The duplex displayed selectivity for α-1,6-d-isomaltoheptaose (21) over a selection of other heptameric saccharides. Additional CD and ultraviolet-visible (UV-Vis) titration studies revealed that the sugar is bound with a stoichiometry of 20:21 = 1:2.

Cyclodextrins

Cyclodextrins are a family of cyclic oligosaccharides composed of either six (22), seven (23a) or eight (24) α-(1-4)-linked d-glucopyranose units (α, β and γ cyclodextrins; Fig. 4). As a result of this symmetrical cone-shaped arrangement, the internal cavity of a cyclodextrin is essentially hydrophobic in nature whilst the outer rim of the molecule is functionalised with an array of water-solubilising hydroxyl groups. These desirable features combined with their ready accessibility make cyclodextrins amongst the most widely studied water soluble host molecules [67, 68]. This extensive research programme has included exploratory investigations of cyclodextrins as hosts for saccharides, although studies have been hampered by the structural similarity between the substrates and the hosts. Several groups have found that β cyclodextrin 23 can selectively recognise pentoses over hexoses using fluorometric competition experiments [69, 70] and microcalorimetry [71] (Table 2). Unfortunately, the results obtained by the different methods are not especially self-consistent. Microcalorimetry has also been applied to the smaller α cyclodextrin 22 which, surprisingly, seemed to bind hexoses as well as pentoses (Table 3). In this case, the ΔH values were determined to be close to zero using ITC. Whilst these results imply that substrate recognition is entropy driven (via expulsion of water from the host), it is also clear that the experiments were undertaken at the threshold of sensitivity for the technique. Confirmation of the binding constants using a second method would therefore be especially useful. In related work, Schneider and coworkers found that the interaction between modified cyclodextrin 23b and d-ribose 4 could be followed by 1H NMR spectroscopy, and calculated a binding constant of 26 M−1 [72].

Fig. 4

Cyclodextrin (CD) structures

Table 2 Association constants (Ka, M−1) for binding of pentoses to β-cyclodextrin 23a
Table 3 Association constants (Ka) and thermodynamic parameters, determined by microcalorimetry, for binding of monosaccharides to α-cyclodextrin 22a

Water soluble porphyrins

Like cyclodextrins, porphyrins have proved to be versatile building blocks in supramolecular chemistry. They are readily accessible and easily functionalised, while possessing a large, rigid, planar aromatic surface and precisely located high-affinity binding sites for metal ions. Some of these features have been exploited by Král and coworkers [64, 73–79] to create planar (25a–d) and macrocyclic (26a) porphyrin-based carbohydrate receptors (Fig. 5) that can be studied in aqueous solvent mixtures such as H2O–MeOH, 95:5. Under these polar conditions, the recognition of selected monosaccharides was investigated using UV-visible spectroscopy. Analysis of the data gave the results summarised in Table 4. The figures suggest that 25b in particular might be a promising component for optically responsive carbohydrate-sensing devices. However, the high binding constants are puzzling, given the size mismatch between 25b (~24 Å diameter) and a monosaccharide (~8 Å), and this system could probably benefit from further study by complementary techniques. A porphyrin–bile acid conjugate has also been prepared, and has shown promise as a reagent for carbohydrate-selective cell surface labelling [80].

Fig. 5

Examples of the linear (25, 26b) and macrocyclic (26a) porphyrins investigated by Král and coworkers as potential carbohydrate receptors

Fig. 6

Carbohydrate substrates for recognition studies, as discussed in the text

Table 4 Association constants (Ka, M−1) for binding of carbohydrates to porphyrin receptors in aqueous solvent systems

Metal complexes

The integration of metal ions (M^n+) within supramolecular recognition systems is a practical way of improving substrate binding and specificity, as coordination bonds are commonly observed to be both strong and highly dependent on the orientation of the ligand donor relative to the metal centre. This binding motif is employed by nature for carbohydrate recognition in “C-type” lectins, where the metal ion is Ca2+ [81]. Synthetic receptors which exploit this approach have been developed by Striegler and coworkers [82–84]. Instead of Ca2+, this group have used Cu2+ ions, which can be strongly bound (and therefore reliably positioned) by nitrogen-based ligands. Mononuclear complex 44 was shown to bind simple carbohydrates with a binding constant of ~5,000 M−1 at pH 12.4, and closely related dinuclear species 45 showed similar affinities (Fig. 7a, b). The observations that firstly the system required basic conditions, and secondly that methyl glycosides did not form complexes, are strong evidence for the deprotonation and ligation of the anomeric hydroxyl group. The bis-copper(II) system was shown to be selective for mannose (10) over glucose (1). This preference was attributed to the convergent orientation of the hydroxyl groups in mannose allowing for more ready chelation of the Cu2+ ion. Complex 45 has also been presented as a binding agent for disaccharides [85], and the functionality of these systems has been investigated for incorporation into molecularly imprinted polymers [86]. Lanthanide ions have also been employed to mimic the role of Ca2+ in C-type lectins. Strongin and co-workers prepared the complexes 46 and found that their fluorescence output changed substantially on addition of carbohydrates. The Eu3+ complex 46b was especially sensitive to sialylated oligosaccharides such as gangliosides GM1 and GD1a/b [87].

Fig. 7

a Mononuclear Cu2+ complex 44 coordinated to the anomeric hydroxyl group of a monosaccharide; b dinuclear Cu2+ complex 45 bound to mannose via multiple copper–oxygen bonds; c salophene lanthanide complexes 46a–b

Aromatic-centred podands

The podand architecture, in which variable “legs” are mounted on a central scaffold, is convenient and versatile for host design. In the case of carbohydrate recognition, an aromatic scaffold has clear advantages. The aromatic surface can assist binding through CH–π interactions, and there is a good size match between a saccharide residue and a benzene ring. The group of Mazik have exploited this strategy extensively [49]. Although most of their systems have been designed to operate in organic solvents, the dicarboxylate 47 (Fig. 8) has been studied in water [88]. NMR studies showed that 47 binds both methyl β-d-glucoside 27 and cellobiose 29. Affinity measurements were complicated by multiple stoichiometries, but apparent 1:1 Ka values were 2 and 305 M−1, respectively. The bis-arginine–anthracene conjugate 48 was studied by Nilsson as a receptor for sialylated oligosaccharides [89]. It was found by NMR to show significant affinities (~100 M−1) for models of GM3 and the blood group antigen sialyl Lewis x. The observation that both 47 and 48 bound larger oligomeric substrates relatively well is interesting but not surprising. The larger substrates present extended surface areas and more functional groups with potential for non-covalent interactions. Even if the receptor can bind to just part of the substrate at one time, the possibility of multiple binding geometries can increase affinities.

Fig. 8

Podand carbohydrate receptors with aromatic central scaffolds

Peptide-based designs

Given that lectins themselves are peptides, it makes sense to consider peptidic structures for synthetic carbohydrate receptors. Indeed, a number of biochemical groups have reported on medium-length carbohydrate-binding peptides, discovered by studying fragments of lectins or by selection from phage-displayed combinatorial libraries [90]. A particular target has been the Thomsen–Friedenreich carcinoma antigen (Galβ1,3GalNAcα1,R) [90–93]. Though interesting and successful, this work falls outside the scope of the present article as it lacks the element of ab initio design. More relevant is the simple dipeptide Trp–Trp 49 (Fig. 9) investigated by the Aoyama group [94] and inspired by the sandwiching of saccharides between aromatic surfaces in some lectins (e.g. Fig. 2b). Addition of maltotriose 31 perturbed the fluorescence output of 49, implying a binding constant of 8 M−1. The smaller substrate maltose 30 caused similar effects but appeared to bind less strongly (Ka ~ 1 M−1). A more elaborate peptidic design 50 was described by the group of Meldal [95]. The bicyclic structure incorporates a macrocyclic dodecapeptide with a naphthyl bridge, providing an amphiphilic cavity not unlike that of a natural lectin. Binding to cellobiose 29 could be studied by NMR, although 2D methods were required to resolve the spectra. The binding constant for 50 + 29 was estimated at 8 M−1.

Fig. 9

Carbohydrate receptors which employ peptidic frameworks

Tri- and tetra-cyclic cages: the “temple” architecture for synthetic lectins

Whilst most of the receptors discussed above can span a saccharide, making contact with different parts of the substrate, there are few (if any) which can fully enclose their target. In this section, we discuss a family of molecules from our own group which can surround carbohydrate substrates in all three dimensions. The prototype was the tricyclic octa-amide 51a (Fig. 10a) [96]. Unlike most previous systems, receptor 51a was specifically designed to recognise a narrow range of carbohydrates, those with all-equatorial arrays of polar functionality. This group includes β-glucosides such as 27, glucose 1 itself (in β-pyranose form), and close relatives such as xylose 6, 2-deoxyglucose 33, and N-acetylglucosamine 15 (as β-glycosides and β anomers). The design concept is illustrated in Fig. 10b. The all-equatorial substrates possess two patches of hydrophobic CH groups on their upper and lower surfaces, and polar groups radiating from the centre, placed close to the average plane of the six-membered ring. As shown, a complementary cavity may be constructed from two parallel apolar surfaces held apart by spacers containing polar functional groups. In this representation, the architecture is reminiscent of a classical temple, hence the name given to this family of host molecules. In 51a, the “roof” and “floor” of the temple are realised as biphenyl units, while the “pillars” are isophthalamides. Importantly, modelling showed that the aromatic amide spacers are sufficiently rigid to prevent the apolar surfaces meeting each other. In aqueous solution, therefore, hydrophobically driven collapse of the cavity should not take place.

Fig. 10

“Temple” receptors for all-equatorial carbohydrates. a Monosaccharide receptors 51a–c. Apolar surfaces are represented in blue, polar spacer groups in red, and solubilising moieties in green. b Cartoon representing the design concept. Non-covalent interactions are shown as broken lines (red for hydrogen bonds, blue for hydrophobic/CH–π interactions). c NMR structure for the complex between 51c and GlcNAcβ-OMe 32. Isophthalamide spacer atoms are coloured according to element (C black, H white, O red, N blue), and the biphenyl units are highlighted in cyan. Carbohydrate 32 is shown as pink, except the NHAc unit which is highlighted in yellow. Intramolecular and intermolecular nOe contacts are shown as green and red broken lines, respectively. The water-solubilising tricarboxylate groups are omitted. d Disaccharide receptors 52a, b, with intended substrate cellobiose 29

Although the cavity of 51a was designed to operate in water, the molecule itself was not suitable for that purpose. The externally directed pentyl ester groups were chosen to promote organic solubility, for preliminary studies in non-polar solvents such as chloroform. Whilst the results were encouraging [96], and a related system proved able to extract carbohydrates from water [97], there remained the problem of engineering water solubility. Hydrolysis of the pentyl esters gave tetracarboxylate 51b but, perhaps surprisingly, this highly polar molecule did not prove useful. It dispersed freely in water but gave broadened NMR spectra (perhaps due to aggregation) which could not be employed in binding studies. It was only when a tricarboxylate solubilising group was employed, in 51c, that investigations in water became possible.

Receptor 51c was studied in two stages. Initially, it was tested against a panel of 15 carbohydrates containing only oxygen-based substituents [98]. 1H NMR titration was the main technique used, but in several cases confirmatory data were provided by fluorescence titration. The receptor showed quite low affinities (e.g. glucose 1, 9 M−1; methyl β-d-glucoside 27, 28 M−1), but encouraging selectivity for the intended all-equatorial targets (e.g. glucose:galactose, ~4.5:1). It was then realised that β-N-acetylglucosaminyl (β-GlcNAc, as in 15 and 32) might also be a good target, so a number of N-acetylaminosugars were added to the list of substrates [99]. The full set of binding constants, shown in Table 5, tells a remarkable story. β-GlcNAc is indeed a good substrate for 51c, far more so than β-glucosyl. Indeed, when considered as a β-GlcNAc receptor, 51c bears comparison with natural lectins. Table 5 includes some association constants to wheat germ agglutinin (WGA), a lectin which has classically been used to bind GlcNAc units. For methyl glycoside 32, the archetypal β-GlcNAc substrate, the synthetic and natural systems show quite similar affinities (630 and 730 M−1, respectively). For other substrates, where comparisons are possible, WGA shows much higher affinities than 51c. These data therefore imply that 51c is considerably more selective than its natural competitor.

Table 5 Association constants (Ka) for binding of carbohydrates in water to tricyclic octa-amide receptor 51c, in order of descending affinity

At first sight, the binding of 51c to N-acetylglucosamine 15 is surprisingly weak (56 M−1), but this reflects the fact that only the (minor) β anomer of 15 is bound. It was possible to determine this because dissociation is slow on the NMR timescale, allowing the observation of a well-resolved spectrum of the complex. A similar phenomenon was observed for 51c·32. In this case, it was possible to obtain a detailed NMR structure of the complex (Fig. 10c). The substrate is sandwiched between the biphenyl surfaces as originally envisaged, making CH–π contacts and several H-bonds. The NHAc methyl group rests between spacers in a narrow portal of the cavity, presumably benefiting from further CH–π interactions.

β-GlcNAc on serine or threonine (“O-GlcNAc”) is a common post-translational modification of proteins, thought to have important regulatory effects [100–102]. Glycopeptide 53 was prepared as a model of this unit and tested as a substrate for 51c [99]. Encouragingly, 53 was bound with Ka = 1040 M−1, slightly more strongly than methyl analogue 32. Asparagine derivative 54, modelling “N-linked” β-GlcNAc (another common motif), was also tested. Surprisingly, this was bound very weakly (Ka ~ 4 M−1), so it seems that 51c is selective for β-O-linked GlcNAc. On the other hand, N,N′-diacetylchitobiose 37 was a very poor substrate (see Table 5) implying that the O-linked group must be fairly slender and, possibly, cannot be another saccharide unit. Further work is required, but 51c shows real promise as a specific receptor for the O-GlcNAc protein modification, with minimal cross-reactivity to other saccharide moieties.

Most biological carbohydrate recognition involves oligosaccharides, so these larger substrates are also interesting targets for biomimetic systems. The “extended temple” 52a was designed according to the same principles as 51c, but with all-equatorial disaccharides (e.g. cellobiose 29) as intended substrates [103]. To make room for the substrate, the biphenyl components were replaced by terphenyls, while rigidity was enforced by a fifth isophthalamide spacer. Molecular modelling indicated that open conformations were indeed strongly favoured. Following assembly via sequential high dilution macrolactamisations, the binding properties of 52a were investigated using 1H NMR spectroscopy, ICD and fluorescence spectroscopy. The measured association constants are shown in Table 6. At least two techniques were used for each substrate, and agreement was generally good, so these values can be regarded as secure. Once again, the targeted all-equatorial substrates were bound with good affinities and excellent selectivities. The Ka values for cellobiose 29, methyl β-d-cellobioside 38 and xylobiose 39 were ~600, ~900 and ~260 M−1, respectively. N,N′-Diacetylchitobiose 37 was also bound quite well, although in this case (unlike 51c) the NHAc groups seemed to lower affinity. Other substrates were poorly bound, mostly with Ka = 10–15 M−1. Selectivity for cellobiose 29 versus non-targeted disaccharides was ~50:1. Notably, this held true even for lactose 34, which differs from cellobiose at just one stereocentre.

Table 6 Association constants (Ka, M−1) for binding of carbohydrates in water to “extended temple” 52a, as measured by 1H NMR, ICD and fluorescence titrations

The complex between 52a and cellobiose 29 showed slow dissociation on the NMR timescale and, like 51c·32, was therefore directly observable by this technique. Although the spectrum could not be fully assigned, nuclear Overhauser effect spectroscopy (NOESY) yielded some unambiguous contacts. From these, it was possible to show that 52a·29 possessed the expected structure in which the disaccharide is sandwiched between the terphenyl units. The complex was also studied by ITC. This technique provided a fourth independent measurement of the binding constant (650 M−1), and also gave insight into the thermodynamics of binding. It was found that complexation was mainly enthalpy-driven (ΔH = −3.22 kcal mol−1) with a minor contribution from entropy (TΔS = 0.62 kcal mol−1). This balance lies well within the range observed for lectins [17].
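The ITC figures quoted above can be cross-checked against the reported binding constant using the standard relations ΔG = ΔH − TΔS and Ka = exp(−ΔG/RT). The sketch below does this; the temperature of 298.15 K is an assumption (the measurement temperature is not stated in this section), and the function name is invented for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def association_constant(delta_h, t_delta_s, temperature=298.15):
    """Return (delta_g, K_a) from delta_H and T*delta_S, both in kcal/mol.

    temperature is in kelvin; 298.15 K is an assumed default,
    not a value taken from the source.
    """
    delta_g = delta_h - t_delta_s
    k_a = math.exp(-delta_g / (R_KCAL * temperature))
    return delta_g, k_a

# Values quoted in the text for the complex of 52a with cellobiose 29
delta_g, k_a = association_constant(-3.22, 0.62)
print(f"delta_G = {delta_g:.2f} kcal/mol, Ka of about {k_a:.0f} M^-1")
```

At the assumed temperature, the free energy of −3.84 kcal mol−1 corresponds to a Ka within a few percent of the 650 M−1 obtained by ITC, so the quoted enthalpy and entropy terms are mutually consistent with the reported binding constant.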

The affinities and selectivities of 51c and 52a, their mode of action, and the above thermodynamic data, suggest that these temple receptors can serve as quite realistic lectin mimics. Structurally, they are quite different from lectins, but this confers significant advantages. Their polycyclic frameworks allow them to maintain their binding conformations under a wide range of conditions, unlike lectins which (as proteins) are prone to denature. Moreover, their structures can be altered in ways which protein chemistry cannot match. In particular, their externally directed groups may be adjusted to confer solubility in almost any medium. It is thus possible to study their binding properties in solvents as diverse as chloroform and water. There is a strong motivation for doing so, because the role of solvent in natural carbohydrate recognition has been mysterious and controversial. One viewpoint considers the receptor–carbohydrate interaction as essentially polar in nature, driven by exceptionally favourable hydrogen bonding patterns [19]. Alternatively, it has been proposed that the amphiphilic binding sites of carbohydrate receptors may not be well hydrated, despite containing many polar groups. In this case, the displacement of high-energy water molecules (the hydrophobic effect) could be a major driving force for binding [21, 104]. One way of addressing this issue is to study the dependence of affinity on solvent. If the binding is exclusively polar in nature, then water will be the most competitive solvent and binding will be stronger in all other media. As solvent polarity increases, binding constants will decrease monotonically. On the other hand, if solvophobic effects are important, water may not be the least favourable medium. A polar organic solvent such as methanol would suppress H-bonding effectively but, lacking water’s cohesive properties, would not provide a driving force for binding. On moving up the polarity scale from a non-polar organic solvent (e.g. chloroform) to water, the Ka values would pass through a minimum.

With the availability of receptors 51 and 52, this approach could be reduced to practice [105]. Both had been prepared in water soluble form (51c and 52a) and in versions soluble in chloroform (51a and 52b, the latter being the immediate precursor of 52a). For studies in non-polar solvents, 51a and 52b could be paired with organic-soluble glycosides 55 and 56, respectively (Figs. 11, 12). In one series of experiments, Ka values were measured for 52b + 56 in a full range of methanol–chloroform mixtures, and for 52a + cellobiose 29 in methanol–water mixtures. The results are shown in Fig. 13. As expected, addition of methanol lowered affinities in the non-polar medium where H-bonding is dominant (Fig. 13b). However, less predictably, methanol also reduced binding constants in water (Fig. 13a). Addition of acetonitrile or DMSO to water produced even stronger effects; for acetonitrile, just 8% added to the aqueous solution caused a 47-fold drop in affinity. It was possible to conclude with some certainty that hydrophobic effects play an important role in carbohydrate recognition in water.

Fig. 11 β-GlcNAcylated peptidic substrates studied with 51c

Fig. 12 Organic-soluble glycosides for studies with 51a and 52b, in non-polar media

Fig. 13 Binding constants Ka of disaccharide receptors 52a/b to cellobiosyl units in a series of water–methanol and methanol–chloroform solvent mixtures. As the spectrum of solvent polarity is traversed from water to chloroform, affinities pass through a minimum. a 52a + d-cellobiose 29 in H2O/MeOH. b 52b + octyl β-d-cellobioside 56 in MeOH/CHCl3. In the latter case, Ka is expressed on a logarithmic scale so that the full range of values can be represented

Conclusions

Biomimetic carbohydrate recognition has proved a challenging task. In two decades of research, a variety of receptors have been developed for binding saccharides in organic solvents, and these continue to multiply. However, systems which operate in water, and are therefore truly biomimetic, are still very rare. Even where binding is achieved, affinities mostly remain low. Indeed, a well-characterised binding constant of 10 M−1 is probably still a significant achievement. Nonetheless, recent developments give cause for optimism. The temple receptors have raised affinities close to 1,000 M−1 for some substrates, and show very good selectivities. Their binding constants are still low by general biological standards, but then so are those of many lectins. In fact, the temples come remarkably close to matching their biological competitors, and may reasonably be described as “synthetic lectins” [22, 106]. For biomimetic chemists, these are encouraging results, suggesting that reproducing the functionality of biological macromolecules is not a hopeless quest. Moreover, the temples offer genuine potential for applications, especially in studying the O-GlcNAc protein modification.

Of course, major problems still remain. Higher affinities are certainly desirable, and should be possible. Although many lectins bind weakly, some show affinities of 10^6–10^7 M−1 [38]. This level may be difficult to reach but, by adjusting the temple design, increases of one or two orders of magnitude should be feasible. More intractable, perhaps, is the targeting of the many carbohydrate units which are not “all-equatorial”, and are thus not bound by the temple receptors. These cases will require different arrangements of hydrophobic surfaces, probably with lower symmetry. A general solution might be found in combinatorial methodology, but equally, there may be no alternative to specific design, synthesis and testing for each substrate. Either way, the area provides scope for many more years of instructive and fulfilling research.