1 Introduction

Carbohydrates, most often linked to proteins or lipids, cover the surface of all living cells and are fundamental determinants of health and disease. Carbohydrate-binding proteins, or lectins, interact with glycans in specific ways, eliciting many important cellular responses including those involved in immune cell homeostasis. The Ca2+-dependent or C-type lectins are the largest and most diverse lectin family found in animals (Zelensky and Gready 2005; Drickamer and Taylor 2015). Other well-studied lectin families in mammals include the galectins and siglecs. C-type lectins share highly homologous structural modules in their carbohydrate recognition domains. These domains are found in both soluble and membrane proteins. More than 100 human proteins contain at least one C-type lectin domain. The C-type lectin family is subdivided into 16 groups (Groups 1–16) based on phylogenetic relationships and domain architecture (Cummings and McEver 2015) (Fig. 1a). Most of these groups have a single C-type lectin domain. Exceptions with 8–10 C-type lectin domains in their polypeptides (e.g., macrophage mannose receptor and DEC-205) are found in Group 6. Also, many proteins have C-type lectin domains that lack the conserved Ca2+ binding site, designated as “C-type lectin-like domains,” and thus do not always bind carbohydrate ligands (e.g., Group 5 in Fig. 1a). This has allowed for distinction between sugar-binding C-type lectins and the broader family of C-type lectin-like domains (Zelensky and Gready 2005).

Fig. 1

a Domain architecture of representative C-type lectin-containing proteins described in this issue. This figure is prepared from the review (Zelensky and Gready 2005) with modifications. C-type lectins are classified into three sugar-binding motifs (EPN, QPD, and calcium independent). Fn type 2: Fibronectin type 2, SCR: Short consensus repeat, EGF: Epidermal Growth Factor, ASGPR: Asialoglycoprotein receptor, SRACLA: scavenger receptor C-type lectin. b Overall structure of C-type lectin domain (Rat mannose binding protein A (MBP-A), PDB code 2MSB). Polypeptide, carbohydrate and calcium ions are shown in ribbon, stick, and sphere models, respectively. The long loop region is colored in cyan. Disulfide bonds are shown in yellow stick model. c Close-up view of the primary binding site of MBP-A. The coordination and hydrogen bonds are depicted with red dotted lines

C-type lectins accept a wide variety of carbohydrate and non-carbohydrate ligands such as lipids and proteins via their lectin or lectin-like domains. Some C-type lectins play specific roles in glycoconjugate recognition, with the aglycon moiety of the ligand often contributing to the interaction. From an immunological point of view, various C-type lectins work as pattern recognition receptors (PRRs) which recognize highly conserved specific molecular signatures called pathogen-associated molecular patterns (PAMPs) and damage-associated molecular patterns (DAMPs) that are crucial for discrimination of self from non-self (Varki 2017; Sancho and Reis e Sousa 2013). In order to understand the physiological roles of C-type lectins in detail, 3D structural information is essential. We describe our current knowledge of the carbohydrate recognition mechanism of C-type lectins at monosaccharide, oligosaccharide, and polysaccharide levels.

2 C-Type Lectin Fold

Since the pioneering work on the mannose-binding protein was reported (Weis et al. 1992), hundreds of atomic structures have been elucidated for C-type lectins. As of July 2019, over 250 atomic structures of C-type lectins were deposited in the Protein Data Bank. Overall fold and disulfide bond pattern are highly conserved among C-type lectins and C-type lectin-like domains. The C-type lectin domain is typically composed of 110–130 amino acid residues and the overall fold is formed by two α-helices and six or seven β-strands forming two antiparallel β-sheets (Fig. 1b). A long loop is found around the “top face” and inserted between two β-strands (β2 and β3 in typical cases). This loop is characteristic of C-type lectins, playing crucial roles in calcium coordination and sugar recognition (colored in cyan in Fig. 1b). C-type lectin domains accept one to four calcium ions (Zelensky and Gready 2005). The sugar-binding calcium ion is located at the top face of the C-type lectin domain. In the sugar-binding site, the calcium ion makes coordination bonds with both the lectin domain and bound monosaccharide (Fig. 1c). The other calcium ions mainly stabilize the 3D structure and occasionally form a secondary sugar-binding site.

3 Sugar Binding Motifs: EPN and QPD

A calcium ion forms multiple coordination bonds with amino acid residues that are well conserved among C-type lectins. The conserved residues that define sugar binding are referred to as motifs. The asparagine and aspartate from the WND (Trp-Asn-Asp) motif and one carbonyl-containing side chain (the Glu side chain in Fig. 1c) form coordination bonds with Ca2+ in all C-type lectins. In addition, the EPN (Glu-Pro-Asn) or QPD (Gln-Pro-Asp) motif contributes to calcium coordination and forms the sugar-binding site. These motifs have two carbonyl-containing side chains separated by a proline residue. Two adjacent hydroxyl groups from a monosaccharide make coordination bonds with the calcium ion and hydrogen bonds with the EPN or QPD motif (Zelensky and Gready 2005). C-type lectin-like domains lack these conserved motifs and do not bind a calcium ion.

EPN and QPD motifs define the monosaccharide specificity of C-type lectins. Hence, C-type lectins can be classified into three groups: (i) EPN motif-containing C-type lectins, (ii) QPD motif-containing C-type lectins, and (iii) C-type lectin-like domains without these two motifs. EPN motif-containing C-type lectins usually accept d-mannose, d-glucose, l-fucose, and N-acetyl-d-glucosamine (GlcNAc) through equatorial 3-OH and 4-OH groups (left panel in Fig. 2a). In contrast, QPD motif-containing C-type lectins bind N-acetyl-d-galactosamine (GalNAc) and d-galactose through equatorial 3-OH and axial 4-OH groups (right panel in Fig. 2a). In both cases, the bound monosaccharide is stabilized through hydrogen bonds with the carboxylate (–COO−) and amide (–CONH2) groups from these motifs. Importantly, hydrogen bond donors and acceptors are switched between the two cases. In this way, the hydrogen bond network defines the position and orientation of the bound monosaccharide. In fact, replacing the EPN motif with QPD in mannose-binding protein A (MBP-A) changes the binding preference to favor galactose. This result demonstrates the role of these motifs in monosaccharide specificity (Drickamer 1992). In the ligand-free form, two water molecules occupy the positions that will be taken over by the 3-OH and 4-OH groups of the binding sugar residue, forming an eight-coordinated calcium ion (Ng et al. 1996; Feinberg et al. 2000). The orientation of the sugar ring at the primary binding site is affected by the surrounding amino acid residues. Occasionally, the bound sugar occurs as a mixture of the two orientations in a single crystal structure (Ng et al. 2002).
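The three-way classification above can be written as a simple lookup. The following is only an illustrative sketch of the motif rule as stated in the text, not a specificity predictor: the function name `classify_by_motif` and the labels "mannose-type" and "galactose-type" are assumptions introduced here for convenience, and the exceptions discussed in Section 3.1 are deliberately not modeled.

```python
# Illustrative lookup of the motif rule described in the text.
# Labels "mannose-type" / "galactose-type" are shorthand assumed for this sketch.
MOTIF_RULE = {
    "EPN": ("mannose-type", ["D-mannose", "D-glucose", "L-fucose", "GlcNAc"]),
    "QPD": ("galactose-type", ["D-galactose", "GalNAc"]),
}


def classify_by_motif(motif):
    """Return (class label, typically accepted monosaccharides) for a tripeptide motif."""
    key = motif.strip().upper()
    if key in MOTIF_RULE:
        return MOTIF_RULE[key]
    # Group (iii): neither EPN nor QPD, so no canonical sugar site is expected.
    return ("C-type lectin-like", [])


for motif in ("EPN", "QPD", "KPD"):
    label, sugars = classify_by_motif(motif)
    print(motif, label, sugars)
```

A lookup of this kind captures only the general rule; a lectin such as TC14 or langerin would need to be handled as a documented exception rather than inferred from the motif alone.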

Fig. 2

a Monosaccharide recognition by EPN (MBP-A, PDB code 2MSB, left panel) and QPD motifs (ASGPR, PDB code 5JPV, right panel). The amino acid residues which contribute to calcium ion coordination and sugar binding are shown in stick models. The positions of EPN and QPD motifs are indicated. b Galactose recognition by TC14 (PDB code 1TLG, left panel) and ASGPR (PDB code 5JPV, right panel). The tryptophan residues (W100 in TC14 and W243 in ASGPR) are shown in stick model and make stacking interactions with galactose rings. c Sulfated galactose recognition by langerin (PDB code 3P5I). The amino acid residues interacting with the sulfate group are highlighted. d Sialic acid recognition by murine SIGN-R1 (PDB code 4CAJ). Coordination and hydrogen bonds are depicted with red dotted lines. An additional hydrogen bond with asparagine (N288) is also shown. e Heparin disaccharide recognition by EMBP (PDB code 2BRS). One disaccharide unit (colored in white) is surrounded by three EMBP molecules (shown in semitransparent surface models, and colored in green, cyan, and magenta). Direct intermolecular hydrogen bonds are shown in red dotted lines. f Calcium-independent GalNAc recognition by SPL-2 (PDB code 6A7S). Direct intermolecular hydrogen bonds are shown in red dotted lines. Y66 contacts the GalNAc residue

3.1 Other Monosaccharide Binding Modes

Monosaccharide specificities of C-type lectins are generally defined by these two motifs, but there are several known exceptions. A tunicate lectin, TC14 from Polyandrocarpa misakiensis, has an EPS motif, and a sea cucumber lectin, CEL-IV from Cucumaria echinata, has the EPN motif. Contrary to the motif rule, they bind galactose at the primary binding site (Poget et al. 1999; Hatakeyama et al. 2011). In each structure, a tryptophan side chain stacks with the apolar face of the galactose ring (left panel in Fig. 2b). In the case of galactose recognition by the QPD motif, the sugar is typically stabilized by a tryptophan side chain located on the opposite side (right panel in Fig. 2b). Consequently, the orientation of the galactose ring observed in TC14 and CEL-IV is inverted compared with the typical galactose-QPD motif interaction.

A second example is the mammalian C-type lectin receptor langerin which is expressed on Langerhans cells and mediates carbohydrate-dependent uptake of pathogens. Langerin has an EPN motif but, atypically, it accepts glycans with terminal 6-sulfated galactose. A crystal structure of langerin complexed with 6SO4-Galβ1-4GlcNAc shows that the galactose residue coordinates a calcium ion and the sulfate group forms salt bridges with two lysine residues located close to the primary binding site. This electrostatic interaction appears to compensate for the nonoptimal binding of galactose with the EPN motif (Fig. 2c) (Feinberg et al. 2011).

Another exception is the interaction between SIGN-R1 and sialic acid (Neu5Ac) (Silva-Martin et al. 2014). SIGN-R1, also known as CD209a, is a murine C-type lectin receptor with an EPN motif and is expressed in myeloid cells. A crystal structure of SIGN-R1 in complex with Neu5Ac shows that the carboxylic group, not the hydroxyl groups, of the sialic acid makes coordination bonds with the calcium ion of SIGN-R1 and hydrogen bonds with adjacent amino acid residues (Fig. 2d). One asparagine residue (N288) contributes to sialic acid binding, independent of calcium coordination.

C-type lectin-like domains without an EPN or QPD motif are predicted to lack typical sugar-binding ability. Nevertheless, several do bind carbohydrate ligands. For instance, the eosinophil major basic protein (EMBP) is a constituent of the eosinophil secondary granule and has a C-type lectin-like domain. Surface plasmon resonance assay demonstrated that EMBP directly binds to heparin and heparan sulfate, but not to hyaluronic acid (Swaminathan et al. 2005). A crystal structure of human EMBP in complex with heparin disaccharide has been reported (Swaminathan et al. 2005). The authors introduced the disaccharide ligand by soaking into ligand-free crystals. Three EMBP molecules in the crystal lattice contact one heparin disaccharide unit. The major contact site is located close to the primary binding site of typical C-type lectins and the bound disaccharide unit is mainly stabilized by electrostatic interactions and hydrogen bonds (Fig. 2e).

One more example is the bivalve lectins SPL-1 and SPL-2, which show high affinities for GlcNAc or GalNAc containing glycans. Intriguingly, RPD and KPD motifs are found in SPL-1 and SPL-2, instead of EPN or QPD. Crystal structures of SPL-2 in complex with GalNAc demonstrated that the sugar binds near the putative primary binding site. However, the interaction mode is different from typical sugar binding via an EPN or QPD motif (Unno et al. 2019). 3-OH and 4-OH groups of GalNAc make hydrogen bonds with the putative primary binding site of the C-type lectin, and the acetamido group is sandwiched by tyrosine and histidine side chains via a stacking interaction (Fig. 2f).

A shrimp C-type lectin, MjGCTL, has a QAP (Gln-Ala-Pro) motif, which was predicted not to have calcium-binding ability. However, a recent study shows that the lectin has sugar-dependent hemocyte encapsulation activity (Alenton et al. 2017). The carbohydrate recognition mechanism of this C-type lectin-like domain could therefore differ from the typical one. Structural analysis of carbohydrate recognition mechanisms with novel motifs will expand our understanding of this lectin family.

4 Oligosaccharide Recognition

C-type lectins have individual specificities despite their high sequence and structural similarities. For instance, langerin binds a diverse range of carbohydrates including high-mannose-type glycans, fucosylated blood group antigens, and glycans with terminal 6-sulfated galactose. Meanwhile, the C-type lectin receptor DCAR specifically recognizes phosphatidylinositol mannoside (PIM), a mycobacterial glycolipid, promoting a Th1 response during infection (Toyonaga et al. 2016). A fundamental question is how highly conserved C-type lectins recognize various types of glycan ligands.

Monosaccharide binding preference is defined by an EPN or QPD motif, while the specificity toward oligo- and polysaccharides is determined by the secondary binding site which is located near the primary binding site. The amino acid residues coordinating calcium ions are highly conserved, while the amino acid residues located within ~15 Å from calcium ion are less so. In particular, the secondary binding site is often formed by amino acid residues on the top face, where three β-strands (β2–β4), the long loop region and a part of two α-helices are located. The specific residues in the secondary binding site usually contribute additional interactions, or conversely, work to discourage binding of certain ligands.
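As a rough illustration of how the neighborhood of the calcium ion might be examined in practice, the sketch below lists residues that have at least one atom within a chosen distance of the Ca2+ position. It is a minimal example under stated assumptions: the function name `residues_near_calcium`, the residue labels, and all coordinates are hypothetical and invented for illustration, and the 15 Å default simply mirrors the approximate distance mentioned above.

```python
import math


def residues_near_calcium(atoms, calcium_xyz, cutoff=15.0):
    """Return sorted residue IDs with at least one atom within `cutoff` (in Å) of the Ca2+ ion.

    atoms: iterable of (residue_id, (x, y, z)) tuples, one entry per atom.
    """
    near = set()
    for residue_id, xyz in atoms:
        if math.dist(xyz, calcium_xyz) <= cutoff:
            near.add(residue_id)
    return sorted(near)


# Hypothetical coordinates in Å (not taken from any deposited structure).
calcium = (0.0, 0.0, 0.0)
atoms = [
    ("res_10", (2.5, 0.0, 0.0)),    # 2.5 Å from the ion
    ("res_20", (6.0, 8.0, 0.0)),    # 10 Å from the ion
    ("res_20", (30.0, 0.0, 0.0)),   # a distant atom of the same residue
    ("res_30", (12.0, 16.0, 0.0)),  # 20 Å from the ion, outside the shell
]

print(residues_near_calcium(atoms, calcium))
print(residues_near_calcium(atoms, calcium, cutoff=5.0))
```

With real data, the atom list would come from a coordinate file of the complex of interest; comparing the resulting residue sets between two lectins is one simple way to see which positions around the conserved calcium site differ.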

From a structural aspect, it is important to understand how each C-type lectin attains specificity for the target glycan. To get a broader picture, 3D structures of C-type lectin domains in complexes with oligosaccharide ligands were extracted from the PDB and these are summarized in Table 1. In this chapter, we describe the structural basis for the specific oligosaccharide recognition mechanism of C-type lectins.

Table 1 List of 3D structures of C-type lectins and C-type lectin-like domains complexed with glycan ligands. The glycan complex structures were extracted to include more than just the disaccharide units. Crystal structures of C-type lectins in ligand free forms and complexed with monosaccharide or non-carbohydrate ligands were omitted

4.1 Oligosaccharide Recognition via EPN Motif

The structural basis for the specific oligosaccharide recognition of EPN motif-containing C-type lectins has been well studied. Here, we introduce the recognition mechanism of EPN motif-containing C-type lectins toward representative oligosaccharides such as high-mannose type, complex-type N-glycans, sialyl-LewisX, and glucose-containing glycans. The primary binding sites of corresponding C-type lectins bind mannose, fucose, and glucose residues of these oligosaccharides, and the secondary binding sites define their specificities.

4.1.1 Oligosaccharide Recognition of EPN Motif-Containing C-Type Lectins Through Mannose

(I) Recognition of high-mannose-type N-glycan through mannose

High-mannose-type N-glycans are often found in viral and fungal glycoproteins as well as in nascent mammalian glycoproteins. A representative mammalian high-mannose-type N-glycan is Man9GlcNAc2, which contains mannose residues with α1-2, α1-3, and α1-6 linkages (Fig. 3a). The Manα1-2Man unit is a common terminal structure on mannans of yeast and other fungi. Therefore, this disaccharide unit can be a target for several C-type lectins working as pattern recognition receptors in the immune system. Crystal structures of six EPN motif-containing C-type lectins have been reported in complexes with high-mannose-type glycans (Table 1). Of these, mannose-binding protein A (MBP-A) and pulmonary surfactant protein D (SP-D) are soluble proteins categorized as the collectin family (Group 3). DC-specific intercellular adhesion molecule-3-grabbing nonintegrin (DC-SIGN), langerin, and DC-associated C-type lectin-2 (Dectin-2) are type II membrane proteins which belong to the asialoglycoprotein and DC receptor family (Group 2). L-selectin is an adhesion receptor of the selectin family (Group 4) (Fig. 1a). The interaction modes of these lectins with high-mannose-type glycans are visualized in Fig. 3b. C-type lectins accept α1-2, α1-3, and α1-6 linkages of the mannose disaccharide unit at their primary binding sites with several variations. In this section, the atomic recognition modes of high-mannose-type glycan are discussed for each glycosidic linkage.

Fig. 3

a A typical high-mannose-type glycan (Man9GlcNAc2, left panel) and biantennary complex-type glycan (right panel) linked to asparagine. Monosaccharide symbols follow the SNFG (symbol nomenclature for glycans) system (Varki et al. 2015). The glycosidic linkages and residue numbers are labeled at each position. b Summary of the interaction between high-mannose-type glycan and C-type lectins. Structures are classified based on the glycosidic linkages (Manα1-2Man, Manα1-3Man and Manα1-6Man) of the mannose residues. The positions of reducing and non-reducing mannose residues are labeled as “R” and “NR,” respectively. The residue names are also labeled. c Three cross-linked structures of C-type lectin-high-mannose glycan complexes. Two MBP-A glycan complexes (PDB code 2MSB, left and PDB code 1KX1, middle panels) and one Dectin-2 complex (PDB code 5VYB, right panel) are shown. The mannose residues at the primary binding sites are labeled. d Complex-type N-glycan recognition by BDCA-2 (PDB code 4ZET, left panel), hDCIR (PDB code 5B1X, middle panel), and mDCIR2 (PDB code 3VYK, right panel). e The branch-specific interaction with complex-type N-glycan. All four C-type lectins (mDCIR2 (PDB code 3VYK), DC-SIGN (PDB code 1K9I), Codakine (PDB code 2VUZ), and SP-D (PDB code 6BBE)) accept mannose residues (Man-4) of the α1-3 branch at the primary binding sites. It should be noted that the relative position of the α1-6 branch of the mDCIR2 complex is different from the other C-type lectins. C-type lectin domains are shown in surface models

(I-1) Manα1-2Man unit recognition

There are two known binding modes for the α1-2 linkage: (a) mannose at the non-reducing end (non-reducing mannose) bound to the primary site (SP-D wild type and MBP-A in Fig. 3b), and (b) mannose at the reducing end (reducing mannose) bound to the primary site (Dectin-2, SP-D R343V mutant, DC-SIGN, and langerin in Fig. 3b). The surrounding amino acid residues determine the choice of binding mode. In SP-D, Arg343 is located close to the primary site, and the replacement of this arginine with valine dramatically changes the disaccharide binding mode (Crouch et al. 2009). The R343V mutant accepts the reducing mannose at the primary binding site and forms additional hydrogen bonds with the non-reducing mannose via the secondary binding site (Fig. 3b). Compared with mode (a), the orientation of mannose in mode (b) is flipped in the primary binding sites of Dectin-2, SP-D R343V, and DC-SIGN. Dectin-2 mainly recognizes inner mannose residues of Man9 glycan (Man-A and C), and the disaccharide recognition mode seems similar to that of DC-SIGN. However, in the DC-SIGN-Man2 complex, the 2-OH and 3-OH groups of the non-reducing mannose residue are adjacent to phenylalanine (F313 in Fig. 3b), which prevents further extension toward the non-reducing side. A structure of langerin with a Manα1-2Man oligosaccharide has also been described (Fig. 3b right panel), showing that the bound reducing mannose residue is flipped 180° compared with the other complexes in mode (b). Since the reducing mannose residue is clamped between asparagine (N287) and lysine (K299), the 1-OH group cannot be used for extension to a larger glycan. Another conformation has also been reported in which the non-reducing mannose resides in the primary binding site, though the electron density of the reducing mannose is missing. Langerin preferentially binds the reducing mannose of the disaccharide but, in the context of a whole high-mannose glycan, binds the non-reducing end.

(I-2) Manα1-3Man unit recognition

C-type lectins can bind the terminal and inner Manα1-3Man units of N-glycan (Fig. 3b). Langerin and the MBP-A H189V mutant recognize the non-reducing mannose residue of the Manα1-3Man unit. The langerin complex with the Manα1-3Man disaccharide unit was obtained by using core Man5 oligosaccharide (Manα1-3[Manα1-3[Manα1-6]Manα1-6]Man). In this complex, the non-reducing mannose tightly interacts with the primary binding site, while the reducing ends are exposed to solvent (Fig. 3b). Wild-type MBP-A recognizes the reducing end of the Manα1-3Man disaccharide unit; however, introduction of the H189V mutation causes a change in the bound residue (to the non-reducing end) as well as a 180° flip of the bound ligand (Ng et al. 2002) (Fig. 3b). The position of this histidine in MBP-A is different from that of R343 in SP-D, indicating that the binding mode is determined by a delicate balance of the surrounding residues.

DC-SIGN recognizes inner α1-3 linked mannose residues. A DC-SIGN-Man4 (Manα1-3[Manα1-6]Manα1-6Man) complex shows that all four mannose residues (corresponding to Man-A, Man-B, Man-4’ and Man-3 in Fig. 3a) are uniquely defined. In the DC-SIGN-Man6 (Manα1-2Manα1-3Man[Manα1-2Manα1-6]Man) complex, Manα1-2Manα1-3Man, which is a part of the D2 arm, interacts with DC-SIGN in a major conformation. In addition, only the terminal Manα1-2Man unit (Man-A and Man-D2 residues) binds in a minor conformation. The interaction mode of the major conformation is similar to that of the Man4 complex, and these two glycans superimpose well (Fig. 3b). This observation indicates that DC-SIGN tightly associates with the inner tetrasaccharide unit (Manα1-2Manα1-3Manα1-6Man) of high-mannose-type glycan. The mannose residue (Man-A in Fig. 3b) coordinates the calcium ion, and the mannose residues flanking it on both sides are located at the secondary binding sites. The minor conformation of the Man6 complex is similar to the major conformation of the Man2 complex. Hence, DC-SIGN can recognize the inner α1-3 linked mannose residue as well as the terminal α1-2 linked mannose of high-mannose glycan. These multiple binding modes of DC-SIGN may enhance its apparent affinity towards glycoproteins carrying high-mannose glycan under physiological conditions.

L-selectin is a member of the selectin family and mediates cell adhesion and signaling in inflammation. A major physiological ligand of L-selectin is thought to be sialyl-LewisX (sLeX) as described in a later section. In addition to this ligand, L-selectin binds inner α1-3 linked mannose residues of high-mannose-type glycan. Interestingly, the binding mode is different from that of DC-SIGN. Crystal structures of N-glycosylated L-selectin were reported by two groups (Wedepohl et al. 2017; Mehta-D’souza et al. 2017). In these structures, L-selectin tightly binds to the N-glycan (Man5GlcNAc2) from the symmetry related L-selectin molecule via the sugar-binding site. The mannose at α1-3 branch (Man-4) resides in the primary binding site and the adjacent sugar residues, GlcNAc, β-mannose (Man-3) and α1-6 branched mannose (Man-4’), interact with L-selectin via the secondary binding site (Fig. 3b).

(I-3) Manα1-6Man recognition

To date there is only one example showing the binding mode of α1-6 linked mannose. The 3D structure was obtained from an MBP-A-Man5 (Manα1-2Manα1-3[Manα1-3Manα1-6]Man) complex (PDB code: 2MSB). Remarkably, one high-mannose-type glycan bridges two MBP-A molecules (discussed in the next section). The Man-B residue at the non-reducing end of the Manα1-6Man unit binds to MBP-A. However, there is no apparent interaction between MBP-A and the mannose at the reducing side (Fig. 3b).
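The oligosaccharides in this section are written in a condensed linear notation, with branches in square brackets. As a reading aid, the sketch below tallies the residues and glycosidic linkages in such a string. It is a minimal example rather than a full glycan parser: the function name `summarize_glycan` and the short list of residue names are assumptions made here, branch topology is ignored, and an open-ended linkage such as "β1-R" is intentionally not counted.

```python
import re
from collections import Counter

# Longer names come first so "GlcNAc" is not read as "Glc" or "GalNAc" as "Gal".
RESIDUE = re.compile(r"Neu5Ac|GlcNAc|GalNAc|Man|Gal|Glc|Fuc")
# Anomer followed by the two linked positions, e.g. "α1-3" or "β1-4".
LINKAGE = re.compile(r"[αβ]\d-\d")


def summarize_glycan(condensed):
    """Count residue names and glycosidic linkages in a condensed glycan string."""
    residues = Counter(RESIDUE.findall(condensed))
    linkages = Counter(LINKAGE.findall(condensed))
    return residues, linkages


man5 = "Manα1-2Manα1-3[Manα1-3Manα1-6]Man"
residues, linkages = summarize_glycan(man5)
print(dict(residues))
print(dict(linkages))
```

For the Man5 ligand quoted above, the tally gives five mannose residues joined by one α1-2, two α1-3, and one α1-6 linkage, which is a quick consistency check on the glycan name.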

(I-4) Cross-linking by high-mannose-type glycan

High-mannose-type glycan can function as a multivalent ligand for C-type lectins. Three cross-linked structures have been reported so far (Fig. 3c). Two of the three are MBP-A-high-mannose-type glycan complexes, while the third is a Dectin-2-high-mannose-type glycan complex. A crystal structure of an MBP-A-Man5 (Manα1-2Manα1-3[Manα1-3Manα1-6]Man) complex (PDB code: 2MSB) shows that one high-mannose-type glycan is simultaneously recognized by two MBP-A molecules (Weis et al. 1992). One MBP-A molecule binds to the Manα1-2Man unit of the α1-3 branch (especially the Man-C residue), and the other MBP-A molecule binds to the Manα1-3Man unit of the α1-6 branch (Man-A, left panel in Fig. 3c). In both molecules, non-reducing mannose residues occupy the primary binding sites. The other complex, the MBP-A-Man6GlcNAc2 complex, shows that one high-mannose glycan bridges two MBP-A trimers (Ng et al. 2002). In this MBP-A complex, one MBP-A molecule interacts with the Manα1-2Man unit, and the other MBP-A molecule interacts with the Manα1-6Man unit of the α1-6 branch (especially the Man-B residue, middle panel in Fig. 3c). Comparing the two MBP-A cross-linked structures, all four MBP-A molecules recognize mannose residues at the non-reducing ends.

The Dectin-2-high-mannose-type glycan complex differs from the MBP-A complexes in terms of glycan binding mode. A crystal structure of Dectin-2 in complex with Man9 glycan shows that two Dectin-2 molecules sandwich one high-mannose-type glycan (Feinberg et al. 2017). One Dectin-2 recognizes the Manα1-2Man unit (Man-C and D1 residues) of the D1 arm, and the other Dectin-2 molecule interacts with the same disaccharide unit (Man-A and D2 residues) of the D2 arm (right panel in Fig. 3c). The interaction modes of the two Dectin-2 molecules are the same. The reducing end of the disaccharide resides in the primary binding site, while the non-reducing end is located at the secondary binding site. The recognition mode of inner mannose residues seems suitable for binding fungal mannans which have variable structures at their non-reducing ends.

(II) Complex-type N-glycan recognition via mannose

Complex-type N-glycan is synthesized from high-mannose-type glycan by a series of enzymatic processes (Fig. 3a). Several C-type lectin receptors encoded in the Dectin-2 cluster on the natural killer gene complex preferentially interact with complex-type N-glycans, such as blood DC antigen 2 (BDCA-2), human DC immunoreceptor (hDCIR), and murine DCIR2 (mDCIR2) (Fig. 1a). These C-type lectins share high amino acid sequence identities; however, their ligand preferences are slightly different.

BDCA-2 binds to galactose-terminated biantennary glycans, defining an epitope found on a limited number of bi- and triantennary glycans (Riboldi et al. 2011). Unusually for a lectin with an EPN motif, BDCA-2 binds galactosylated glycan, and a crystal structure shows why. The mannose residue of the trisaccharide unit (Galβ1-4GlcNAcβ1-2Man) is recognized at the primary binding site, while additional interactions with Galβ1-4GlcNAc, including one between a serine residue (S139) and the galactose, define the specificity (Fig. 3d) (Jegouzo et al. 2015).

In contrast, human DCIR (hDCIR) binds to GlcNAc-terminated biantennary N-glycan. The binding mode of hDCIR toward the disaccharide unit (GlcNAcβ1-2Man) is similar to that of BDCA-2 (Fig. 3d) (Nagae et al. 2016). However, the serine residue (S139) interacting with galactose in BDCA-2 is not conserved in hDCIR (A162). This indicates that the specificity is defined by the additional interaction with the terminal glycan residue.

BDCA-2 and hDCIR can interact with both α1-3 and α1-6 arms of biantennary complex-type N-glycan. mDCIR2 has unique specificity toward GlcNAc-terminated biantennary glycan with bisecting GlcNAc (Nagae et al. 2013). mDCIR2 shows arm preference, and the galactosylation of the α1-3 branch strongly inhibits binding. A crystal structure of a mDCIR2-bisected glycan complex demonstrated that mDCIR2 strictly recognizes the disaccharide unit (GlcNAcβ1-2Man) of the α1-3 arm as well as bisecting GlcNAc (Fig. 3d). The binding mode of GlcNAcβ1-2Man is similar to those of BDCA-2 and hDCIR. However, aspartate (D223) tightly interacts with the bisecting GlcNAc (Fig. 3d), which is not conserved in BDCA-2 or hDCIR. Due to the simultaneous interaction with both GlcNAcβ1-2Man and bisecting GlcNAc, mDCIR2 selects the α1-3 branch. Such simultaneous interaction is impossible using the α1-6 branch because it is located slightly too far from the bisecting GlcNAc.

As described above, DC-SIGN (and DC-SIGNR) preferentially interact with high-mannose-type glycans, and 3D structures of DC-SIGN bound to high-mannose glycan have been reported. Additionally, structures of DC-SIGN and DC-SIGNR bound to complex-type glycans have been reported (Feinberg et al. 2001). The interaction modes of these receptors are different from those of BDCA-2, hDCIR, and mDCIR2. The mannose residue of the α1-3 branch (Man-4) coordinates a calcium ion, but the orientation is flipped compared with the mDCIR2-bisected glycan complex (Fig. 3e). The GlcNAc residue of the α1-3 branch is located away from the secondary binding site due to this flipping, and the α1-6 branch is located on the surface of DC-SIGN. The mannose (Man-4’) and GlcNAc residues of the α1-6 branch form hydrogen bonds with DC-SIGN.

The bivalve lectin, codakine, from Codakia orbicularis binds a biantennary complex-type glycan (Gourdine et al. 2008). The binding mode of codakine is similar to that of DC-SIGN rather than mDCIR2 (Fig. 3e). The α1-3 branched mannose (Man-4) makes coordination bonds with a calcium ion and the α1-6 branch interacts with codakine via its secondary binding site.

A crystal structure of glycosylated porcine SP-D demonstrates that the sugar-binding site of SP-D accepts complex-type N-glycan attached on symmetry-related SP-D (van Eijk et al. 2018). The interaction mode is similar to those of DC-SIGN and codakine. The α1-6 branch is strongly kinked, possibly due to crystal packing (Fig. 3e).

4.1.2 Oligosaccharide Recognition of EPN Motif-Containing C-Type Lectins Through Fucose

C-type lectins primarily recognize mannose residues of high-mannose and complex-type N-glycans. In contrast, several C-type lectins such as selectins recognize the 3-OH and 4-OH groups of the fucose residue in sialyl-LewisX (sLeX, Neu5Acα2-3Galβ1-4[Fucα1-3]GlcNAcβ1-R). sLeX is a terminal component of N- and O-glycans on hematopoietic and endothelial cells. L-selectin prefers sLeX modified with sulfate on the GlcNAc residue. Selectins are expressed on vascular endothelium, platelets, or leukocytes and bind to cell surface glycoproteins harboring sLeX glycans such as PSGL-1. Upon ligand binding, selectins show catch-bond behavior, which is essential for initial tethering and rolling along the vascular endothelium and subsequent firm adhesion (Kansas 1996).

Crystal structures of E- and P-selectins complexed with sLeX have been reported (Somers et al. 2000; Preston et al. 2016). These lectins recognize a fucose residue at the primary binding site and form additional interactions with adjacent residues (GlcNAc, Gal, and Neu5Ac). A structural comparison of four structures shows that the positions of fucose, GlcNAc, and Gal superimpose well, while the position of the terminal Neu5Ac is variable (Fig. 4a). Interestingly, a 3D structural difference is observed between sLeX co-crystallized and soaked complexes. In the soaked complexes, asparagine (N83) makes coordination bonds with the calcium ion (left panel in Fig. 4a). In contrast, glutamate (E88) makes coordination bonds with the calcium ion in co-crystallized complexes (right panel in Fig. 4a). This difference causes a positional shift of the flexible loop, leading to a global conformational change from the bent (low affinity) to the extended (high affinity) form. The glutamate is therefore a key residue in stabilizing the high affinity conformation (Mehta-D’souza et al. 2017).

Fig. 4

a Sialyl-LewisX recognition by P- and E-selectins. Bent conformations (low-affinity state) of P-selectin (PDB code 1G1R) and E-selectin (PDB code 1G1T) are shown in the left panel. Extended conformations (high-affinity state) of P-selectin (PDB code 1G1S) and E-selectin (PDB code 4CSY) are shown in the right panel. Two amino acid residues (N83 and E88), which take different positions in the two conformations, are shown in stick models. b Structural superposition of E-selectin-sialyl-LewisX complex (PDB code 4CSY, left panel) and DC-SIGN-LNFP3 complex (PDB code 1SL5, right panel). c Structural comparisons between E-selectin (PDB code 4CSY), MBP-A K3 mutant (PDB code 2KMB) and Langerin (PDB code 3P5G) in complexes with fucose-containing glycans. d Aglycon recognition observed in P-selectin-PSGL-1 complex (PDB code 1G1S). The interactions between P-selectin and sulfated tyrosine residues of PSGL-1 are highlighted and labeled. The P-selectin molecule is shown in surface model

The structure of L-selectin in complex with high-mannose-type glycan is similar to the co-crystallized complex, rather than the sLeX soaked complex, even though it assumes a bent conformation (Fig. 3b). The glutamate coordinates with the calcium ion and the flexible loop takes a similar position as in the co-crystallized complex.

Other C-type lectins, such as DC-SIGN, MBP-A mutant, and langerin, also bind to fucose-containing glycans. A crystal structure of DC-SIGN in complex with lacto-N-fucopentaose III (Galβ1-4[Fucα1-3]GlcNAcβ1-3Galβ1-4Glc) shows that the position of the fucose coincides well with those of selectins. However, the positions of GlcNAc and galactose are slightly different. This subtle difference may be derived from differences in the secondary binding site of DC-SIGN (Fig. 4b).

The introduction of a triple mutation (K211-K212-K213) in MBP-A (K3 mutant) enables it to accept a series of LewisX glycans as P- and E-selectins do (Ng and Weis 1997). However, the binding mode is quite different from those of selectins and DC-SIGN (middle panel in Fig. 4c). In this case, 2-OH (equatorial) and 3-OH (equatorial) groups of the fucose make coordination bonds with the calcium ion. One of three lysine residues (K211) makes a hydrogen bond with the galactose residue.

The fucose recognition mode of langerin is also different from those of selectins and DC-SIGN (Feinberg et al. 2011). In the structure of langerin-blood group B trisaccharide (Galα1-3[Fucα1-2]Gal) complex, 2-OH and 3-OH groups of the fucose residue also coordinate with the calcium ion (right panel in Fig. 4c).

The aglycon moiety of PSGL-1 contributes to the specific interaction with selectins. Of particular note, the physiological interaction between human P-selectin and PSGL-1 requires both sLeX capped core 2 O-glycan and one or more sulfated tyrosine residues in the N-terminal region of PSGL-1. In the crystal structure (Somers et al. 2000), human P-selectin recognizes both the sLeX attached to threonine and the sulfated tyrosine residues of human PSGL-1 (Fig. 4d). P-selectin also binds to the N-terminus of murine PSGL-1, although the sequence is different from human PSGL-1. Cell-based biochemical assays suggest that sulfation of tyrosine (Y13) and the O-glycan on T17 are necessary for murine PSGL-1 to bind optimally to P-selectin (Xia et al. 2003). The spacing of these residues in the sequence is considerably closer than the corresponding residues in the human PSGL-1 sequence (Y7, Y10 and T16). It is likely that murine PSGL-1 binds to P-selectin using a different conformation of the polypeptide.

4.1.3 Oligosaccharide Recognition of EPN Motif-Containing C-Type Lectins Through Glucose

A representative glucose-specific C-type lectin is macrophage inducible calcium-dependent lectin (Mincle), also known as CLEC4E. Mincle is expressed on macrophages and interacts with trehalose-6,6′-dimycolate (TDM), a glycolipid found on the surface of Mycobacterium tuberculosis (Matsunaga and Moody 2009; Ishikawa et al. 2009). TDM comprises a trehalose (Glcα1-α1Glc) headgroup and two complex branched and hydroxylated acyl chains. The acyl chains are attached to the 6-OH groups of each of the sugar residues. Crystal structures of bovine Mincle C-type lectin domains in complexes with a series of ligands allow visualization of the interaction modes with glycolipids. A Mincle–trehalose complex structure shows that both glucose residues interact tightly with Mincle. The 3-OH and 4-OH of one glucose coordinate with the calcium ion and the second glucose contacts a second binding site via hydrogen bonds (Feinberg et al. 2013) (Fig. 5a). The additional interaction of the aglycon moiety, such as the lipid part, should greatly improve the specificity. The Mincle–trehalose monobutyrate complex shows that the alkyl chain is located near the hydrophobic groove of Mincle (Feinberg et al. 2016). The hydrophobic residues lining this groove likely form the extended binding site for the lipid moiety.

Fig. 5

Glucose recognition mechanism of EPN motif-containing C-type lectins: a Mincle-trehalose complex (PDB code 4ZRW). b Langerin-laminaritriose complex (PDB code 3P5H). c Langerin-maltose complex (PDB code 3P7H). d SP-D-maltose complex (PDB code 3P7H). e Dextran sulfate recognition of SIGN-R1 (PDB code 4C9F). Four complexes in the asymmetric unit are superimposed. f Two sugar-binding sites (top face and side face) of SIGN-R1 (PDB code 4C9F)

In addition to Mincle, langerin, SP-D, and SIGN-R1 can bind glucose residues at their primary binding sites. The structure of a langerin–laminaritriose (Glcβ1-3Glcβ1-3Glc) complex contrasts markedly with the Mincle–trehalose complex structure. In the langerin complex, the 1-OH and 2-OH of the glucose at the reducing end reside in the primary binding site and the other glucose at the non-reducing end points away towards solvent (Feinberg et al. 2011) (Fig. 5b). The langerin–maltose (Glcα1-4Glc) complex reveals that the 3-OH and 4-OH of glucose at the reducing end also coordinate with the calcium ion, but the orientation is completely flipped compared with that in the Mincle–trehalose complex (Chatwell et al. 2008). The glucose at the non-reducing end is also exposed to solvent (Fig. 5c).

In contrast, the orientation of glucose in an SP-D–maltose complex is the same as in the Mincle complex (Fig. 5d). A crystal structure of SIGN-R1 in complex with oligo-dextran sulfate (α1-3 and α1-6 linked glucose polymer with sulfation) demonstrates a somewhat unusual binding mode (Silva-Martin et al. 2014). The glucose at the primary binding site is positioned differently compared with typical C-type lectins. Although the top face of the lectin accepts at least four glucose residues, only the 4-OH of Glcα1-6Glc is located within coordination distance of the calcium ion in the primary binding site (Fig. 5e). In the case of sulfated glucose, the ring oxygen (O5) seems to make a coordination bond with the calcium ion. It is noteworthy that the “side” face of SIGN-R1 can accept the repetitive molecular patterns of the polysaccharide chain (Fig. 5f). This binding mode seems favorable for an interaction of a small globular domain with long polysaccharide chains.
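Statements such as "the 3-OH and 4-OH coordinate with the calcium ion" are, in practice, judgments about interatomic distances in the deposited coordinates. The following is a minimal sketch of that check, assuming a distance cutoff of 2.8 Å for a Ca2+–oxygen coordination bond; the cutoff, the function name `coordinating_atoms`, and the coordinates in the example are illustrative assumptions and are not taken from any PDB entry cited here.

```python
import math

# Assumed upper limit (in Å) for calling a Ca2+–O contact a coordination bond.
CA_O_CUTOFF = 2.8


def coordinating_atoms(calcium_xyz, ligand_atoms, cutoff=CA_O_CUTOFF):
    """Return (atom name, distance) pairs for ligand atoms within the cutoff.

    calcium_xyz  -- (x, y, z) of the calcium ion in Å
    ligand_atoms -- mapping of atom name to (x, y, z) in Å
    The result is sorted from the shortest to the longest distance.
    """
    hits = []
    for name, xyz in ligand_atoms.items():
        distance = math.dist(calcium_xyz, xyz)
        if distance <= cutoff:
            hits.append((name, round(distance, 2)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    calcium = (0.0, 0.0, 0.0)
    glucose_oxygens = {
        "O3": (2.4, 0.0, 0.0),
        "O4": (0.0, 2.5, 0.0),
        "O6": (4.0, 3.0, 0.0),
    }
    print(coordinating_atoms(calcium, glucose_oxygens))
```

With real data, the coordinates would be read from the relevant PDB entry, and the choice of cutoff would decide whether a borderline contact, such as the ring oxygen of the sulfated glucose in the SIGN-R1 complex, is counted as a coordination bond.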

4.2 Oligosaccharide Recognition of QPD Motif-Containing C-Type Lectins

QPD motif-containing C-type lectins have been less studied than those with EPN. Only three structures, rattlesnake venom lectin (RSL) in complex with lactose (Walker et al. 2004), asialoglycoprotein receptor (ASGPR) in complex with lactose (Sanhueza et al. 2017) and scavenger receptor C-type lectin (SRCL, also known as SCARA4) in complex with LewisX trisaccharide (Fucα1-3[Galβ1-4]GlcNAc) (Feinberg et al. 2007), are deposited in the PDB. In these structures, the 3-OH and 4-OH groups of galactose form coordination bonds with the calcium ion and the apolar face of the galactose is stabilized by a stacking interaction with hydrophobic amino acid residues (Fig. 6a–c). RSL recognizes only the galactose residue of lactose. In contrast, the 1-OH group (β anomer) of the glucose residue makes a hydrogen bond with arginine (R236) in the ASGPR–lactose complex (Fig. 6a and b). In the SCARA4–LewisX complex, the fucose residue makes an additional hydrophobic contact with isoleucine (I712) and hydrogen bonds with lysine (K691) (Fig. 6c). A structural comparison among the three complexes clarifies that the aromatic residues (Y100 in RSL, W243 in ASGPR, and W698 in SCARA4) are located in equivalent positions and that positively charged residues (R236 in ASGPR and K691 in SCARA4) occupy similar positions in two of the complexes, where they are evidently engaged in ligand recognition.

Fig. 6

Carbohydrate recognition of QPD motif-containing C-type lectins: a RSL-lactose complex (PDB code 1JZN). b ASGPR-lactose complex (PDB code 5JQ1) and c SCARA4-LewisX complex (PDB code 2OX9)

It is tempting to compare the recognition modes of LewisX by the EPN and QPD motifs. In EPN motif-containing C-type lectins, fucose coordinates a calcium ion in the primary binding site and galactose resides in the secondary binding site (Fig. 4a, b). In the case of the QPD motif, by contrast, galactose resides in the primary binding site and fucose is located in the secondary binding site. Interestingly, the conformation of the LewisX trisaccharide is similar in both complexes. This observation suggests that both lectins recognize a stable conformation of the glycan. Glycan array analysis revealed that SRCL preferentially binds to Lewis a and LewisX, while DC-SIGN widely accepts various types of glycans such as Lewis a, Lewis b, LewisX, and LewisY (Feinberg et al. 2007; Guo et al. 2004). This may originate from differences in the binding modes of galactose/fucose.
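The EPN/QPD distinction discussed above can be turned into a first-pass sequence annotation. The following is a minimal sketch, assuming a plain one-letter amino acid string as input; the function name `find_motifs` and the example sequence are hypothetical. A tripeptide match alone does not establish sugar specificity, since the same three residues can occur by chance outside the long loop region, so any hit would need to be confirmed against an alignment or a structure.

```python
# Primary-site sugar preference associated with each motif in this section.
MOTIF_SPECIFICITY = {
    "EPN": "mannose/fucose/glucose",
    "QPD": "galactose",
}


def find_motifs(sequence):
    """Return (motif, 1-based start position) for every EPN or QPD tripeptide.

    Hits are sorted by position. The scan is purely textual and does not
    check whether the tripeptide lies in the calcium-binding long loop.
    """
    hits = []
    for motif in MOTIF_SPECIFICITY:
        start = sequence.find(motif)
        while start != -1:
            hits.append((motif, start + 1))
            start = sequence.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Made-up fragment for illustration; not a real lectin sequence.
    fragment = "AAEPNAAQPDAA"
    for motif, position in find_motifs(fragment):
        print(position, motif, MOTIF_SPECIFICITY[motif])
```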

4.3 Oligosaccharide Recognition of C-Type Lectin-like Domains

C-type lectin-like domains do not bind calcium because they lack the conserved calcium-binding motifs. However, some C-type lectin-like domains directly bind to carbohydrates in calcium-independent ways. The sugar-binding modes of C-type lectin-like domains are thus expected to differ completely from those of typical C-type lectin domains and to be diverse.

Dectin-1 is a C-type lectin-like receptor having a single extracellular C-type lectin-like domain, a short stalk region, a single transmembrane helix and a cytoplasmic ITAM. Dectin-1 is a β-glucan receptor and shows preference for the β1-3 linked glucose polymer (Palma et al. 2006). Since the C-type lectin-like domain of Dectin-1 lacks a QPD or EPN motif, it has no calcium-binding ability at the primary site. A crystal structure of Dectin-1 complexed with laminaritriose shows that two C-type lectin-like domains sandwich one laminaritriose via their lateral faces (Brown et al. 2007) (Fig. 7a). Laminaritriose assumes a planar conformation and is stabilized by several hydrogen bonds. It should be noted that mutational experiments suggested that the top face of the Dectin-1 C-type lectin-like domain, which is located far from the binding site observed in the crystal structure, contributes to β-glucan binding (Dulal et al. 2018). Direct evidence, for example from solution NMR analysis, is needed to resolve this discrepancy.

Fig. 7

Ca2+-independent carbohydrate recognition of C-type lectin-like domain: a Dectin-1-laminaritriose complex (PDB code 2CL8). b CLEC-2-O-glycosylated (Neu5Acα2-6[Galβ1-3]GalNAc-O-Thr52) podoplanin complex (PDB code 3WSR) and c CLEC-2-rhodocytin complex (PDB code 3WWK). Carbohydrate, Glu-Asp motif, and C-terminal Y136 are shown in stick models

Another example is C-type lectin-like receptor 2 (CLEC-2), also known as CLEC1B. CLEC-2 is a type II transmembrane receptor with a short N-terminal cytoplasmic tail containing a single tyrosine-based activation motif (hemITAM), a transmembrane segment, an extracellular stalk region and a C-type lectin-like domain which has no calcium binding activity. Podoplanin is a transmembrane O-glycoprotein that binds to CLEC-2 in a glycosylation-dependent manner. Crystallographic analysis revealed that two consecutive acidic residues (Glu-Asp motif) as well as an α2-6 linked sialic acid residue attached to the O-glycan interact with the lateral face of CLEC-2 (Nagae et al. 2014). Four arginine residues of CLEC-2 interact with the sialic acid and the two acidic residues of podoplanin (Fig. 7b). Interestingly, snake venom rhodocytin also binds to CLEC-2, although the interaction mode is somewhat different. Although the Glu-Asp motif is conserved in rhodocytin, rhodocytin is not O-glycosylated. Instead, the carboxylate of the rhodocytin C-terminus contributes to an electrostatic interaction with the arginine residue of CLEC-2 (Fig. 7c).

4.4 Genetic Variants of C-Type Lectins

Specific sugar recognition is affected by changes in amino acid residues. Single-nucleotide polymorphisms (SNPs) and genetic variations are potentially involved in susceptibility to disease and in disease outcomes. Various SNPs and disease-causing mutations of C-type lectins have been reported (Goyal et al. 2016). The effects of SNPs on sugar binding and species-dependent differences in sugar binding are not well studied, and the available data are limited to a few examples. In human langerin, the W264R mutation was found in an individual who lacks Birbeck granules in the Langerhans cells (Verdijk et al. 2005). This mutation results in the loss of mannose-binding ability (Ward et al. 2006). W264 is located inside the protein, and thus this mutation destabilizes the local folding of langerin. At present several SNPs, A278V, N288D, A300P, and K313I, in the human langerin C-type lectin domain are deposited in the SNP database (Ward et al. 2006; Feinberg et al. 2013). Of note, K313 is one of the two lysine residues which are critical for recognition of sulfated galactose (Fig. 2c). In the K313I variant, the affinity for sulfated galactose is dramatically reduced, whereas the affinity for terminal GlcNAc is increased (Feinberg et al. 2013).

Surfactant protein A (SP-A) consists of two isoforms, SP-A1 and SP-A2, encoded by separate genes. SP-A defends against invading pathogens in the lung. Missense mutations in the SP-A2 C-type lectin domain result in idiopathic pulmonary fibrosis (IPF), a serious lung disease affecting older adults (Wang et al. 2009). These mutations, such as F198S and G231V, dramatically reduce the expression level of the protein, probably by disrupting normal protein folding. In addition, the K223Q mutation in the C-type lectin domain is one of the SNPs which are significantly associated with tuberculosis (Malik et al. 2006). This replacement could affect the glycan recognition of SP-A2, though the mechanism has not yet been revealed.

For SP-D, three SNPs, M11T, A160T, and S270T, have been reported (Leth-Larsen et al. 2005). These residues are located in the signal peptide, collagen-like, and C-type lectin domains. S270 is located on the opposite side from the sugar-binding site, and the effect of the mutation on sugar binding is unclear. Another example is Dectin-1. The Y238X mutation of Dectin-1 is found in Africans and western Eurasians (Ferwerda et al. 2009). The Y238X mutant is poorly expressed and lacks β-glucan binding activity. Y236 in mouse Dectin-1, corresponding to human Y238, is buried inside the C-type lectin-like domain. Thus, the mutation likely inhibits proper folding. Although the binding of β-glucan is significantly lower in Y238X patients, fungal phagocytosis and fungal killing are normal, suggesting the presence of alternative receptors for phagocytosis.
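The variant labels used in this section (W264R, K313I, Y238X, and so on) follow the common pattern of wild-type residue, position, and replacement residue. The following is a minimal sketch of a parser for such labels, which may help when tabulating variants against structural positions. The names `Substitution` and `parse_substitution` are illustrative, and the reading of "X" as a premature stop codon follows the usual convention rather than anything stated explicitly above.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    """A single-residue variant such as W264R."""
    wild_type: str
    position: int
    variant: str

    @property
    def is_nonsense(self):
        # By convention "X" denotes a stop codon (an assumption of this sketch).
        return self.variant == "X"


# One-letter wild-type residue, residue number, one-letter replacement.
LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_substitution(label):
    """Parse a label like 'W264R'; stray whitespace inside the label is ignored."""
    cleaned = "".join(label.split())
    match = LABEL.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    for text in ["W264R", "K313I", "S270T", "Y238X"]:
        parsed = parse_substitution(text)
        print(text, parsed, "nonsense" if parsed.is_nonsense else "missense")
```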

The species-dependent sugar recognition mechanism has also been investigated. Human and murine langerin share 66% amino acid similarity, but show different ligand preferences for bacterial polysaccharides (Hanske et al. 2017). A crystal structure of murine langerin shows that the different residues map to the secondary binding site, which possibly accounts for the different ligand specificities.

5 Functional Oligomerization of C-Type Lectin Domains

C-type lectin and lectin-like domains often form stable homo- or heterodimers that are critical for their physiological functions. For example, C-type lectin-like domains of NK receptors (Group 5) such as Ly49s and NKGs accept various ligands via homo- or heterodimers (Li and Mariuzza 2014). The C-type lectin domain itself forms a monomer in solution. However, many sugar-binding C-type lectins such as collectin and DC receptor families form multimers via their stalk or coiled-coil regions. Biochemical studies of murine Dectin-1 suggest that the monomeric C-type lectin-like domain cooperatively forms an oligomer upon β-glucan binding (Dulal et al. 2018).

Domain swapping is a mechanism for two or more protein molecules to form a dimer or higher oligomer by exchanging an identical structural element (Liu and Eisenberg 2002). Oligomerization by domain swapping is often found in snake venom C-type lectin-like proteins (Eble 2019). In these proteins, the long loop between β2 and β3 strands extends away from the core of the protein to form a domain-swapped dimer. Various snake venom C-type lectin-like proteins form ordered heterooligomers via domain-swapping. Interestingly, domain-swapped heterodimers, bitiscetin, botrocetin, and rhodocetin, functionally grab their target platelet receptors such as von Willebrand factor (vWF) A1 domain, GPIbα, GPVI, and α2β1 integrin via an extended surface formed by an extended loop region (Maita et al. 2003; Fukuda et al. 2005; Eble et al. 2017).

In mammalian C-type lectin receptors, domain-swapped dimers can form under crystallization conditions. Crystal structures of several C-type lectin receptors are reported as domain-swapped dimers in a ligand-free state, such as CRD-4 of macrophage mannose receptor (Feinberg et al. 2000), BDCA-2 (Nagae et al. 2014), and NKRp1a (Kolenko et al. 2011). These domain-swapped dimers are similar to those of snake venom C-type lectin-like proteins. An interesting point is that the functional states of BDCA-2 and NKRp1a are monomeric and the domain-swapped dimers are thought to be an inactive state.

Of note, the BDCA-2 C-type lectin domain forms a domain-swapped dimer under three different crystallization conditions (Nagae et al. 2014). Although 1 mM calcium chloride was present during protein purification, no calcium ion was found in the calcium binding site. The side chain of E178, which is critical for calcium coordination, points away from the putative calcium binding site. This indicates that the domain-swapped BDCA-2 cannot accept a carbohydrate ligand. Subsequently, a crystal structure of BDCA-2 in complex with a trisaccharide was reported. In this structure, BDCA-2 is a monomer in the crystal and binds the sugar at the calcium binding site (Jegouzo et al. 2015).

In the case of NKRp1a, the extended loop points away from the central core and mediates formation of a domain-swapped dimer in the crystal (Kolenko et al. 2011). Although the refolding buffer contains a high concentration of calcium chloride, there is no calcium ion in the structure. In contrast, a solution structure determined by NMR is monomeric with the loop tightly anchored to the central region (Rozbesky et al. 2016). Calcium titration analysis suggests monomeric NKRp1a binds a calcium ion weakly with a dissociation constant in the mM range.

Although the physiological relevance of domain-swapped dimer formation is still obscure, the domain-swapped dimer is likely a metastable conformation. Formation of the domain-swapped dimer may be used as a temporary inactive state under certain conditions.

6 Conclusion and Future Perspective

C-type lectins accept various glycans via highly conserved binding sites. Although the primary binding site is the most conserved, there are differences in the secondary binding site that enable various types of glycans to bind. Furthermore, accumulating 3D structural information demonstrates that C-type lectin-like domains also recognize sugar ligands in a calcium-independent manner. The relationship between SNPs and sugar-binding activity has a direct bearing on susceptibility to pathogens; however, the mechanism is largely unknown. Structural analysis of C-type lectin variants found in individuals is a promising avenue toward understanding individual variations in the immune system.