Introduction

During the course of evolution, the immune system of vertebrates has consistently and remarkably developed to provide protection from pathogens such as bacteria, yeast, viruses, and parasites. To achieve this, the immune system of vertebrates comprises a complex network of cells, organs, tissues, proteins, and other molecules that collectively defend the organism against various pathogens. The immune system consists of two major interconnected types of defence mechanisms to counter these microbial threats, namely, the innate and adaptive immune responses. The innate immune response is triggered by pathogen-associated molecular patterns and typically represents a rapid first line of defence that does not retain any long-lasting immunological memory. The weaponry of the innate arm includes physical epithelial barriers, antimicrobial peptides and enzymes, mast cells, innate lymphocytes, neutrophils, and macrophages. The molecular targets of the adaptive immune response are called antigens (Ags) that can consist of proteins, carbohydrates, metabolites, and a wide range of chemically distinct lipids. Upon exposure to Ags, the adaptive immune response develops more slowly than the innate immune response, but will last longer. Two key immune cell populations within the adaptive arm of the immune system recognize Ags: T cells and B cells. T cells express T cell receptors (TCRs) on their cell surface that recognize small Ags or antigenic fragments from larger Ags. In T cell immunity, Ags are presented on the cell surface of professional antigen-presenting cells by specific antigen-presenting molecules. The subsequent recognition by T cells leads to a cascade of immune responses that ultimately results in the clearance of the harmful pathogens. 
Presently, studies in the field of T cell-mediated immunity have largely focused on understanding the molecular presentation of antigenic peptides by the major histocompatibility complex (MHC) proteins in humans and mice, and their subsequent molecular recognition by the TCR. MHC molecules are highly polymorphic glycoproteins that are able to bind peptides within their large and charged binding groove. The two classes of MHC molecules, class I and II MHC, bind short (8–13 residues long) or long (> 15 residues) peptides, respectively. To cope with the wide array of pathogens that are encountered by the host, the immune system has enormous diversity in the TCRs expressed on T cells and comprises two main classes of T cells, the αβ and γδ T cells. It was conventionally considered that αβ TCRs only recognized antigenic peptides complexed to the MHC (pMHC). The information gleaned from structural investigations on αβ TCR–pMHC complexes has been extremely informative in understanding how the TCR simultaneously, and specifically, focuses on host MHC and fragments of foreign peptide Ags [1, 2]. However, peptides are not the only class of antigens that TCRs are able to recognize; indeed, lipids [3] and the recently discovered small vitamin B metabolites [4, 5] can also activate T cells. In addition to MHC molecules, it is now clear that there are other Ag-presenting molecules (termed MHC class I-like) [6] of the immune system that play a vital role in protective immunity, including the MR1 molecule and the cluster of differentiation 1 (CD1) family of glycoproteins that present the vitamin B metabolites [4] and lipid-based Ags to specialized subsets of T cells, respectively [7,8,9]. The CD1 family represents an important cluster of largely monomorphic genes that have been classified, based on sequence identity and expression pattern, into two main groups of Ag-presenting molecules, namely, group 1 (CD1a, CD1b and CD1c) and group 2 (CD1d) (Fig. 1).
CD1e forms a third group; its function is still unclear, but it is believed to be involved in lipid transfer [10]. All CD1 molecules share structural similarities with the classical MHC-I molecules, but they have evolved to accommodate a chemically distinct class of Ags, namely, lipid-based Ags [11, 12]. Typically, lipid-based Ags are amphipathic molecules with polar headgroups (e.g. carbohydrates, sulphates, and phosphates) and hydrophobic tails. The CD1 molecules present these lipid Ags by sequestering the tails within a hydrophobic groove, while the polar headgroup is exposed at the CD1 surface for TCR recognition. Each CD1 isoform varies in terms of tissue distribution, intracellular trafficking, and factors that modulate expression levels, signifying a specific function for each type of CD1 Ag-presenting molecule. For instance, while CD1c can be found on marginal zone B cells of the spleen and on mantle zone B cells of the lymph nodes and the tonsil [13], human skin is the site of high-density CD1a protein expression on Langerhans cells [14, 15], the target of CD1a autoreactive T cell responses [16, 17]. Further, each CD1 isoform possesses a distinct binding groove architecture and solvent accessibility that determine the repertoire of foreign and self-lipids that can be presented (Fig. 1). For instance, the large CD1b antigen-binding cleft (volume ~ 2200 Å³) (Fig. 1) can accommodate lipid-based antigens possessing long alkyl chains up to 80 carbons in length, whereas CD1a possesses a more constricted binding groove (volume ~ 1350 Å³) that limits the size and diversity of antigens that it can bind (reviewed by [18]). These differences between the CD1 isoforms manifest in their ability to bind differing arrays of lipids, and in their subsequent recognition by the TCRs.
Here, we review the fundamental principles underscoring the molecular presentation of microbial lipid-based Ags by the family of CD1 molecules and their subsequent molecular recognition by specialized T cell subsets.

Fig. 1

The antigen-binding cleft architecture of CD1 glycoproteins. Cartoon representations of CD1a, light orange; CD1b, light blue; CD1c, pink; CD1d, light green; CD1e, cyan. For clarity, only the α1- and α2-domains for each CD1 are shown. The CD1 antigen-binding clefts are shown as surface representations and have been generated using the CASTp program [124]. The A′-, F′-, C′- and T′-pockets are coloured in blue, yellow, magenta, and red, respectively. The published calculated volumes (V, Å³) of the individual CD1 antigen-binding pockets are indicated [18]. This figure and all molecular graphics in subsequent figures were created with the PyMOL molecular visualization system [125].

Recognition of microbial lipid-based Ags by group 2 CD1-restricted T cells

Natural killer T cells (NKT)

NKT cells are CD1d-specific innate-like T cells that, when specifically activated via their TCR, produce an array of cytokines, including Th1-, Th2-, and Th17-type cytokines, which enables them to influence immune outcomes in a broad range of diseases including a number of microbial infections [19, 20]. Two main classes of NKT cells exist, namely, type I and II, which are distinguished by their TCR gene usage and Ag specificity [21]. Typically, the human type I NKT cells express an invariant TRAV10+ (T cell receptor alpha variable) TRAJ18+ (T cell receptor alpha joining) rearranged TCR α-chain and most express a TRBV25-1+ (T cell receptor beta variable) TCR β-chain. Type I NKT cells are also present in other mammalian species, including mice, whereby the NKT cells express an invariant TCR α-chain rearrangement TRAV11+TRAJ18+, and generally use one of three different TCR β-chain variable genes (TRBV13, TRBV29, or TRBV1). The type I NKT TCRs are also defined by their ability to recognize the prototypical glycosphingolipid α-galactosylceramide (α-GalCer) as originally isolated from the marine sponge Agelas mauritianus [22] (Fig. 2). As opposed to the type I NKT TCRs, the type II NKT TCRs are defined by their inability to respond to α-GalCer and are characterized by the expression of a more diverse TCR gene repertoire, but they share their specificity for CD1d with type I NKT cells [23,24,25].

Fig. 2

Chemical structures of the CD1-restricted microbial lipid-based antigens. The lipids have been grouped by chemical classes and their binding to a specific CD1 molecule is indicated: CD1a (pink), CD1b (purple), CD1c (green), and CD1d (blue). For the lipophosphoglycan family, the carbohydrate headgroups are represented as coloured spheres (mannose, green; inositol, grey) and ovals (mannan, brown; arabinan, red). The chemical structures were prepared using ChemDraw Professional

Molecular presentation of microbial CD1d-restricted lipid-based Ags

Together, the CD1d molecules from the different mammalian species form the group 2 CD1 family (Fig. 1) and present lipid-based antigens to the aforementioned invariant type I NKT cells that express αβ TCRs [26] and to clonally diverse T cell subsets expressing αβ, γδ, and δ/αβ TCRs [24, 25, 27,28,29,30,31]. Structurally, CD1d exhibits a medium-sized antigen-binding groove that comprises two main antigen-binding pockets, namely, the A′- and F′-pockets (Fig. 1) [32, 33]. Here, whilst the A′-pocket is large, deeply buried, and can accommodate lipid acyl chains up to 29 carbons in length, the F′-pocket is smaller and can thus only bind sphingosine chains up to ~ 18 carbons in length. However, its specialized binding groove architecture and size have enabled CD1d to present a diverse range of exogenous lipid-based antigens to NKT cells, comprising chemically distinct classes of microbial lipids such as glycosphingolipids, glycerol-based lipids (DAG), phospholipids, and lysolipids (Fig. 2) [34,35,36]. As such, the phosphatidylinositol mannoside (PIM) (Fig. 2) and a lipophosphoglycan (LPG) isolated from the Mycobacterium bovis cell wall and Leishmania donovani, respectively, represented the first reported microbial lipid antigens recognized by NKT cells [37, 38]. However, it was later shown that a chemically synthesized PIM4 failed to stimulate type I NKT cells [39]. Microbial glycosphingolipids (Fig. 2) also represent a well-characterized class of lipid Ags for NKT cells. In particular, analogues of the prototypical iNKT antigen, α-GalCer, that comprise α-GalCer Bf and Agelasphin-9b isolated from the gut bacterium Bacteroides fragilis and the marine sponge Agelas spp., respectively, were shown to be antigenic ligands for NKT cells [40, 41]. Furthermore, α-glucuronosylceramides (α-GlcACer) and α-galacturonosylceramides (α-GalACer) from the Gram-negative bacteria Sphingomonas spp. [42] were also shown to be stimulating ligands for NKT cells, thereby inducing an increased production of IFN-γ and IL-4 [34, 42,43,44,45]. Further studies showed that the glycosphingolipid GalA-GSL produced by Sphingomonas spp. activates NKT cells, albeit to a lesser extent than α-GalCer. The numerous available crystal structures of the glycosphingolipid α-GalCer bound to human and mouse CD1d [46, 47] demonstrated a conserved mode of binding, whereby its galactose headgroup protrudes out of the CD1d-binding cleft to be exposed for interactions with the NKT TCRs, while the phytosphingosine and the fatty acid chains (Fig. 2) are typically buried within the F′- and A′-pockets of CD1d, respectively (Fig. 3). The binary crystal structure of the Sphingomonas spp. GalA-GSL lipid bound to mCD1d [48] provided further molecular insights into the mode of binding of microbial glycosphingolipids to CD1d. Here, while α-GalCer and GalA-GSL differ in the chemical nature of their headgroups (galactose vs. galacturonic acid, respectively) and their sphingosine chains (Fig. 2), their overall positioning within the CD1d-binding groove was highly conserved (Fig. 3a).

Fig. 3

Molecular presentation of microbial lipid-based Ags by CD1d. a Cartoon representation of the crystal structure of mouse CD1d–microbial lipids binary complexes. For clarity, only the α1- and α2-domains of mouse CD1d (mCD1d) (light green) are shown. The microbial glycolipids GalA-GSL (cyan) from Sphingomonas spp., GlcDAG-s2 (brown) from S. pneumoniae, BbGL2c (dark green) and BbGL2f (bright green) from B. burgdorferi are shown as spheres. For mCD1d–GalA-GSL, a spacer lipid is present in the A′-pocket and is shown as black spheres. b Superposition of the glycosphingolipids GalA-GSL (cyan) and α-galactosylceramide (α-GalCer) (black). c Superposition of mCD1d presenting the diacylglycerol glycolipids BbGL2c (dark green) and BbGL2f (bright green). d Superposition of mCD1d presenting the diacylglycerol glycolipids αGlcDAG-s2 (brown) and BbGL2f (bright green). For clarity, only the α1- and α2-domains of mouse CD1d (mCD1d) (light green) are shown and the lipids are shown as spheres. The oxygen and nitrogen atom are coloured in red and blue, respectively

The important protective role played by NKT cells in microbial-mediated immunity was also highlighted by Olson et al., who demonstrated clearance of the Gram-negative bacterium Borrelia burgdorferi (the causative agent of Lyme disease) from mice through an NKT-dependent activity, including the secretion of IFN-γ [49]. Further studies provided molecular insights into the nature of the B. burgdorferi CD1d-presented Ags through the characterization of a new chemical class of activating microbial lipid-based Ags (diacylglycerol or DAG) for NKT cells (Fig. 2) [39]. Indeed, two isolated glycoglycerol lipids (BbGL-2c and BbGL-2f) that structurally shared a galactose headgroup, but differed in the chemical nature of their two fatty acid chains (Fig. 2), were both able to stimulate mouse and human NKT cells, albeit with different levels of potency [39]. In particular, BbGL-2c was significantly more potent than BbGL-2f for mouse invariant NKT cells. This latter intriguing observation suggested a possible role for the different fatty acid chains in directing a different overall positioning of the DAG lipids within CD1d, thereby affecting their subsequent molecular recognition by the NKT TCRs. The binary crystal structures of mouse CD1d presenting BbGL-2c and BbGL-2f (Fig. 3a and Table 1) [50] subsequently supported this initial interpretation, whereby the DAG adopted two distinct binding configurations within CD1d (Fig. 3b). Here, the A′- and F′-pockets of CD1d accommodated the sn1-linked oleic acid (C18:1) and sn2-linked palmitic acid (C16:0) chains (Fig. 2) of BbGL-2c, respectively, while BbGL-2f was sequestered within the CD1d cleft in a completely reversed orientation whereby the sn2-linked oleic acid (C18:1) and sn1-linked linoleic acid (C18:2) (Fig. 2) were bound within the A′- and F′-pockets of CD1d, respectively (Fig. 3b).
Thus, this remarkable rearrangement of the fatty acid chains within CD1d also impacted on the overall positioning of the exposed galactose headgroup shared by both DAG lipid structures (Fig. 3c) and which represents the key structural motif enabling NKT recognition. Collectively, these findings remarkably highlight how fine chemical modifications of the fatty acid chains of lipids such as the number of carbon unsaturations can drastically impact on their mode of presentation by CD1d and consequently influence their level of immunogenicity towards NKT cells. More recently, the repertoire of antigenic microbial DAG-derived lipids for type I NKT cells was extended to other bacterial species through the isolation and characterization of glucose-based DAG Ags (α-glucosyl-diacylglycerol) from Streptococcus pneumoniae (Fig. 2) [51]. As observed for the B. burgdorferi galactose-based DAG Ags, the antigenic potency for NKT cells of the glucose-based DAG Ags was also greatly affected by the chemical nature and length of the aliphatic tails of the lipid Ags [51]. The binary crystal structure of the mouse CD1d in complex with α-GlcDAG-s2 (Fig. 3a) revealed a very unusual orientation of the sn2-linked oleic acid (C18:1) bound within the A′-pocket, whereby the aliphatic tail swirled around in the opposite direction to what has been previously observed in other CD1d-Ags binary crystal structures [31, 52,53,54] (Fig. 3d).

Table 1 Three-dimensional crystal structures of CD1 molecules in complex with microbial lipid-based Ags

Pathogenic lipopeptidophosphoglycans from Entamoeba histolytica (EhLPPG), the causative agent of amoebiasis, were also reported to exhibit stimulatory effects for NKT cells through both TCR and Toll-like receptor (TLR)-mediated pathways. Two phosphatidylinositol-based lipids (EhPIa and EhPIb) (Fig. 2) were identified from the EhLPPG active fraction and, interestingly, only EhPIb had the ability to induce IFN-γ production [55]. Finally, cholesteryl α-glucoside (αCAG) from the gastric pathogen Helicobacter pylori [56] (Fig. 2) was also shown to be presented by CD1d and to activate NKT cells in both mice and humans [57, 58].

Whilst it is now clear that type I NKT cells play a central role in microbial-mediated immunity, experimental evidence for type II NKT cells fulfilling a similar function is much more limited. However, Tatituri et al. recently reported that lipids isolated from the cell wall of Mycobacterium spp. that comprised phospholipids such as phosphatidylglycerol (PG) and phosphatidylinositol (PI) were able to activate a range of type II NKT cell hybridomas (Fig. 2) [59]. Similarly, PG isolated from the cell wall of Listeria monocytogenes exhibited reactivity towards type II NKT cells [60]. In both cases, the cytokine IL-2 was produced upon the activation of type II NKT cell hybridomas by the bacterial PGs.

While it is becoming evident that the number of characterized microbial CD1d-presented lipid Ags is constantly growing (Fig. 2), our current crystallography-based molecular insights into their presentation by CD1d have been essentially limited to two classes of lipids (glycosphingolipids and diacylglycerol glycolipids) (Fig. 2 and Table 1) [48, 50, 51]. There is therefore significant scope to explore the molecular presentation of other relevant classes of microbial lipid-based Ags.

Molecular basis for the recognition of microbial lipid-based Ags by type I NKT TCRs

The molecular mechanism that underpins the recognition of microbial lipids by NKT TCRs has, rather surprisingly, remained largely unexplored so far. Indeed, the crystal structures of the mouse type I NKT TCR in complex with mouse CD1d (mCD1d) presenting the lipids α-GalA-GSL, α-GlcDAG-s2 [61], and BbGL-2c [62] (Fig. 4 and Table 2) only recently provided the first detailed insights into the molecular recognition of microbial lipid-based Ags by NKT TCRs. As observed previously in the crystal structures of type I NKT TCR–CD1d-lipids ternary complexes [23, 47, 63, 64], the NKT TCR adopted a docking strategy whereby the TCR was positioned in a parallel fashion over the F′-pocket of the CD1d binding cleft (Fig. 4a). Here, at the NKT TCR/CD1d–microbial Ags interface, the CDR loops (complementarity-determining regions) of the TCR α-chain (CDR1α and CDR3α) were the main contributors to the molecular interactions [46]. The carbohydrate headgroups of the Ags protruded from the CD1d binding cleft and were exclusively contacted by residues belonging to the CDR1α and CDR3α loops (Asn30α, Arg95α, and Gly96α). Interestingly, upon NKT TCR engagement, the overall orientation and positioning of the carbohydrate headgroups of the three microbial Ags was largely conserved and was similar to α-GalCer (Fig. 4b). Furthermore, the position of all the NKT TCR CDR loops, and particularly the CDR1α and CDR3α loops that are key contributors to the recognition of α-GalCer, was also highly preserved between the four NKT TCR–CD1d-Ags ternary complexes (Fig. 4b, c). However, as opposed to the conserved position of α-GalCer, upon TCR ligation, the orientation and positioning of the DAG-based microbial lipids were markedly affected. For instance, the acyl chain of α-GlcDAG-s2 that encircles the A′-pocket in the clockwise direction in the CD1d binary structure adopted a counterclockwise direction in the ternary structure [48, 61].
Furthermore, the carbohydrate headgroups of the microbial DAG lipids were all repositioned towards the centre of the antigen-binding cleft in the ternary structures [61], adopting an overall position similar to α-GalCer (Fig. 4c) and thereby suggesting an “induced fit” molecular mechanism as a contributing feature of the type I NKT TCR recognition of microbial lipid-based antigens.

Fig. 4

Molecular recognition of CD1d presenting microbial lipid-based Ags by NKT TCR. a Crystal structures of NKT TCR–mCD1d–GalA-GSL (left panel), NKT TCR–mCD1d–αGlcDAGs2 (middle panel), and NKT TCR–mCD1d–BbGL-2c (right panel) ternary complexes. The mCD1d and β2-microglobulin molecules are coloured in light green and light grey, respectively. The NKT TCRα and TCRβ are coloured in pink and yellow, respectively. The microbial glycolipids GalA-GSL (cyan), αGlcDAGs2 (brown), and BbGL-2c (dark green) are shown as spheres. b View from the top of an overlay of the three NKT TCR–mCD1d–microbial lipids crystal structures and NKT TCR–mCD1d–α-GalCer. For clarity, only the CDR loops are shown and coloured as for the respective lipids in each structure, GalA-GSL (cyan), αGlcDAGs2 (brown), BbGL-2c (dark green), and α-galactosylceramide (α-GalCer) (black). The lipids are shown as spheres. c Overlay of three NKT TCR–mCD1d–microbial lipid crystal structures and NKT TCR–mCD1d–α-GalCer. For clarity, only the CDR1α and CDR3α loops are shown and coloured for the respective lipids in each structure, GalA-GSL (cyan), αGlcDAGs2 (brown), BbGL-2c (dark green), and α-galactosylceramide (α-GalCer) (black). The lipids are shown as sticks

Table 2 Three-dimensional crystal structures of TCR–CD1–microbial lipid-based Ags ternary complexes

Recognition of lipid-based Ags by group 1 CD1-restricted T cells

Intracellular trafficking and loading of microbial lipids

The intracellular trafficking pattern of CD1 molecules differs markedly from that of the classical MHC molecules. MHC-I glycoproteins are typically loaded with peptidic Ags during their synthesis in the endoplasmic reticulum (ER). MHC-II is blocked with the CLIP peptide (class II-associated invariant chain peptide) directly after synthesis, which is then replaced in the late lysosome by exogenously captured peptidic antigens. Thus, MHC molecules generally do not travel to the cell surface before they have gone through a cellular compartment where foreign peptides are loaded. By contrast, all the CD1 molecules are initially loaded with self-lipids that are present in the ER during synthesis, and before they are exposed to lipids from other cellular compartments, the newly synthesized CD1 molecules travel to the cell surface where they encounter lipids from the cellular environment [65, 66]. Thus, the antigen loading into CD1 molecules seems to take place by replacement of self-lipids. The lipids that are present in the ER and that are known to bind CD1 are phosphatidylinositol, phosphatidylethanolamine, phosphatidylcholine, phosphatidylserine and ceramides [67]. Lipid loading of CD1 at the cell surface is possible [68, 69], but it is unclear what the physiologic role of this process is [70]. After this initial surfacing, each CD1 isoform travels to a specific cellular compartment, guided by signals in its cytoplasmic tail that interact with adaptor proteins [71,72,73,74]. While CD1a travels to the early endosome, CD1b and CD1c travel to the late endosome and the intermediate endosome, respectively. These compartments differ in their pH and in the nature of the enzymes and lipid transfer molecules that are present.
Whereas the MHC molecules are completely dependent on the digestion of proteins to liberate antigenic peptides, most known lipid antigens do not need to be chemically cleaved or modified by the APC before they can be presented by CD1 molecules to T cells, and this includes glucose-6-O-monomycolate (GMM) [75]. There are three examples in which chemical cleavage of lipid Ags is essential for antigenicity. The first one is the removal of a mannose moiety from mannosyl phosphomycoketide (MPM) to form phosphomycoketide (PM), which is absolutely required to enable the recognition by the T cell clone DN1 [76]. However, the mannose moiety is required for the recognition of MPM by another T cell clone, CD8-1 [76,77,78]. The second example of cleavage of larger lipids to release antigenic lipid is the removal of two acyl chains, as well as up to four mannosides, from mycobacterial tetra-acylated phosphatidylinositol mannoside to generate antigenic dimannosyl phosphatidylinositol mannoside with two acyl chains [79, 80]. The third example is a chemically designed antigen: dihexosylceramide depends on the removal of a carbohydrate moiety to be converted into antigenic α-GalCer [81]. Aside from these chemical alterations of lipids, the key factors that contribute to CD1 lipid loading and presentation are pH and lipid transfer molecules. The low pH of the late endosomal compartment enables CD1b to undergo conformational changes that facilitate the insertion of long lipids [82]. Lipid transfer molecules are thought to enable the extraction of lipids from lipid aggregates (membranes and cell walls) and to facilitate their transport through the aqueous environment of the cell. In the ER, the microsomal triglyceride transfer protein performs these functions [83], while in the endocytic pathway saposins are active [84,85,86].
The lysosomal lipid transfer molecule CD1e has been demonstrated to be required for the presentation of phosphatidylinositol mannoside as well as its chemical modification [79, 87].

Group 1 CD1-restricted T cell repertoire

The hallmarks of the MHC system are the high level of polymorphism among the MHC molecules associated with the high diversity of the TCR repertoire due to random genetic recombination. Somatic recombination generates TCRs that can interact with all possible allelic variants of MHC and an enormous diversity of pathogen-derived peptides. Invariant NKT cells also make use of the TCR recombination machinery, but in a different way: their TCR α-chains are formed by recombinations without or with very few insertion/deletions of nucleotides (N) that occur at such high frequency that all human beings form these NKT TCRs. Since they recognize a non-polymorphic molecule that is expressed in all humans, NKT cells are positively selected and activated in all humans. However, the monomorphic nature of the antigen-presenting molecule does not inherently limit the TCR repertoires to TCRs lacking N nucleotides, and in fact many examples have emerged of CD1-specific TCRs with extensive N regions [88, 89]. The question that is still unanswered is: how common is TCR conservation among non-polymorphic antigen-presenting systems? This question is important to address because knowledge of invariant, microbial lipid-specific TCRs could potentially lead to diagnostics of microbial infections based on detection of expanded invariant TCRs in antigen-exposed humans. The discovery of invariant NKT TCRs took place long before the tetramer technology was available, and was facilitated by the relative abundance of NKT cells. Currently, lipid-loaded CD1 tetramers are available and allow for direct isolation and TCR sequencing of lipid-specific T cells from blood. This technique was first applied to study TCRs that interact with CD1b–GMM tetramers [90, 91] and led to the discovery of GEM (germline-encoded mycolyl lipid-reactive) TCRs and LDN5-like TCRs as groups of TCRs that share structural features and that can be found in many blood donors.

GEM T cells express an invariant α-chain defined as identical or nearly identical sequences derived from different clones and different blood donors, and a β-chain that typically uses TRBV6-2 or TRBV30, but without any apparent CDR3β length and sequence conservation. Expression of an invariant TCR chain is called type 3 TCR bias [92], and shared V gene usage without CDR3 conservation is called type 1 bias. Thus, type I NKT cells and GEM T cells each express a defining, invariant (type 3-biased) α-chain, and a β-chain repertoire that consists of one or two Vβ genes without CDR3 conservation (type 1-biased).

LDN5-like cells were discovered alongside GEM T cells as a result of analysis of TCRs that recognize CD1b–GMM tetramers. The first CD1b–GMM-specific TCR that was ever sequenced, LDN5, was initially considered to represent a single example of a diverse TCR repertoire for this antigen, mainly because it utilized many N nucleotides in its α- and β-chains [88], but when many more TCRs with this specificity were sequenced using tetramers, the LDN5-like TCR pattern became apparent. LDN5-like TCRs expressed TRAV17 and/or TRBV4-1, which are the Vα and Vβ genes that are also used by LDN5, but did not share CDR3 sequences. Therefore, LDN5-like cells have type 1-biased α- and β-chains.

As described above, many well-characterized group 1 CD1–microbial lipid-specific T cells belong to the αβ T cell lineage. However, it is also established that group 1 CD1 molecules presenting self-lipid Ags can be bona fide ligands for specific γδ T cell populations [93, 94], thus raising an important question: does this elusive subset of T cells also play a key role in group 1 CD1-mediated microbial surveillance? A recent report by Roy et al. [95] identified and biophysically characterized CD1c–Mycobacterium tuberculosis (Mtb) lipid-specific γδ TCRs, and thus provided the first emerging insights into the microbial lipid reactivity of group 1 CD1-restricted γδ T cells.

Because additional working CD1 tetramers loaded with microbial lipids have recently been developed, new invariant or biased αβ and γδ TCRs can now be discovered. Microbial lipid–CD1 combinations that have not yet been used to systematically study the human TCR repertoire are: CD1a–dideoxymycobactin [96], CD1b–sulfoglycolipid, CD1b–mycolic acid [97], and CD1c–phosphomycoketide [76]. Though more sensitive to the bystander effect and to immunostimulatory properties that are independent of TCR activation, an alternative to tetramer-based identification of antigen-specific T cells is T cell isolation based on cytokine expression after stimulation with lipid antigen and CD1-expressing antigen-presenting cells [98]. With two available, independent methods of detection of group 1 CD1–microbial lipid-specific T cells, it is now possible to follow these T cells in humans and answer basic questions about their expansion and activation during disease. These questions could previously only be addressed using guinea pigs or mice transgenic for human group 1 CD1 molecules [99, 100].

Molecular presentation of microbial lipid-based Ags by group 1 CD1

The most well-characterized microbial lipid antigens presented by the group 1 CD1 molecules are found within Mycobacterium tuberculosis (Mtb) [101], and those include free mycolic acids [3] and their derivatives, mannophosphoisoprenoids [102], mannosylated lipoarabinomannan [103], lipomannan [103], and phosphatidyl-myo-inositol mannosides [104]. The extreme complexity of the lipid-rich cell wall of mycobacterial species is rather unique and represents an ideal source of a wide range of chemically distinct classes of lipids to be presented by group 1 CD1 molecules. Mycolic acid, a key component of the Mtb cell wall, was the first characterized lipid antigen presented by CD1b [3]. Mycolic acids are high molecular weight lipids that comprise two main components: a β-hydroxy fatty acid and a long branched α-alkyl lipid tail [105] (Fig. 2). Though not yet analysed at the molecular level, two groups have reported the influence of the lipid tail length or composition of mycolic acids on T cell responses against CD1b–mycolic acid [97, 106].

The list of CD1b-presented lipid Ags was further extended to derivatives of mycolic acids, whereby additional headgroup moieties were incorporated, such as in GMM (Fig. 2) and glycerol monomycolate [107,108,109,110]. Mycobacterial GMM is characterized by the presence of a glucose linked at the 6-position to a mycolyl β-hydroxy chain. Three forms of GMM, with varying mycolyl unit lengths of C32 (GMM-C32), C54 (GMM-C54) and C80 (GMM-C80), have been shown to activate GMM-reactive CD1b-restricted αβ T cell lines and clones isolated from blood of patients infected with Mtb [111]. To date, the crystal structures of CD1b presenting the GMM-C32 [112] and GMM-C54 [113] isoforms have been determined (Fig. 5). In both crystal structures, the β-hydroxy fatty acid and the α-alkyl lipid tail are bound within the C′-channel and A′-pocket, respectively. The β-hydroxy fatty acids are 10 carbons and 16 carbons in length for the GMM-C54 and GMM-C32 derivatives, respectively, thereby enabling GMM-C32 to be sequestered deeper into the C′-pocket of CD1b. By contrast, the α-alkyl lipid tails for GMM-C54 and GMM-C32 are 50 and 20 carbons in length, respectively, and CD1b is able to accommodate the entirety of the 50-carbon tail of GMM-C54 within its T′-tunnel and F′-pocket. Interestingly, whilst GMM-C32 does not extend throughout the entirety of the CD1b antigen-binding pockets, a hydrophobic scaffold lipid is bound within the T′-tunnel to maintain the structural integrity of the glycoprotein. This scaffold lipid originated from the expression system that was utilized to produce the recombinantly expressed CD1b molecule; the presence of spacer lipids in CD1b has been previously reported [67, 114], but the precise chemical natures of these scaffold lipids and their possible immunogenic roles remain unclear. In both crystal structures, the GMM glucose headgroup protrudes out of the CD1b-binding groove to be exposed for GMM-restricted TCR recognition [112].
The exquisite specificity of the TCR for the glucose moiety explains the absence of an effect of mycolic acid chain length on the activation of GMM-specific T cells [107].

Fig. 5
Molecular presentation of microbial lipid-based Ags by group 1 CD1. Cartoon representation of the crystal structures of group 1 CD1–microbial lipid binary complexes. For clarity, only the α1- and α2-domains of CD1a (light orange), CD1b (light blue), and CD1c (pink) are shown. The microbial lipid antigens Ac2SGL (purple), glucose monomycolate (GMM C32) (light blue), GMM C54 (dark blue), synthetic dideoxymycobactin (JH-02215) (red), phosphomycoketide (PM) (brown), and mannose-phosphomycoketide (MPM) (pink) are represented as spheres. The spacer lipids are shown as black spheres. The oxygen, nitrogen, sulphur, and phosphorus atoms are coloured in red, blue, yellow, and orange, respectively

Two other classes of unique mycobacterial lipids presented by group 1 CD1s are the lipopeptides and the sulfoglycolipids, which are presented by CD1a [115,116,117] and CD1b [114, 115], respectively. The identified lipopeptides belong to an iron-chelating subfamily of siderophores named mycobactins. CD1a can present to T cells a naturally occurring modified form of mycobactin that lacks two hydroxyl groups, named didehydroxymycobactin (DDM) (Fig. 2) [118]. The structural characterization of CD1a presenting a synthetic mycobactin lipid analogue (JH-02215) [117] revealed that, in clear contrast to the CD1b–GMM structure, the lipopeptide is fully buried within the antigen-binding groove (Fig. 5). Here, whilst the single acyl chain is bound within the A′-pocket, the peptidyl moiety and the lysine branch are bound along and into the F′-pocket, respectively [117]. The hydrophilic N-aryl group of the lipopeptide is solvent exposed and is positioned close to Arg73 at the A′-roof [117]. Interestingly, DDM is the sole CD1a-presented lipid-based Ag from Mtb identified so far.

Sulfoglycolipids form a group of compounds that are found only within Mtb, and not in other mycobacterial species [119]. The sulfoglycolipids (Fig. 2) contain two to four acyl tails that are linked by a polar trehalose group harbouring a sulphate group, which has been found to be essential for the immunogenic property of the Ag [115]. CD1b presents the diacylated form of sulfoglycolipid (Ac2SGL), whereas forms with one, three, or four acyl chains are not presented [115]. The three-dimensional structure of CD1b in complex with Ac2SGL (Fig. 5) revealed that the C16-palmitoyl and hydroxyphthioceranic acid tails were bound within the C′- and A′-pockets, respectively [114].

CD1c also has the ability to present microbial lipid antigens to generate a T cell-mediated immune response. A major defining characteristic of the microbial lipids presented by CD1c is their heavily methylated alkyl tails [78, 108]; these lipids include the mycobacterial lipids phosphomycoketide (PM) [76] and mannosyl-β1-phosphomycoketide (MPM) (Fig. 2) [78, 102]. Both lipid classes comprise a C32 methylated carbon tail and a phosphate polar headgroup, with the most significant difference between the two lipids being the addition of a mannose sugar at the β1 position of the phosphate group in MPM [76, 120, 121]. Structural characterization of CD1c presenting both PM and MPM (Fig. 5) shows that the lipid tail is bound within the A′-pocket and penetrates into the D′/E′ back portal of CD1c [120, 121]. Presentation of the polar headgroup differs between the two lipids, with the MPM headgroup being solvent exposed due to the presence of a C12 spacer lipid in the F′-pocket [120]. Studies performed on CD1c-restricted T cells that are specific for PM, MPM, or both lipids show that a mixture of CD4+, CD8+ and double negative T cells is able to recognize those Ags presented by CD1c. Although no crystal structure of a TCR–CD1c–PM or TCR–CD1c–MPM ternary complex is available yet, a CD1c mutagenesis study revealed that, despite sharing the TRBV7-9 gene, the different TCRs may adopt different docking strategies to recognize the CD1c molecule presenting the PM or MPM lipid-based Ags [121].

Aside from the mycobacterial lipid-based antigens, lipids presented by group 1 CD1 molecules have been identified in several other bacteria, including Salmonella typhimurium, Staphylococcus aureus, and Brucella melitensis [122]. CD1b conjugated to dextramers presenting whole cell lipid extract from each of the bacterial species demonstrated significant binding to polyclonal T cell populations, which were also found to stimulate cytokine production [122]. In each case, the identified immunodominant lipid-based Ag was phosphatidylglycerol (PG) (Fig. 2), which, while being highly abundant in these bacterial species, is also found within mammalian cells in very low amounts. The PG species identified in both bacterial and mammalian cells that induce an immune response all retain the same phosphoglycerol headgroup, yet vary in acyl tail saturation, with chain lengths limited to C15–C18 [59]. In the case of PG presentation by CD1b, variations in the lipid tail were not distinguished by the T cells [122], whereas in the case of PG presentation by CD1d the different forms were distinguished by NKT cells [59]. Whilst binary crystal structures of CD1b presenting self-phospholipids have been determined [66, 123], the structural characterization of CD1b presenting bacterial PG is yet to be conducted.

Molecular basis for the recognition of CD1b–GMM by GEM TCRs

Structural characterization at the atomic level of TCR recognition of a group 1 CD1 molecule presenting a microbial lipid antigen is currently limited to a single example [112]. For both microbial and non-microbial lipid Ags, the molecular basis of antigen presentation by group 1 CD1 molecules and of their subsequent TCR recognition has been understudied, and therefore limited information is available. Recently, the crystal structure of the GEM42 TCR–CD1b–GMM ternary complex (Fig. 6a and Table 2) provided the first fundamental insights into the molecular mechanism that underpins the recognition of the mycobacterial lipid GMM by the GEM TCRs [112]. Here, the structure of a GEM TCR, called GEM42, solved in complex with CD1b–GMM-C32 shows how the high-affinity TCR “caged” the sugar moiety of the microbial lipid Ag. The solvent-exposed glucose moiety of the Ag is stabilized by the GEM42 CDR3 loops, which form a “tweezers-like” structure around the carbohydrate (Fig. 6b). Interestingly, the CDR3 loops are positioned underneath the lipid headgroup, thus enabling the entire glucose moiety and a section of the acyl tail to be contacted by the TCR. This high shape complementarity between the lipid and the GEM42 TCR might be responsible for the high affinity exhibited by those TCRs. In stark contrast to the previously determined crystal structures of TCR–CD1d–microbial lipid complexes, the GEM42–CD1b–GMM ternary crystal structure revealed that the lipid Ag contribution to the overall interface (27%) was more than double that observed in the CD1d complexes. In addition to stabilizing the conformation of the lipid headgroup, GEM42 TCR binding “bulldozed” the cleft of CD1b, allowing the TCR to have a better grip on the Ag (Fig. 6c). It remains to be determined whether the LDN5-like TCRs adopt distinct strategies to recognize the CD1b–GMM complex, resulting in a weaker affinity, and whether the observed “tweezers-like” mechanism is unique to the GEM TCRs.

Fig. 6
Molecular recognition of CD1b–GMM by GEM TCR. a Overview of the GEM42 TCR–CD1b–GMM crystal structure shown as a cartoon representation. The α- and β-chains of the GEM42 TCR are coloured in pink and blue, respectively. CD1b and β2m are coloured in pale orange and grey, respectively. GMM is represented as orange spheres and the spacer lipid as black spheres. b Close-up view of the “tweezers-like” motif. The GEM42 TCR CDR3 loops, with CDR3α in pink and CDR3β in blue, surround the GMM, which is represented as orange spheres. c Structural changes in CD1b–GMM upon GEM42 TCR binding. The free CD1b–GMM is coloured in light blue, while the CD1b–GMM in complex with the GEM42 TCR is coloured in orange

Conclusions

While lipid-reactive T cells were first identified 25 years ago, understanding of the general molecular basis of lipid antigen recognition and of its role in human T cell immunity has lagged behind that of the MHC-restricted T cells. Indeed, CD1-restricted T cells have only recently emerged as central players in host protection. In the context of group 2 CD1, we have gained fundamental insights in recent years into the molecular basis that underpins the recognition of two main classes of microbial lipid-based Ags by NKT cells. However, the growing list of newly identified CD1d-presented microbial lipids offers opportunities to further investigate the mode of recognition of microbial lipids by NKT TCRs. In the context of the group 1 CD1 system, the lack of mouse-based models, coupled with the difficulty in working with, and identifying, lipid-based Ags, has hampered progress in this exciting field. However, the development of tetramer and dextramer technology has been key to the identification of group 1 CD1-restricted T cell subsets and has thus advanced our understanding of the role played by group 1 CD1 molecules and the T cells they restrict in human antimicrobial immunity. Our understanding of the molecular basis that underpins the recognition of microbial lipid Ags by group 1 CD1-restricted TCRs is emerging, with the first crystal structure of a TCR–CD1b–microbial lipid Ag complex recently determined. Finally, unlike the genetically diverse MHC molecules, the CD1a, CD1b, CD1c, and CD1d proteins exhibit limited polymorphism, allowing CD1 proteins to be targeted pharmacologically with lipid ligands or small molecules. Therefore, continued efforts to gain general molecular insights into the mode of binding of such ligands to CD1 and TCRs offer exciting perspectives for designing novel therapeutics that augment the protective immune response against microbial infections.