Introduction

Recent genetic studies have identified multiple Alzheimer’s disease (AD)-risk variants, including many that mark genes expressed specifically or selectively in microglia in the brain [1,2,3]. Because an understanding of the mechanisms whereby these variants impact AD may lead to pharmacologic agents that reduce AD risk, these studies have sparked intense interest in the role of inflammation in this disease. Several recent reviews have addressed these genetic factors and their actions in AD overall [4,5,6]. Here, we will focus on the genetics and actions of one of these risk factors, CD33.

CD33 genetics and AD risk

In 2011, two genome-wide association studies (GWAS) identified a single-nucleotide polymorphism (SNP) located upstream of CD33, rs3865444, as associated with AD risk [1, 2]. Naj et al., using AD Genetic Consortium samples, found a robust association between reduced AD risk and the minor allele of rs3865444 that reached genome-wide significance when meta-analyzed with subjects from the three other consortia that would later form the International Genomics of Alzheimer’s Project (IGAP) [2]. The complementary study by Hollingworth et al. supported an association between rs3865444 and AD risk, although the finding did not reach genome-wide significance (p = 2 × 10⁻⁴) [1]. When data from both studies were combined in a meta-analysis to generate an overall cohort of 18,762 AD individuals and 29,827 non-AD individuals, the association between rs3865444 and AD was strong, with a p value of 1.6 × 10⁻⁹ and an odds ratio (OR) of 0.91 [95% confidence interval (CI) of 0.88–0.93] [1, 2]. These studies focused attention on CD33. A prior AD genetic study had also identified a SNP near CD33, rs3826656, but this finding was not replicated in subsequent reports and rs3826656 is not in robust linkage disequilibrium (co-inherited) with rs3865444 [7]. In 2013, the IGAP studied an additional 8572 AD and 11,312 non-AD individuals, and, somewhat surprisingly, rs3865444 was not associated with AD (OR = 0.99, p = 0.69) [3]. The reason for the lack of association was not clear. The study was consistent with prior studies in terms of the cohort being European-based and the analytical approaches used [1, 2]. A meta-analysis of the overall dataset of 74,046 individuals found that rs3865444 fell short of genome-wide significance, with an OR of 0.94 (CI of 0.91–0.96) and p = 3.0 × 10⁻⁶ [3]. Similar findings for the rs3865444 association with AD were reported in a recent update to the IGAP study [OR = 0.92 (95% CI 0.90–0.95), p = 3.9 × 10⁻⁷] [8]. Quite recently, Jansen et al. reported on a very large-scale meta-analysis that incorporated an unconventional AD-by-proxy phenotype to gain statistical power. In this analysis, individuals who reported a parent with AD were considered to be AD-risk carriers and hence scored as AD. This study totaled 71,880 AD samples and 383,378 non-AD samples. Although subjects overlapped with IGAP, this study reconfirmed genome-wide significance for the rs3865444 association with AD (p = 6.3 × 10⁻⁹) [3, 9]. In summary, currently available, very large studies support an association between rs3865444 and AD risk. The inconsistent association in some cohorts could reflect statistical power, variation in AD diagnostic accuracy or cohort risk, or, conceivably, inconsistent linkage disequilibrium between rs3865444 and the functional CD33 SNP, or cohort variations in the allele frequency of SNPs in CD33-related genes that modulate the rs3865444 association with AD risk.

CD33 as a SIGLEC family member

Since CD33 is a member of the sialic acid-binding immunoglobulin-type lectin (SIGLEC) family, genetic variants in other family members could compensate for or enhance the protective association between the rs3865444 minor allele and AD. The human SIGLEC family consists of the CD33-related SIGLECs (CD33, SIGLEC 5–12, 14 and 16), which are rapidly evolving and poorly conserved between species, as well as four SIGLECs that are relatively well conserved between species (Sialoadhesin (SIGLEC1), CD22 (SIGLEC2), myelin-associated glycoprotein (SIGLEC4), and SIGLEC15) [10]. Examination of RNAseq results suggests that CD33-related SIGLECs are the primary SIGLECs expressed in microglia in the human brain (Fig. 1). Overall, SIGLEC10 and SIGLEC8 appear to be expressed at higher levels than CD33 while multiple CD33-related SIGLECs are expressed at lower levels. Hence, although only CD33 has been associated with AD risk, CD33 may not be the predominant SIGLEC in human microglia.

Fig. 1

SIGLEC expression in human microglia. SIGLECs with microglial expression greater than 1 FPKM (fragments per kilobase of transcript per million mapped reads) are shown. Additional CD33-related SIGLECs expressed in the human brain include SIGLEC7 (0.9 FPKM), SIGLEC16 (0.8 FPKM), SIGLEC11 (0.4 FPKM) and SIGLEC12 (0.2 FPKM). Non-CD33-related SIGLECs with microglial expression include SIGLEC1 (0.5 FPKM) and CD22 (0.2 FPKM). Two SIGLECs expressed predominantly in oligodendrocytes are myelin-associated glycoprotein (SIGLEC4) at 1.5 FPKM and CD22 (SIGLEC2) at 3.7 FPKM (not shown). These data (mean ± SD, n = 3) are derived from http://web.stanford.edu/group/barres_lab/brainseq2/brainseq2.html [77]. Other microglial RNAseq studies report similar SIGLEC expression profiles [78].
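For readers less familiar with the FPKM unit used in this figure, the normalization can be sketched as follows. This is a minimal illustration of the standard definition only; the function name and the example counts are hypothetical and are not taken from the underlying RNAseq dataset.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# Hypothetical numbers: 100 fragments on a 2,000-bp transcript
# in a library of 50 million mapped fragments.
example_fpkm = fpkm(100, 2_000, 50_000_000)
```

Because the value is scaled by both transcript length and library size, the 1 FPKM display threshold corresponds to different raw fragment counts for different genes and samples.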

SIGLEC family members share three primary hallmarks, suggesting the possibility of functional redundancy. The first hallmark is an amino-terminal IgV domain that mediates ligand binding (Fig. 2). As indicated by the SIGLEC acronym, the primary SIGLEC ligand is a sialic acid, which is a nine-carbon sugar abundantly found in the ganglioside subclass of glycosphingolipids and as a common post-translational protein modification. Sialylated glycoproteins and gangliosides are oftentimes especially abundant in pathophysiological conditions such as cancer and inflammation, including AD amyloid plaques; their binding to inhibitory microglial SIGLECs has been suggested to inhibit plaque clearance [11]. Within the IgV domain, an arginine that is critical for sialic acid binding is also well conserved in functional SIGLECs, being present at position 119 in CD33 [12].

Fig. 2

Structures of SIGLEC family members expressed in human brain microglia. SIGLECs expressed at > 1 FPKM in microglia are depicted. The IgV (V) domain is amino-terminal while the IgC2 (C) domains are proximal to the membrane. While most brain SIGLECs contain an ITIM (IT) or ITIM-like domain (ITL), SIGLEC14 lacks these motifs and instead has a positively charged arginine in the transmembrane domain that is used to signal to DAP12 [25].

The second SIGLEC hallmark is one or more conserved IgC2 domains, which provide a foundation for the ligand-binding domain and may modulate ligand binding. The variable number of IgC2 domains has been proposed to impact whether the SIGLEC binds sialic acids in cis, i.e., neighboring glycoproteins on the same cell, or in trans, i.e., sialic acid ligands in the extracellular matrix or on adjacent cells, by varying the position of the IgV-binding domain closer or further from the cell membrane [13]. The third SIGLEC hallmark is an immunomodulatory motif. The CD33-related SIGLECs generally contain a cytosolic immunoreceptor tyrosine-based inhibitory motif (ITIM) proximal to the membrane and an ITIM-like sequence near their carboxyl terminus. Of the CD33-related SIGLECs, SIGLEC14 and SIGLEC16 lack this domain and instead have a positively charged amino acid in their transmembrane domain (Fig. 2). This charged amino acid is critical for signaling to adaptor proteins, such as DAP12, which results in the activation of DAP12’s immunoreceptor tyrosine-based activation motif (ITAM).

Considering that CD33 is not the most abundant SIGLEC in microglia, and that SIGLECs show some functional redundancy [14], the finding that CD33 is the only SIGLEC with a SNP associated with AD at a genome-wide level of significance has two interpretations. First, CD33 may modulate AD risk via a mechanism unique to CD33. Second, CD33 may be the only SIGLEC with a genetic variant that alters its function. Considering the latter, a SIGLEC9 polymorphism, rs2075803, has been associated with chronic obstructive pulmonary disease phenotypes, including exacerbation frequency, in a candidate gene study [15]. Although rs2075803 did not reach genome-wide significance in the largest AD GWAS recently published by Jansen et al., this SNP was included in the supplementary summary data and showed a modest association with AD (uncorrected p = 0.006) [9]. Interestingly, SIGLEC14 has undergone genetic deletion in 10–60% of humans, depending on the population. This deletion has been associated with risk for infection [16, 17] but apparently has not been captured directly in genome-wide association studies. However, a recent GWAS of the human plasma proteome found that rs1106476 is strongly associated with reduced SIGLEC14 plasma levels (p = 2.2 × 10⁻³⁰⁹), and this SNP is also nominally associated with an increase in AD risk (p = 0.028) [9]. This correlation is consistent with the concept that a decrease in a SIGLEC that acts through an ITAM will increase AD risk. In summary, further work is needed to discern whether CD33 has unique properties, or whether a robust targeted genetic evaluation of SIGLECs and AD risk will reveal additional SIGLEC variants that are consistently and significantly associated with AD.

A prevailing concept is that SIGLECs are activated by “self-associated molecular patterns,” or SAMPs, to signal through their ITIM to induce immunosuppression ([18], reviewed in [19, 20]). Briefly, SAMPs, including specific glycosylation patterns, i.e., sialic acid linkages, provide a self-recognition signal to suppress innate immune cells, much in the way that “pathogen-associated molecular patterns” (PAMPs) or “damage-associated molecular patterns” (DAMPs) activate innate immune cells. This contact-dependent inhibition is thought to provide self–non-self discrimination in cells that do not undergo a selection process. Not surprisingly, pathogens have evolved to mimic SAMPs by expressing or presenting SIGLEC ligands on their surface and thereby suppressing the immune response. For example, sialic acids on group B Streptococcus bind to SIGLEC9 on neutrophils to inhibit their activation and thereby increase bacterial survival [21]. Perhaps in response to these pathogen ligands, the CD33-related SIGLEC family members have undergone rapid evolution ([14, 22], reviewed in [23]). This SIGLEC evolution is reflected by several observations. First, the gene family has proliferated, with some 15 SIGLEC members in humans, 11 of which are CD33-related SIGLECs. Second, inter-species comparisons show evolutionary divergence; CD33-related SIGLECs in particular show primary sequence divergence, exon shuffling, and differences in gene number, e.g., mice have nine SIGLECs, five of which are CD33 related [22, 24]. Third, some SIGLECs have been neutralized by conversion of their ITIM to an ITAM, by loss of the IgV arginine that is critical for sialic acid ligand binding, or by gene deletion, e.g., humans lack SIGLEC13, which is present in chimpanzees [22]. SIGLEC neutralization through loss of the critical arginine is found more commonly in non-human primates than in humans [14]. 
SIGLEC neutralization by conversion to an immune activator is shown robustly by considering SIGLEC5 and SIGLEC14 ([25], Fig. 2). These two SIGLECs are adjacent on chromosome 19 and represent a recent gene duplication. Their amino-terminal IgV domains are nearly identical, suggesting that these two SIGLECs bind similar ligands. However, the carboxyl termini of SIGLEC5 and SIGLEC14 differ dramatically. SIGLEC5 signals via its ITIM to immunosuppress while SIGLEC14 signals via an intramembranous arginine to DAP12, causing an immune activation via its ITAM [25]. In situations where the two are co-expressed such as in myeloid cells, SIGLEC14 appears to counteract the effects of SIGLEC5 on immune function [25]. SIGLEC11 and SIGLEC16 reflect a similar gene duplication wherein the ITIM of SIGLEC11 is countered by the ITAM of SIGLEC16 [26]. Another type of evolutionary divergence is suggested by comparing human CD33 and murine Cd33. Murine Cd33 lacks the typical ITIM proximal to the membrane, retaining only the ITIM-like domain near the carboxyl terminus. However, Cd33 has evolved to include a lysine within its transmembrane domain that is reminiscent of SIGLEC14 and its use of DAP12 to activate immune cells. This suggests that murine Cd33 may not accurately reflect the actions of human CD33 such that a CD33 transgenic mouse may be necessary to capture the impact of CD33 in vivo. Overall, these findings are consistent with SIGLECs serving to modulate innate immunity, leading to evolutionary pressure that results in inter-species variability in CD33 function as well as SIGLEC function in general [27, 28].

SIGLEC ITIM signaling

Most brain SIGLECs, including CD33, have an ITIM proximal to the membrane, a 6–14-aa spacer, and then an ITIM-like domain near their carboxyl terminus (Fig. 2). The relevant CD33 sequences, relative to the prototypic ITIM consensus sequence [29], are shown in Fig. 3. Antibodies, used as surrogate ligands, induce SIGLEC clustering, Src family kinase recruitment, and subsequent phosphorylation of the tyrosine in the membrane proximal ITIM followed by the membrane distal ITIM-like domain [30,31,32]. This clustering likely occurs in lipid rafts, where Src family kinases tend to localize due to their fatty acid lipid anchor [30, 32, 33]. Phosphorylation of the SIGLEC ITIM creates a binding site for the SH2-containing tyrosine phosphatases SHP-1 and SHP-2 [12, 30]. The downstream targets of these phosphatases are multiple and include Syk, as well as scaffolding proteins such as BLNK that serve as a bridge between Syk and downstream signaling pathways [12, 34, 35]. CD33 ITIM phosphorylation also attracts proteins to promote CD33 degradation. Two members of the E3 ligase protein family, the suppressor of cytokine signaling, SOCS3, as well as Cbl, have been implicated in ubiquitination, internalization, and subsequent degradation of activated CD33 [31, 36]. SOCS proteins are upregulated by cytokines during inflammatory responses, suggesting the possibility that inflammation in AD may lead to lower CD33 levels. In summary, ligand binding to CD33 promotes receptor clustering, ITIM phosphorylation, localized tyrosine phosphatase activation, and CD33 internalization and eventual degradation. The overall impact is typically a reduction in immune cell activation ([37, 38] reviewed in [39]). Of note, synthetic high-affinity SIGLEC ligands have also been shown to induce inhibitory signaling via specific SIGLECs, including CD33 [40].

Fig. 3

CD33 ITIM and ITIM-like sequences aligned to consensus ITIM sequence. The CD33 ITIM and ITIM-like sequences (underlined) are shown relative to the ITIM consensus sequence of S/L/V/IxYxxL/V/I. Note that the CD33 ITIM-like sequence begins with a threonine, rather than the consensus S/L/V/I, but is otherwise relatively well conserved with the consensus. The spacer between the ITIM and ITIM-like sequence in CD33 is 12 amino acids.
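The consensus in this caption translates directly into a pattern search, which may help when scanning other SIGLEC cytosolic tails. The sketch below is illustrative only: the function name is hypothetical, the relaxed pattern that admits a leading threonine is an assumption made here to capture ITIM-like sequences, and the toy tail is an invented string, not the CD33 sequence.

```python
import re

# Consensus from the caption: S/L/V/I - x - Y - x - x - L/V/I
ITIM_CONSENSUS = re.compile(r"[SLVI].Y..[LVI]")
# Assumed relaxed variant that also tolerates threonine at the first position.
ITIM_LIKE = re.compile(r"[SLVIT].Y..[LVI]")


def find_motifs(tail: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, hexapeptide) for every hit, including overlapping ones."""
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(tail)]


# Invented toy tail: a consensus motif, a 12-residue spacer, then a
# threonine-initiated ITIM-like motif. NOT the real CD33 cytosolic tail.
TOY_TAIL = "GG" + "LQYAAL" + "G" * 12 + "TQYGGV" + "GG"

strict_hits = find_motifs(TOY_TAIL, ITIM_CONSENSUS)
relaxed_hits = find_motifs(TOY_TAIL, ITIM_LIKE)
```

With the strict pattern only the first motif in the toy tail is reported; the threonine-initiated motif appears only under the relaxed pattern, mirroring the distinction the caption draws between the ITIM and the ITIM-like sequence.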

Microglial homeostasis: balance of ITIM and ITAM actions

Microglial function appears to be regulated by a homeostatic balance between ITIM and ITAM signaling (reviewed in [41]). DAP12 is a prototypic ITAM family member and serves as a co-receptor for multiple cell surface receptors, including SIGLEC14 [25], and TREM2, which has been robustly associated with AD risk by genetics (reviewed in [41,42,43,44]). As noted above, signaling from the receptor to DAP12 is mediated by a positively charged amino acid in the receptor transmembrane domain. In a process similar to SIGLEC activation, receptor clustering leads to DAP12 clustering, which attracts a Src kinase family member to phosphorylate the tyrosines within the DAP12 ITAM. The ITAM consensus sequence is similar to that of two ITIMs, bridged by a spacer domain, i.e., D/ExxYxxL/V/Ix6–12xYxxL/V/I [45,46,47,48,49]. Robust high-avidity TREM2 activators, such as TREM2 antibodies, induce phosphorylation of both tyrosines in the DAP12 ITAM. This in turn attracts Syk, which has two SH2 domains. Engagement of both Syk SH2 domains with the two phosphorylated tyrosines in the DAP12 ITAM causes the amino-terminal Syk SH2 domain to move away from its inhibitory position in the kinase, thereby activating Syk. Hence, Syk is activated at sites of DAP12 clustering, leading to the recruitment of additional proteins, including scaffolding proteins such as BLNK and downstream effector enzymes including PI3-kinase and PLC-γ (reviewed in [42,43,44]).
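The ITAM consensus can be written as a pattern in the same spirit. The sketch below assumes that the spacer between the two YxxL/V/I elements is 6 to 12 residues, which is one reading of the consensus quoted above; the function name and the toy peptide are hypothetical and do not represent the DAP12 sequence.

```python
import re

# Assumed reading of the consensus: D/E-x-x-Y-x-x-L/V/I, a 6-12 residue spacer, Y-x-x-L/V/I
ITAM_PATTERN = re.compile(r"[DE]..(Y)..[LVI].{6,12}(Y)..[LVI]")


def itam_tyrosines(sequence: str):
    """Return 1-based positions of the two ITAM tyrosines, or None if no ITAM is found."""
    match = ITAM_PATTERN.search(sequence)
    if match is None:
        return None
    return (match.start(1) + 1, match.start(2) + 1)


# Invented toy peptide with an 8-residue spacer. NOT the real DAP12 sequence.
TOY_ITAM = "AA" + "EGGYAAL" + "G" * 8 + "YGGI" + "AA"

tyrosine_positions = itam_tyrosines(TOY_ITAM)
```

Reporting both tyrosine positions is a deliberate choice, since the signaling outcome depends on whether one or both of these residues are phosphorylated.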

Interestingly, less robust, lower avidity TREM2 ligands result in phosphorylation of a single tyrosine within the ITAM. This then attracts SHP-1 and/or SHIP1, both of which bind the ITAM via their SH2 domains, behaving as though the ITAM was an ITIM [50,51,52,53]. As noted above, SHP-1 is a phosphatase that inhibits Syk signaling. SHIP1, which is also linked to AD risk by genetics [3, 9], is a lipid phosphatase, hydrolyzing the 5′ phosphate from phosphatidylinositol (3,4,5)-trisphosphate. Hence, SHIP1 inhibits PI3 kinase, which is downstream in the DAP12 ITAM pathway [52]. In summary, low-avidity ligands in an ITAM pathway lead to immune suppression by causing ITAM monophosphorylation, resulting in ITIM-like activity by attracting SHP-1 and/or SHIP1. High-avidity ligands induce immune activation by causing phosphorylation of both tyrosines within the ITAM to attract Syk. The concept of immune regulation by ITIM/ITAM counterbalancing in this manner has been demonstrated in some systems (reviewed in [49, 54]) but has not yet been demonstrated directly in microglia.

Molecular genetics of CD33 in AD

Acknowledging that CD33 is a single member of a multi-protein homeostatic cellular mechanism, we turn to consideration of the molecular mechanism whereby CD33 modulates AD risk. The primary AD GWAS SNP for CD33 is rs3865444, which is 372 bp upstream of the CD33 transcription start site. The minor allele of this SNP is associated with reduced AD risk [1, 2, 9]. Griciuc et al. did not detect an association between rs3865444 and CD33 mRNA expression and yet found reduced CD33 protein in rs3865444 minor allele carriers [55]. Subsequently, our laboratory found total CD33 mRNA was increased about 25% in AD and decreased modestly with the rs3865444 minor allele [56] (Table 1). Moreover, we found that a surprisingly common CD33 isoform in human brain was lacking exon 2 (D2-CD33, aka CD33ΔV-Ig [55] and CD33m [32]). Skipping of exon 2 deletes the CD33 IgV domain, leading to an in-frame fusion of the CD33 signal peptide directly to the IgC2 extracellular domain ([32], Fig. 4). When we quantified D2-CD33 in comparison with CD33, using isoform-specific qPCR primers, we observed a striking correlation between rs3865444 and the proportion of CD33 mRNA expressed as D2-CD33 (Fig. 5, Table 1, [56]). We hypothesized that rs3865444 was a proxy for a functional SNP because rs3865444 is in the promoter region of CD33. Indeed, sequencing and subsequent genotyping studies found an exon 2 SNP, rs12459419, that is in perfect linkage disequilibrium with rs3865444. Minigene studies confirmed that rs12459419 is a functional SNP with the minor allele increasing the proportion of D2-CD33 [56]. The finding that rs3865444 and its functional proxy, rs12459419, are associated with CD33 exon 2 splicing efficiency was subsequently confirmed in several reports [57,58,59]. We also identified a third CD33 isoform that retains intron 1 (R1-CD33) and found that this isoform also increased with the minor allele of rs3865444 [59]. 
Retention of the 62-bp intron 1 leads to a codon reading frameshift such that this isoform encodes only the signal peptide.
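To make the notion of perfect linkage disequilibrium concrete, the r² statistic between two biallelic SNPs can be computed from haplotype and allele frequencies. The sketch below uses the standard definition; the function name and the frequencies are hypothetical and are not the observed frequencies of rs3865444 or rs12459419.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying both minor alleles
    p_a, p_b: minor allele frequencies at the two loci
    """
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denominator


# Hypothetical frequencies: the minor alleles always travel together,
# so the joint haplotype frequency equals each allele frequency.
perfect_ld = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical frequencies: the loci are independent,
# so the joint haplotype frequency is the product of the allele frequencies.
no_ld = ld_r_squared(p_ab=0.09, p_a=0.3, p_b=0.3)
```

When r² equals 1, genotyping either SNP predicts the other exactly, which is why an association signal at a promoter SNP can reflect the action of a linked exonic SNP.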

Table 1 CD33 isoform expression as a function of rs3865444 genotype [56, 59]
Fig. 4

Structure of CD33 gene, common CD33 mRNA isoforms, and their predicted protein domains. The sites of genetic variants relevant to this review are depicted on the CD33 exon structure. The modular nature of SIGLEC exons and protein domains is reflected in the structures of full-length CD33, CD33 lacking exon 2 (D2-CD33), and a secreted IgV domain arising from a 4-bp indel in exon 3 (noted by caret). This diagram is not drawn to scale, e.g., full-length CD33 protein is 364 amino acids, exon 2 encodes 127 amino acids and exon 3 encodes 93 amino acids. The cytosolic tail consists of 82 amino acids. Alternative splicing produces an atypical exon 7 that is less abundant and does not include ITIM motifs (not depicted).

Fig. 5

rs3865444 allele dose-dependent association with CD33 exon 2 splicing efficiency. C is the major rs3865444 allele and A the minor allele. Each point represents the qPCR result from a different human brain RNA sample. Figure redrawn from Malik et al., with permission [56]

The association between rs12459419 and CD33 exon 2 splicing has also been documented by acute myeloid leukemia (AML) researchers, who are interested in CD33 as a target for antibody-based AML therapeutics [60, 61]. In fact, several, although not all, studies found a correlation between the rs12459419 major allele and increased efficacy of gemtuzumab–ozogamicin, which recognizes an epitope in the CD33 IgV domain (Table 2, [60,61,62] reviewed in [63]). Interestingly, Mortland et al. reported that patients homozygous for the minor allele of rs12459419 were more likely to have favorable-risk disease than those with the major allele (52% vs. 31%, p = 0.034) [60]. The minor allele homozygous individuals also had significantly lower diagnostic blast CD33 expression than other genotypes (p < 0.001) [60], suggesting that either lower CD33 or increased D2-CD33 may also lessen AML severity.

Table 2 CD33 antibodies and their epitopes

The function(s) of D2-CD33, if any, have not been determined. Since the loss of exon 2 does not alter the codon reading frame, D2-CD33 encodes a protein essentially identical to CD33 except for the loss of the IgV domain. Therefore, we and others originally hypothesized that D2-CD33 is non-functional because loss of the IgV domain would result in an inability to bind sialic acid-based ligands (reviewed in [5, 39]). Although humans and chimpanzees express both CD33 and D2-CD33, the allelic regulation of exon 2 splicing appears human specific and has been hypothesized to reflect evolutionary pressure for longevity in humans [66].

Several studies have described D2-CD33 as a cell surface protein. For example, Perez-Oliva et al. transfected HEK293T cells with D2-CD33 and found expression on the cell surface by flow cytometry and immunofluorescence [32]. Our laboratory independently confirmed this finding in transfected HEK293T cells [59], as did Laszlo et al. [64]. CD33 is thought to normally exist on the cell surface as a homodimer ([67], Fig. 6). Based on X-ray crystallography models, the dimer has no disulfide bonds between the two monomers and instead appears to be stabilized by IgC2 domain interactions. The interpretation that dimerization is mediated by the IgC2 domain suggests the possibility of D2-CD33 homodimers as well as CD33 and D2-CD33 heterodimers (Fig. 6). Since D2-CD33 lacks the ligand-binding domain, a heterodimer would be predicted to have a lower ability to bind ligand but a similar ability to signal, given the retention of both cytosolic domains. However, whether heterodimers form is unclear, and their formation may be unlikely, because the ectopic expression of D2-CD33 did not influence internalization of full-length CD33 induced by the CD33 antibody p67.6, which recognizes the IgV domain [64]. If a portion of CD33 were engaged in a heterodimer with D2-CD33, CD33 internalization would have been predicted to decrease. Overall, discernment of the role, if any, of dimerization in CD33 or D2-CD33 function and AD impact requires further experimentation.

Fig. 6

CD33, D2-CD33 homo- and heterodimers, and IgV domain derived from CD33 indel. CD33 homodimer model is based on CD33 X-ray crystallography (PDB:5J06 (www.rcsb.org/structure/5J06) [79]). D2-CD33 and indel models are derived from this CD33 model. Extracellular domains are shown. There are two monomers: one depicted as pink/red and the other as light blue/blue. The IgV domains are shown as pink or light blue and the IgC2 domains as red or dark blue. The arginine critical for sialic acid ligand binding is marked in green. Disulfide bonds are marked in yellow and circled in dotted orange. A CD33 monomer is stabilized by disulfide bonds within the IgV domain (aa41–101), within the IgC2 domain (aa163–212) and between the IgV and IgC2 domains (aa36–169, see dotted orange circles in CD33 homodimer). These three disulfide bonds are a common feature of all the SIGLECs [80]. D2-CD33 retains the disulfide bond within the IgC2 domain and has an unpaired cysteine that may be relevant to D2-CD33 gain of function. Conversely, the indel-secreted IgV domain retains the disulfide bond within the IgV domain and also has an unpaired cysteine

The finding that the minor allele of rs3865444 is associated with reduced full-length CD33 is essentially unequivocal because of its reproducibility in multiple labs and experimental models. The magnitude of the reduction has been somewhat variable between reports and tissue types. Our laboratory performed qPCR on human brain in an absolute fashion using reference standards and found that the mRNA encoding apparently full-length CD33 was reduced by ~ 44% in a comparison between major and minor allele homozygous individuals (Table 1, [59]). At the protein level, Griciuc et al. performed western blots of human brain using the PWS44 IgC2 antibody and found a 30–60% reduction in CD33 protein in carriers of the rs3865444 minor allele, depending on the normalization protein [55]. Walker et al. performed western blots on human brain with EPR4423, an antibody against the IgV domain, and reported an approximately threefold reduction in CD33 in a comparison of rs3865444 major and minor allele homozygous individuals [57]. Raj et al. examined peripheral blood mononuclear cells in western blots with H-110, a rabbit polyclonal antibody against the CD33 cytosolic tail, and reported a larger, 15-fold reduction in CD33 in minor allele homozygous individuals [58]. Bradshaw et al. evaluated human monocytes with flow cytometry involving an antibody raised against the CD33 protein (AC104.3E3) and reported a sevenfold reduction in CD33 mean fluorescence intensity between major and minor allele homozygous individuals [65]. Last, Mortland et al. used a similar approach and reported a threefold reduction in CD33 expression in AML blasts between homozygous major and minor allele individuals [60]. The actual CD33 epitopes recognized by the antibodies in the flow cytometry studies are not clear. Overall, we interpret variability in the quantitative impact of CD33 genetics on CD33 protein as reflecting differences in tissue type and/or assay techniques. 
In summary, the primary impact of rs3865444, and its functional proxy, rs12459419, appears to be an allele dose-dependent increase in D2-CD33 at the expense of full-length CD33, with secondary effects being a decrease in total CD33 and an increase in R1-CD33. Of particular importance, the genetics of AD risk show a similar allele dose-dependent pattern of increased protection against AD (Table 1).
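Because the studies above report effect sizes variously as percent reductions and as fold reductions, a common scale can aid comparison. The sketch below assumes that an "N-fold reduction" means the minor-allele homozygote level is 1/N of the major-allele homozygote level; that reading, and the function names, are assumptions made here rather than definitions given by the cited reports.

```python
def percent_reduction_from_fold(fold: float) -> float:
    """Percent reduction implied by an N-fold reduction (remaining level = 1/N)."""
    if fold < 1:
        raise ValueError("fold reduction must be at least 1")
    return (1 - 1 / fold) * 100


def fold_from_percent_reduction(percent: float) -> float:
    """Fold reduction implied by a percent reduction (remaining level = 1 - percent/100)."""
    if not 0 <= percent < 100:
        raise ValueError("percent reduction must be in [0, 100)")
    return 1 / (1 - percent / 100)


# Values quoted in the text, placed on a common scale.
threefold_as_percent = percent_reduction_from_fold(3)     # about 67%
fifteenfold_as_percent = percent_reduction_from_fold(15)  # about 93%
mrna_44_percent_as_fold = fold_from_percent_reduction(44) # about 1.8-fold
```

On this scale the ~44% mRNA reduction corresponds to less than a twofold change, which underscores how much larger the reported protein-level differences are in some tissues and assays.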

CD33 genetic impact on microglial function

Given that CD33 contains an ITIM and is expressed on microglia, several groups have evaluated the impact of CD33 genetics on microglial function. Griciuc et al. quantified CD33+ microglia per unit area in human AD brain samples by immunohistochemistry and reported a decrease with the rs3865444 minor allele [55]. Since this study used the CD33 antibody PWS44, which recognizes an IgC2 domain present in both CD33 and D2-CD33, this decrease in CD33+ microglia is not due to lower CD33 expression per se. Bradshaw et al. reported a decrease in the frequency of stage III microglia/macrophages with rs3865444, consistent with a possible decrease in microglial activation with the rs3865444 minor allele [65]. The finding that levels of SIGLEC mRNAs, including Cd33, are decreased in disease-associated murine microglia, while TREM2 mRNA is increased and is required to achieve this microglial subtype, supports the concept that SIGLECs and TREM2 generally modulate microglial activation in opposing directions [68]. Given the importance of innate immunity, it is somewhat surprising that rs3865444 and TREM2 polymorphisms are not associated with more human conditions. Besides AD, the primary GWAS finding for rs3865444 is a decrease in white blood cells, specifically those of the myeloid lineage, with the minor allele (p = 6.81 × 10⁻¹⁴) [69]. Although microglia are derived from a different lineage than monocytes in the periphery, whether this finding is relevant to microglial abundance in brain is not clear. Conceivably, this decrease in peripheral myeloid cells could promote subclinical infections, e.g., periodontal disease, which is thought to be one of the many socioeconomic or medical contributing factors in AD [70, 71].

To evaluate CD33 impact on immune cell function, researchers have transfected microglial cell models with CD33 constructs and evaluated monocytes as a function of CD33 genotype. For example, the overexpression of human CD33, but not D2-CD33, in the murine microglial BV2 cell line inhibited Aβ42 uptake [55]. Similarly, Bradshaw et al. studied human monocytes and found a dose-dependent inhibition of uptake of dextran beads and Aβ42 with increasing copies of the rs3865444 major allele [65]. These in vitro findings are consistent with CD33, but not D2-CD33, acting to inhibit microglial phagocytosis. This possibility is supported by in vivo findings that the rs3865444 minor allele is correlated with decreased formic acid-soluble Aβ42 in frontal cortex of AD brains as well as an apparent amyloid reduction in living individuals, as detected by Pittsburgh Compound-B (PIB) [55, 65]. In summary, the rs3865444 minor allele is associated with reduced CD33, increased D2-CD33, increased uptake of Aβ and dextran, decreased Aβ burden in vivo and decreased AD risk.

Current findings in CD33 genetics and AD

In considering CD33 genetics and AD more broadly, we recently investigated the impact of a 4-bp insertion/deletion (indel), rs201074739, in exon 3 of CD33. The deletion is moderately rare, with a minor allele frequency of ~2.4% in European populations. The rs201074739 deletion causes a frameshift in the CD33 codon reading frame at amino acid 155. This results in aberrant amino acids at positions 156–159 and then a premature stop codon. Hence, instead of CD33 as a 364-aa type-1 transmembrane protein, CD33 containing this 4-bp deletion is predicted to be a secreted protein consisting of the IgV domain and approximately 16 amino acids of the IgC2 domain (Fig. 6). As such, this deletion would preclude both CD33 and D2-CD33. This was demonstrated in a very recent report by Papageorgiou et al. describing an individual who was homozygous for this deletion and had no detectable cell surface CD33 on their monocytes [72]. We investigated CD33 expression and splicing in several individuals heterozygous for the indel (Table 3). Although one might have hypothesized that loss of CD33 due to the indel would lead to a compensatory increase in CD33 expression, we found that CD33 mRNA levels appeared reduced in indel carriers. This apparent decrease in CD33 may reflect nonsense-mediated RNA decay, which was demonstrated for this isoform in the recent report by Papageorgiou et al. [72]. We also considered the possibility that atypical splicing may generate an unusual CD33 isoform that retained ITIM function in these individuals. Indeed, we found a CD33 isoform unique to the indel carriers wherein a novel splice donor site was introduced a few bp after the rs201074739 deletion. This donor site was spliced into an acceptor site at the 24th bp of exon 6 rather than the beginning of exon 4. Overall, this isoform consisted of exons 1, 2, and 7, along with fragments of exons 3 and 6, and again encoded a truncated CD33 protein consisting of essentially the signal peptide, the IgV domain and a fragment of the IgC2 domain.

Table 3 CD33 expression appears decreased in indel carriers

Since the rs3865444 minor allele is associated with decreased full-length CD33 and decreased AD risk, we hypothesized that this indel would also be associated with reduced AD risk. To evaluate this possibility, we used results from the most recent IGAP AD meta-analysis [8]. As a positive control to assess whether CD33 genetics are associated with AD, we examined the CD33 exon 2 splicing SNP, rs12459419, and found a significant association with AD risk [OR = 0.92 (95% CI 0.90–0.95), p = 4.5 × 10−7]. This was a robust sample set of 21,982 AD cases and 41,944 non-AD controls. However, the 4-bp indel was not significantly associated with AD risk in these same data [OR = 0.90 (95% CI 0.79–1.03), p = 0.1337]. While post hoc power calculations have well-known limitations [73], it is informative that a dataset of this size (> 60,000 individuals) confers 80% statistical power to detect an odds ratio as close to 1 as 0.90 for a variant with a minor allele frequency similar to that of this indel (2.4%).
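The order of magnitude of this power estimate can be checked with a simple approximation. The sketch below is not the calculation used for the figure quoted above. It assumes a two-sided two-proportion z-test on allele counts, takes the 2.4% minor allele frequency as the frequency in controls, and uses a nominal alpha of 0.05. All three are illustrative assumptions, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate two-sided power of a two-proportion z-test on allele counts.

    Assumption: a simple allelic test, not the regression-based
    meta-analysis used in the GWAS itself.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    # Each individual contributes two alleles.
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in either tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Sample sizes, allele frequency and odds ratio as quoted in the text.
power = allelic_power(21_982, 41_944, 0.024, 0.90)
print(f"approximate power: {power:.2f}")
```

Under these assumptions the estimate should fall in the neighborhood of the 80% figure. The exact value shifts with the choice of test, the significance threshold, and the effective sample size of the meta-analysis.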

Current model for CD33 and AD

Based on our current understanding of CD33 genetics, AD risk and SNP actions, an interesting paradigm has emerged. The AD-associated SNP, rs3865444, acts through the linked SNP rs12459419, primarily by increasing aberrant exon 2 splicing. This results in a D2-CD33 increase at the expense of CD33. Reduced functional CD33 was hypothesized to mediate reduced AD risk. This loss of function hypothesis is derived from the demonstrated ability of ligand binding to the CD33 IgV domain to cause CD33 clustering and ITIM phosphorylation and, overall, to inhibit phagocytic-type activity. A decrease in CD33 inhibition of microglial function was thought to increase microglial function, e.g., clearance of amyloid and cell debris, and, over time, lead to reduced AD pathology. The conclusion of this loss of function hypothesis is that further reducing CD33 protein via pharmacologic means (antibodies, small molecules, anti-sense approaches) would reduce AD incidence and severity [39, 59].

This theory is now called into question because of the data regarding the rs201074739 indel variant. Although the indel is expected to cause a 50% reduction in CD33 in heterozygous individuals and to produce a complete loss of CD33 cell surface expression in homozygous individuals [72], this indel does not appear to modulate AD risk.

Considering CD33 genetic findings overall, we conclude that (1) an AD-protective splicing SNP, rs12459419, increases D2-CD33 and decreases CD33 but (2) an indel that robustly decreases CD33 has no effect on AD risk. While we recognize that future genetic studies may yield new findings, a parsimonious interpretation of available data is that AD protection may well be mediated by the increase in D2-CD33, i.e., D2-CD33 represents a gain of function variant to protect from AD.

In considering possible mechanisms for a D2-CD33 gain of function, we note that the rs3865444 minor allele is associated with an increase in D2-CD33 [56, 57, 65] as well as decreased amyloid burden [57, 65] and increased Aβ42 uptake [65]. Although we initially attributed these actions to a decrease in CD33 ITIM signaling, they are also consistent with ITAM signaling. In considering this possibility further, we note that the sequence of the CD33 ITIM and ITIM-like region represents a near-canonical ITAM sequence (Fig. 7) [45,46,47, 49]. A clear precedent for ITIMs acting as an ITAM can be found with PECAM-1, which has an ITIM and an ITIM-like sequence similar to that of CD33. Although PECAM-1 signaling is generally thought to be immunosuppressive, PECAM-1 has also been shown to bind and activate Syk when the two tyrosines in the PECAM-1 “ITAM” sequence are phosphorylated [74]. Other studies have also suggested that “ITAM” sequences similar to those in CD33 can bind Syk as well as ZAP70 and lead to cellular activation [48, 75, 76]. Hence, the gain of function whereby D2-CD33 reduces AD risk may be due to D2-CD33 acting as an ITAM to activate microglial function (Fig. 8). We speculate that this gain of function in D2-CD33 but not CD33 is due to the smaller extracellular domain of D2-CD33. The loss of the IgV domain, which represents about 1/3 of the total CD33 protein, may allow D2-CD33 to cluster in a novel conformation and facilitate dual ITAM motif phosphorylation. This is analogous to the way TREM2 switches from an immune inhibitor, in response to a low-avidity ligand, to an immune activator, in response to a high-avidity ligand, in a process mediated by DAP12 monophosphorylation and dual phosphorylation, respectively. D2-CD33, which lacks any ligand-binding capability, may constitutively function as an ITAM. In summary, the hypothesis that D2-CD33 can act as an ITAM and bind Syk to generate an activating signal is experimentally testable and a current focus of on-going work.

Fig. 7

Comparison of the CD33 ITIM and ITIM-like region with the consensus ITAM sequence. The consensus ITAM sequence is D/ExxYxxL/I x6–12 YxxL/V/I. Similar variations from the ITAM consensus sequence have been reported previously for functional ITAMs [45,46,47, 49]. ITAM sequences with longer spacers have also been reported to bind and activate Syk [74, 81].
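The consensus in this caption maps directly onto a regular expression, which is one way to screen a cytoplasmic tail for ITAM-like spacing. The sketch below is an illustration only: it reads the first leucine position as L/I (an assumption), the function name is hypothetical, and the two peptides are made-up toy sequences, not the CD33 sequence.

```python
import re

# Consensus as read from the caption (assumption: first motif ends in L/I):
# D/E-x-x-Y-x-x-L/I, a spacer of 6-12 residues, then Y-x-x-L/V/I.
ITAM_PATTERN = re.compile(r"[DE]..Y..[LI](.{6,12})Y..[LVI]")


def find_itam(sequence):
    """Return (start, end, spacer_length) for the first consensus match, else None."""
    match = ITAM_PATTERN.search(sequence.upper())
    if match is None:
        return None
    return match.start(), match.end(), len(match.group(1))


# Hypothetical peptides invented for illustration; NOT real CD33 sequence.
toy_with_itam = "AAEPSYAAL" + "G" * 8 + "YTTVAA"   # 8-residue spacer
toy_short_spacer = "AAEPSYAAL" + "G" * 3 + "YTTVAA"  # spacer too short

print(find_itam(toy_with_itam))
print(find_itam(toy_short_spacer))
```

A pattern match only shows that the tyrosine spacing is compatible with the consensus. Whether such a sequence is phosphorylated on both tyrosines and recruits Syk remains an experimental question.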

Fig. 8

Working model of D2-CD33 neuroprotection in AD. Full-length CD33 recruits phosphatases such as SHP-1 and SHP-2. Targets of SHP-1 include Syk and scaffolding proteins such as the B-cell linker protein BLNK. SHP-1 dephosphorylates these proteins, thus acting as a negative regulator of microglial activation. Syk activation leads to multiple downstream cellular events, including calcium mobilization via phospholipase C γ activity, protein kinase B (AKT) signaling, and extracellular signal-regulated kinase (MAPK/ERK) signaling. These events lead to metabolic and transcriptional changes, along with context-dependent increases in activities such as phagocytosis and chemotaxis. D2-CD33 may be locked in a constitutively active or permissive conformation, leading to Syk recruitment at the membrane, and ultimately resulting in a higher propensity toward microglial activation through mechanisms similar to the well-recognized functions of TREM2 and its co-receptor, DAP12.

Finally, the presence of the unpaired cysteine in the IgC2 domain of D2-CD33 may also contribute to a gain of function. D2-CD33 homodimers may form a disulfide bond between the two IgC2 domains. If this occurs, it is unclear whether this would stabilize D2-CD33 or lead to increased clearance. This might determine the proportion of D2-CD33 homodimer versus heterodimer formation in the endoplasmic reticulum. Whether a D2-CD33 gain of function exists, and whether it is due to D2-CD33 monomers, homodimers, or D2-CD33/CD33 heterodimers, has yet to be shown experimentally.

Conclusions

Three findings are critical to the evaluation of CD33 as an AD modulator. First, the minor allele of rs3865444 is associated with reduced AD risk in the largest AD GWAS performed to date. Second, rs12459419, which is in near-perfect linkage disequilibrium with rs3865444, modulates the efficiency of CD33 exon 2 splicing. In combination, these two results provide very strong evidence that a decrease in CD33 or an increase in D2-CD33 reduces AD risk. Third, rs201074739, a 4-bp indel that precludes CD33, does not appear to be associated with AD risk. Overall, we interpret these findings as reducing the likelihood that a decrease in CD33 protects from AD. Rather, we currently hypothesize that an increase in D2-CD33 protects from AD. Studies to confirm the lack of indel association with AD and to determine the mechanism whereby D2-CD33 protects from AD are on-going. The insights gained through this process will hopefully lead to pharmacologic agents that maximize this protective pathway to reduce AD risk.