Introduction

What disease will a person get and when? How severe will it be and how rapidly will it progress? Who will or will not respond to specific treatments? What is the initial trigger that starts the process of Alzheimer’s disease (AD) and neurodegeneration and what are underlying mechanisms? Where can we intervene to stop or prevent AD? These are medically and scientifically relevant questions that we could answer better with a more complete understanding of the genetics of both rare and common forms of AD and other neurodegenerative diseases.

For AD, early work between 1988 and 1995 identified three genes that harbor mutations that cause the disease. The term “cause” is used because individuals with a mutation in one of these genes inevitably develop AD, often at a remarkable young age. While mutations in these genes are rare causes of AD, these initial discoveries had a major impact on our understanding of molecular mechanisms and led to multiple therapeutic approaches that are currently being tested. In the early 1990 s, the technology and knowledge were available to identify causative genes, but not to identify systematically genes that just contribute to an individual’s risk of developing AD. In the past 5 years, advances in our knowledge of the human genome, the technology to interrogate the genome, and methods to analyze genetic data have made it possible to identify genes that contribute to risk. In this manuscript, we review early genetic findings and what these initial gene discovery efforts taught us, and then move to more recent findings on genetic risk factors for AD. The implications of recent findings are unknown and the focus of intense interest.

In addition to an overview of the genetic causes and risks for AD, we also review the relationship between these mutations and genetic variants and the neuropathologic features of AD and related disorders. AD has two core neuropathologic features: the formation of various forms of amyloid deposits often referred to as senile plaques and neurofibrillary degeneration that is typified by neurofibrillary tangles, neuropil threads, and dystrophic neurites in neuritic plaques (Fig. 1). Other changes that are characteristic of AD include cerebral amyloid angiopathy (CAA), brain regional atrophy and neuron loss, and glial activation. Still other neuropathologic changes may be observed in some but not all cases of AD; examples include Lewy body disease (LBD), vascular brain injury, hippocampal sclerosis, and TDP-43 inclusion formation. Since these last neuropathologic changes also are observed in other clinical contexts, they often are considered co-morbid, especially in the elderly; however, their co-occurrence in younger individuals with autosomal dominant forms of AD at least suggests that some may share common pathogenic mechanisms with AD.

Fig. 1
figure 1

Photomicrograph of neuritic plaque and neurofibrillary tangle. Histologic section of neocortex from a patient who died of dementia that has been stained with a modified Bielschowsky method to reveal the hallmark lesions of AD: a large extracellular plaque with amyloid core and abnormal neuritic processes (neuritic plaque) and a skein of darkly staining threads that encircle a neuron’s nucleus and extend down a process (neurofibrillary tangle)

Autosomal dominant Alzheimer’s disease

Families with large numbers of early-onset AD (onset <60 years) were reported as early as the 1930s. Because AD with this early-onset age is rare, the cause of AD in these families was clearly inherited in a simple Mendelian autosomal dominant pattern, meaning children of an affected parent were at 50 % risk for developing early-onset AD. It was not until the mid-1980s that methods became available to identify the responsible genes. In relatively rapid succession, three autosomal dominant genes were identified: amyloid precursor protein (APP), presenilin 1 (PSEN1), and presenilin 2 (PSEN2). While not all early-onset AD is caused by inheritance of mutations in one of these three genes, it is unlikely that another autosomal dominant causative gene for AD is yet to be discovered.

APP

Identification of the APP gene was made possible by the seminal work of Glenner and Wong [36] who, in 1984, determined a partial amino acid sequence of a peptide isolated from AD brain cerebrovascular amyloid. This partial sequence and additional sequence determined by others [107] were subsequently used by Kang et al. [68] to clone APP (chromosome 21). The predicted protein, designated APP, is a ubiquitously expressed, single-pass transmembrane protein that contains an internal 39–43 amino acid sequence corresponding to the Aβ peptide isolated by Glenner and Wong. Aβ peptides are generated from APP (Fig. 2) by two endoproteolytic cleavages catalyzed by β and γ secretases; α secretase catalyzes endoproteolysis of APP within the Aβ sequence [120]. Aβ peptides are the primary component of amyloid deposits that form in brain parenchyma (called “plaques”) and in the walls of cerebral vessels (called “cerebral amyloid angiopathy”, CAA); both are key features of AD neuropathologic changes.

Fig. 2
figure 2

APP protein structure and processing. The APP protein is a single-pass transmembrane protein with an extracellular globular domain (top of the figure). The extracellular segment is illustrated here in blue. The transmembrane segment is composed in part of sequence that makes up the Aβ peptide (red). The amino acid sequence of Aβ peptides and flanking regions are shown at the bottom along with the position and amino acid changes of different APP mutations. Locations of the α-, β-, and γ-secretase sites are shown. The γ-secretase site is shown with a red triangle because cleavage at this site can leave different C-terminal amino acids

APP mutations

Progress in AD genetics began with the identification of autosomal dominant mutations in APP [37]. To date, 24 APP single nucleotide mutations are known that cause AD (http://www.molgen.ua.ac.be/ADmutations). While APP can encode multiple isoforms with the longest being 750 amino acids, all AD mutations are clustered within a 54 amino acid segment, either adjacent to or within the sequence that encodes Aβ peptides. It is instructive to review how different mutations inform us about the molecular pathogenesis of AD.

The Swedish mutation is a double substitution changing the two amino acids immediately before the beginning of the Aβ peptide sequence. The normal sequence, lysine-methionine is replaced by asparagine-leucine. The functional consequence is that the amount of Aβ peptide produced from the mutant APP is two to threefold higher than that produced from non-mutated APP, presumably by affecting the efficiency of β-secretase cleavage. Since the Swedish mutation is outside the sequence encoding Aβ peptides, the amino acid sequence of the Aβ peptide produced from this double substitution is normal. The implication is that excess Aβ production is sufficient to cause AD. This conclusion is supported by the long-standing observation that individuals with trisomy 21 (Down syndrome; DS) develop extensive AD neuropathologic changes [173] that can occur very early in life [90]. By age 40, virtually all individuals with DS have full expression of AD neuropathologic features. Dementia is also common in individuals with DS with a prevalence of 15 % after age 45 that increases to 75 % after age 65 [20, 163]. Finally, consistent with the implication that an extra copy of APP is sufficient to cause AD neuropathologic changes, duplications of small segments of chromosome 21 that do not cause DS but that contain an additional copy of APP are sufficient to cause AD [69, 136, 146].

Other APP mutations that cause AD cluster at or after the C-terminal amino acids of Aβ peptides, including the most common mutation, p.Val717Ile (Fig. 2). These mutations alter the activity of γ-secretase cleavage. When proteolysis of normal APP is catalyzed by γ-secretase, the predominant species formed is 40 amino acids long (Aβ40) with smaller amounts of Aβ that is 42 amino acids long (Aβ42). APP mutations at the C-terminal end of Aβ shift proteolysis to produce more Aβ42 at the expense of Aβ40, resulting in an increased Aβ42/Aβ40 ratio but not necessarily a change in the total amount of Aβ peptides formed [8, 138]. Aβ42 is more amyloidogenic and more prone to aggregate than shorter forms of Aβ [62]. These were among the first data to suggest that Aβ aggregation is key to AD pathogenesis.

Another mutation of note is p.Glu693Gly [66] also known as the Arctic mutation [118]. This missense mutation does not alter the amount of Aβ formed nor does it alter the γ-secretase cleavage site or the Aβ40/Aβ42 ratio. However, this amino acid change within Aβ does increase the aggregation rate of the mutant peptide, adding to the evidence that APP mutations can result in Aβ peptides with altered aggregation properties and that this process is critical to AD pathogenesis.

APP mutations and the neuropathology of individuals with three copies of this gene provide conclusive evidence that an alteration in either the amount of Aβ produced or the amino acid sequence of Aβ is sufficient to cause AD and strongly support the proposal that a multimeric form of Aβ is critical for AD pathogenesis.

Neuropathologic changes observed with APP mutations

Several studies have demonstrated variation in the core neuropathologic features of AD among patients with mutations in APP. At the time of autopsy, patients with APP mutations tend to have greater amounts of neocortical senile plaques than patients who had “sporadic” AD, although these data come from relatively few cases and there is broad overlap [57, 61, 104]; there is no clear trend for a difference in neurofibrillary degeneration [96, 153, 162]. The ratio of accumulated Aβ42/Aβ40 may be higher in cases of autosomal dominant AD from APP mutations, but again there are a limited number of examples and wide variation in results [116, 156158].

In addition to variation in the amount of core neuropathologic features of AD, APP mutations also are associated with morphologic variants. Several APP mutations have been associated with variation in the structure of Aβ deposits. These include the p.Ala692Gly (Flemish) APP mutation that is associated with large, dense plaques [23, 77] and the p.Glu693Gly (Arctic) APP mutation with ring-like plaques [5]. CAA is a common feature of AD caused by mutations in APP. The p.Asp694Asn (Iowa) and p.Ala713Thr mutations are associated with exceptionally severe CAA [39, 135] in the context of the core neuropathologic features of AD. The p.Glu693Gln (Dutch) APP mutation produces CAA without significant accumulation of the core features of AD; in these individuals, progressive cognitive impairment is linked to vascular brain injury [11, 117].

Several cases of AD caused by mutations in APP also have LBD, which may be observed by histochemical stains of pigmented brainstem neurons or demonstrated in more widespread brain regions, including limbic structures and neocortex, by α-synuclein immunoreactive Lewy bodies and Lewy neurites. Some investigators have demonstrated only brainstem LBD using histochemical methods [43] while others have demonstrated limbic LBD [41], and still others neocortical LBD [54, 82] in cases of AD caused by mutations in APP.

PSEN1 and PSEN2

Mutations in either of these two genes cause early-onset autosomal dominant AD. Over 180 mutations in PSEN1 are known to cause autosomal dominant AD with onset as early as the late 1920s (http://www.molgen.ua.ac.be/ADmutations). Penetrance of PSEN1 mutations is complete by age 60–65 years, meaning all mutation carriers develop early-onset AD. Fewer than 15 known mutations in PSEN2 also can cause early-onset autosomal dominant AD, but penetrance is more variable than PSEN1. In the Volga German kindreds, the families that facilitated identification of PSEN2 mutations, onset ranges from the early 1940s to the 7th and 8th decade of life [91].

Mechanism

PSEN1 and PSEN2 encode closely related proteins that are part of the γ-secretase complex (Fig. 3). Either presenilin 1 or 2, together with nicastrin, APH1, and PEN2, form a complex that catalyze the cleave a number of different membrane proteins [169] at sites embedded in the lipid bilayer. In the case of APP, the presenilin-containing γ-secretase catalyzes endoproteolysis at the C-terminal end of the Aβ peptide sequence. This proteolytic event, along with a second cleavage at the N-terminal end of this sequence, is required for production of Aβ peptide from APP.

Fig. 3
figure 3

The γ-secretase complex. The complex is composed of four membrane proteins: nicastrin, aph-1, Pen-2, and presenilin (1 or 2). The actual catalytic site is in presenilin. The gap in presenilin represents cleavage of the protein that is part of normal processing. Cleavage of APP and other substrates occurs in the membrane. As illustrated here, when there is an APP mutation that causes AD or related disease, the cleavage site is altered by approximately two amino acids (short yellow segment at the end of the red Aβ peptide). While only a single PSEN mutation is shown in the figure, as noted in the text, there are numerous PSEN1 and PSEN2 mutations that cause early-onset AD

How do PSEN1 or 2 mutations cause AD? γ-Secretase containing mutation-altered presenilin still catalyzes cleavage of APP, but the proteolytic site is altered. Normal γ-secretase yields predominantly Aβ40 with smaller amounts of Aβ42. Mutant γ-secretase produces more Aβ42, a result consistently obtained across multiple studies. As noted above, Aβ42 is more amyloidogenic and more prone to aggregate than Aβ40 [62].

Neuropathologic changes observed with PSEN1 or PSEN2 mutations

As with APP mutations, individuals who died from AD caused by mutations in PSEN1 or PSEN2 tend to have a greater amount of neocortical senile plaques [57, 103, 105, 106] and a shift toward a higher Aβ42/Aβ40 ratio [53, 110, 156] than patients who had “sporadic” AD, although again these data derive from relatively few examples with wide variation even within the same family. The relationship between PSEN1 or PSEN2 mutations and neurofibrillary degeneration is more complex. Overall, there is no difference between the amount of neurofibrillary tangles present in AD resulting from PSEN1 or PSEN2 mutations and in “sporadic” AD [38, 96, 162]. One PSEN1 mutation, a substitution in exon 8 (leucine271valine) that results in transcripts lacking exon 8, has been associated with lack of neuritic change in senile plaques [78]. In contrast, increased accumulation of tau species other than neurofibrillary change has been reported in some cases caused by a PSEN1 mutation [144].

PSEN1 mutations also are associated with a morphologic variant in Aβ plaques: the so-called cotton wool plaque (Fig. 4). Cotton wool plaques occur in the same regions of brain as senile plaques and have been associated with multiple PSEN1 mutations. Cotton wool plaques typically have limited fluorescence following staining with thioflavin S and tend to be more immunoreactive for Aβ42 than for Aβ40 [151, 155, 165]. It is important to stress that although highly characteristic of PSEN1 mutation, cotton wool plaques also have been reported in apparently sporadic AD [84, 177].

Fig. 4
figure 4

Photomicrograph of cotton wool plaques. Histologic section of frontal cortex from a patient with a strong family history of AD, who inherited PSEN1 mutation, and died of dementia. Hematoxylin and eosin stain (×200) of cerebral cortex shows marked neuron loss, vacuolization, gliosis, and large amorphous eosinophilic structures that are called cotton wool plaques. Modified Bielschowsky stain (inset) further highlights cotton wool plaques

Phenotypic heterogeneity also occurs between PSEN1 and PSEN2 mutation carriers [8]. The clinical and neuropathologic variability could be related to the mechanism of action of different PSEN1 and 2 mutations, or comorbidity with other neurodegenerative disease conditions.

LBD also has been observed in individuals who inherited a mutation in PSEN1 or PSEN2 [67, 95, 131]. A recent detailed survey of LBD reported that virtually all of the 24 cases of AD caused by nine different mutations in PSEN1 harbored LBD in limbic structures; 11 also had diffuse LBD that also involved regions of neocortex [89]. The same study examined 14 cases of AD from a PSEN2 mutation (asparagine141isoleucine). These had less frequent limbic or neocortical LBD, but a similar frequency of brainstem LBD as the PSEN1 mutation cases. In some cases, diffuse LBD occurs in the context of the expected core neuropathologic features of AD [148], while in others there is some restriction of usual AD changes. For example, a three base pair deletion in exon 12 of PSEN1 was associated with diffuse LBD and increased Aβ accumulation in the form of cotton wool plaques, but not other forms of senile plaques [58].

A series of 11 cases of autosomal dominant AD from mutations in APP or PSEN1 had one example of TDP-43 inclusion formation; this case also showed hippocampal sclerosis [27]. TDP-43 neuronal inclusions are uncommon in cases of AD from the PSEN2 asparagine141isoleucine (Volga German) mutation and were limited to amygdala [63].

Other phenotypes associated with mutations in PSEN1, or other genes encoding components of γ-secretase

NCSTN encodes nicastrin, and PSENEN encodes PEN2; both also are components of the γ-secretase complex (Fig. 3). Mutations in PSEN1, NCSTN, or PSENEN cause a rare autosomal dominant chronic skin disease called acne inversa [170], also called hidradenitis suppurativa, which is characterized by follicular inclusion with secondary inflammation [132]. Three different types of mutations cause this disorder: single base deletions that result in frame shifts and premature stop codons, a mutation that alters a splice site and results in exon skipping and an altered protein, and a nonsense mutation. Each of these mutations is predicted to cause loss of function due to haploinsufficiency and reduced γ-secretase activity. Some work suggests that at least some PSEN1 AD-causing mutations also could be loss of function mutations [45, 142]; however, none of the elderly acne inversa mutation carriers developed early-onset AD, even though there were 15 subjects over age 50. Likewise, subjects with AD-causing PSEN1 mutations do not have severe acne.

Late-onset AD susceptibility genes

While the mutated genes that cause early-onset AD inform about molecular mechanisms, they account for only a fraction of AD cases. The majority of AD patients are late-onset with risk approximately doubling every 5 years after age 65. Thus understanding late-onset AD genetics is imperative if we are to illuminate the mechanisms of neurodegeneration in older individuals. Except for a handful of PSEN2 mutation carriers, late-onset AD very rarely results from highly penetrant dominantly inherited mutations [70]. Rather, evolving data indicate that multiple genes contribute to susceptibility or risk for late-onset AD.

Susceptibility genes are identified by genetic association studies in which allele frequencies for polymorphisms at or near a gene are compared between cases and controls. Susceptibility genes are revealed when case and control frequencies differ significantly. Early attempts to identify these types of susceptibility genes were driven by selecting candidate genes based on the existing knowledge of disease pathogenesis. Until about 2005, due to technical limitations, only small numbers of genes and modest populations could be evaluated. Over 1,000 candidate genes were tested for AD susceptibility. With few exceptions, most studies of this type could not be replicated likely because of inadequately evaluated population substructure, sample sizes that were too small to detect meaningful associations, and P value thresholds that were insufficiently stringent.

Recent advances now make it possible to evaluate essentially all genes and all regions between genes in a single experiment, a method called genome-wide association studies (GWAS). This method has led to an explosion in susceptibility gene discovery for a large number of diseases, disease-related traits, and associated phenotypes [123]. For brain diseases, successful GWAS results have been reported for AD [114], Parkinson’s disease, amyotrophic lateral sclerosis, progressive supranuclear palsy [48], and multiple sclerosis, to name a few.

Advances in a number of key technological and analytical areas have driven rapid progress in complex disorder genetics using GWAS. First, advances in genotyping platforms permit interrogation of most of the genome for disease-associated variation in a single experiment. Current genotyping arrays contain assays between 600,000 and 2.5 million single nucleotide polymorphisms (SNPs) and cost less than $500/subject. To accompany this cost-effective genotyping methodology, the HapMap Project and the more recent 1,000 Genomes Project identified over 20 million SNPs and provide key information on linkage disequilibrium (LD), or correlation, between neighboring SNPs. Understanding LD not only allows the construction of haplotypes, but also provides the ability to impute the genotypes of nearby unobserved (not genotyped) SNPs using directly observed genotypes. Imputing not only provides additional information, but also facilitates merging data from different genotyping platforms with incomplete overlap, an advance that makes initial selection of genotyping platform less critical.

Analytical advances also have been made in methods for evaluating population substructure and for matching cases and controls. Even when using approaches to avoid a mismatch between cases and controls, such as collecting ethnicity information, it is still possible to have cryptic population substructure in case–control samples. Now, principal component methods are routinely used in GWAS to match cases and controls, avoiding false-positive results. This is particularly important as we move toward methods that detect rare variants (see below). Other advances in quality-control methods for managing large data sets are now firmly established. These include standards for identifying problematic SNPs and poor quality samples that can yield unreliable results.

Two susceptibility genes for AD had been identified prior to large GWAS: APOE and SORL1. We will discuss these genes separately from those identified by GWAS methods.

APOE

This gene encodes apolipoprotein E (ApoE), a protein originally extensively studied for its role in lipid metabolism but now recognized to be a pleiotropic molecule with important (perhaps overlapping) actions in lipid transport, Aβ trafficking, synaptic function, immune regulation, and intracellular signaling [101, 102]. ApoE, like other apolipoproteins, is a protein component of lipoprotein particles that binds to the cell surface receptors. Humans are unlike other mammals in that we have three common alleles of APOE called ε2, ε3, and ε4 alleles that encode corresponding 299 amino acid-containing protein isoforms. Early work identified APOE genotypes as contributing to risk for heart disease [55, 100]. In 1991, Pericak-Vance and colleagues [124], using family-based methods, identified a genetic linkage between AD and a region of chromosome 19 that harbors APOE. Subsequent work showed that polymorphisms within the APOE gene were strongly associated with AD [152]. The polymorphisms showing association are the ε2, ε3, and ε4 alleles. These alleles represent haplotypes of two SNPs in the coding region of APOE. The ε2 allele encodes a cysteine (Cys) both at amino acid position 112 and at position 158. Allele ε3 encodes a Cys at 112 and an arginine (Arg) at 158, while the ε4 allele encodes an Arg both at 112 and at 158. The ε4 allele is the ancestral and high-risk form [100], the ε3 allele is the most common in humans and the neutral allele, and the ε2 allele is associated with decreased risk of AD. On a population level, people with an ε4 allele have lower average onset age and those with an ε2 allele have a higher average onset age for AD [21]. The AD risk associated with these alleles is additive. A person with a genotype of ε4/ε4 is at higher risk than someone with an ε3/ε4 or ε2/ε4 genotype. Likewise, ε2/ε2 genotype is more protective than genotypes where only one ε2 allele is inherited. This risk pattern has been replicated in many publications and in all ethnic groups studied [34]. While earlier work observed that amyloid plaques had ApoE immunoreactivity [115] and that APOE expression is up-regulated in activated astrocytes in AD brain [30], it was the genetic discovery that clearly linked ApoE isoforms to AD pathogenesis.

APOE genotypes also influence onset ages in carriers of PSEN1 [122] or PSEN2 [171] mutations. Less compelling evidence suggests that APOE genotypes also affect age of onset in subjects carrying APP mutations [150]. Several publications reported that SNPs other than the APOE ε2/ε3/ε4 alleles might influence risk by modifying the effects of these three alleles. Early work suggested that polymorphisms within the promoter of APOE influence risk, though these findings were not replicated in large data sets. A recent publication, using a modest sample size, suggests that a poly-T track in TOMM40, the gene adjacent to APOE (Fig. 5) affects AD onset age [134]. The poly-T track in TOMM40 is variable in length; long poly-T length alleles associated with the ε3 APOE allele were reported to have an earlier onset age than short poly-T track alleles. In subsequent work using substantially larger samples, this relationship between poly-T track length and onset age was not replicated, and when APOE genotypes were accounted for, there was no significant association to TOMM40 [18, 26].

Fig. 5
figure 5

APOE gene structure and the ε2/ε3/ε4 allele system. The top of the figure shows the relationship between APOE and adjacent genes TOMM40 and APOC1 (not drawn to scale). In the middle of the figure, APOE exon structure and location of the polymorphic nucleotides that are responsible for the APOE ε2/ε3/ε4 allele system are shown. At the bottom, nucleotide and the resulting amino acid changes corresponding to these alleles are shown

APOE is by far the strongest risk locus in GWAS of AD. It is observed as a significant signal even in GWAS with limited power [1, 6, 9, 40, 92]. Unfortunately, the SNPs that define the APOE ε2/ε3/ε4 alleles were not on the early genotyping arrays. These SNPs are present on more recent arrays but the genotypes are not reliable. Often, the results reported are for surrogate SNPs that do not capture the complete genetic information for the APOE tri-allelic system. Still, odds ratios (OR’s) for APOE from surrogate SNPs, ranging from 2.3 to 4.01 [15, 19, 141], are much larger than other AD risk loci that have been identified by GWAS (see below). Because the sample size used in these studies is very large (>10,000 cases and 10,000 controls), and because APOE has such a large effect, there are many SNPs with genome-wide significance (P < 5 × 10−8) in the APOE region and these can span over 400 kb [114]. Additional SNPs spanning a larger region show nominal significance. The signal from many, if not most, of these adjacent SNPs disappears when the analysis is conditioned on APOE ε2/ε3/ε4 genotypes. However, in one study of SNP rs597668, which is 296 kb from APOE (between BLOC1S3 and EXOC3L), adjusting for APOE genotypes still left a signal with an OR = 1.18, 95 % confidence interval (CI) of 1.08–1.24, and P = 3.9 × 10−4. This finding was not replicated in other GWAS [65, 172]. Additional work is needed to determine whether it is an independent AD risk factor, or it is due to extended LD between APOE and the BLOC1S3/EXOC3L region.

SORL1

The gene product of the SORL1 gene is a receptor that participates in trafficking vesicles from the cell surface to the Golgi-endoplasmic reticulum, a pathway thought to be important for APP processing and the generation of Aβ peptide. For this reason, Rogaeva and co-workers selected SORL1 and related genes as candidates for AD susceptibility loci [133]. In the original report, there was a complex association between AD and SNPs clustered at two locations within SORL1. As with many candidate gene associations, initial attempts to replicate these findings were mixed with both positive [10, 75, 76, 83, 86, 87, 109, 159] and negative [93, 98, 111, 139] findings. Not all of these studies had adequate sample size or sufficient power for replication.

A recent unbiased meta-analysis of all previous SORL1 association studies reported significant association between both clusters of SNPs and AD in Caucasians [129]. Only results from the 3’ end of the gene were significant from an Asian cohort. This meta-analysis used a large sample consisting of 11,592 cases and 17,048 controls for the Caucasian analysis and 872 cases and 881 controls for the Asian analysis. Additional support for an association between SORL1 variants and AD also comes from several GWAS [6, 114, 141], although the sample set partially overlaps with the SORL1 meta-analysis. Thus, there is strong evidence for SORL1 as an AD risk gene. Further evidence awaits meta-analysis of all GWAS from a combined meta-analysis of the four major AD genetics consortia. The Alzheimer’s Disease Genetics Consortium (ADGC), the Genetic and Environmental Risk in Alzheimer’s disease (GERAD) consortium, the European Alzheimer Disease Initiative (EADI) group, and the Cohort for Heart and Aging Research in Genomic Epidemiology (CHARGE) are undertaking this analysis as part of the International Alzheimer Genomic Project (IGAP) [52].

Unbiased GWAS of AD

The candidate gene approach was successful for identifying APOE and SORL1 as risk loci. However, over 1,000 other candidate genes for late-onset AD were reported, and most, if not all, were false positives. Initial GWAS of AD had limited power [1, 6, 9, 15, 19, 40, 92, 128], but two findings were clear. First, as a proof of principle, APOE was easily detected with P values ranging from 5.3 × 10−34 to 3.7 × 10−120. Second, there was no other locus of similar effect size to APOE. In two of these studies, loci reaching genome-wide significance were reported. Carrasquillo et al. [15] highlighted the X-linked gene PCDH11X with an OR = 1.30 (CI 1.18–1.43). Evidence was stronger in females than males using a sample of 2,356 cases and 2,384 controls. Reiman et al. [128], in a combined cohort of 861 cases and 550 controls, observed that SNPs in GAB2 reached genome-wide significance (9.66 × 10−11, OR = 4.06, CI = 2.81–14.68), but only in APOE ε4 carriers. In subsequent much larger GWAS, the signal at PCDH11X did not replicate and the signal for GAB2 was only nominally significant (P = 0.017) [7, 114].

In the past 3–4 years, well-powered GWAS were performed and new genes significantly associated with late-onset AD were discovered and replicated in independent samples. The advantages of this genome-wide approach are several fold: (1) gene discovery by this method is not predicated on previous knowledge about AD pathogenesis, and thus has the capacity to identify new genes and molecular pathways relevant to AD; (2) GWAS methods can be used with case–control populations that were collected in earlier studies, leveraging pre-existing resources; (3) rigorous quality control and statistical methods coupled with sufficient sample size can lead to high reproducibility; (4) methods are available so that investigators can recognize and account for cryptic population substructures, contributing to reproducibility. Disadvantages of this approach are: (1) signals are often ambiguous and may span multiple genes (see MS4A gene cluster described below and Fig. 5); (2) signals can be in intergenic regions making assessment of the functional relevance difficult; (3) genetic methods often cannot identify which single nucleotide variant is pathogenic; (4) most signals are from small effect loci.

It is worthwhile to pause on this last point of small effect size for risk associated with many GWAS-identified loci. The small effect size of most GWAS signals determines that each will add only incrementally risk assessment; nevertheless, small effect loci can be tremendously insightful by revealing relevant players in molecular pathogenesis and thereby highlighting potential therapeutic targets. An excellent example is the quantitative lipid trait on chromosome 1p13 where the signal detected initially spanned multiple genes and, when further refined, was in an intergenic regulatory element [161]. Additional studies identified SORT1 as the relevant gene, revealing a new pathway for VLDL secretion [112]. This work shows that despite the intergenic nature of the signal, the relevant gene can be identified, and despite the small contribution of genetic variation to the trait, the molecular mechanisms may be illuminated.

The following sections review recent AD GWAS work.

Clusterin (CLU), phosphatidylinositol binding clathrin Assembly protein (PICALM), and complement receptor type 1 (CR1)

The first GWAS to report loci that could be replicated were published in 2009 [44, 97]. The GERAD consortium [44] employed a two-stage design. Stage 1 consisted of 3,941 AD cases and 7,848 controls, of which 2,078 were screened, healthy elderly and the remaining controls were unscreened subjects whose mean age was younger than the age when AD risk begins. Genotypes for 529,205 SNPs, obtained using a commercial genotyping array, were analyzed for association with AD. In this stage, the most significant SNPs were close to APOE with the most significant result coming from a SNP in TOMM40 (P = 1.8 × 10−157, OR = 2.53, CI = 2.37–2.71). Note that the SNPs responsible for the APOE ε2/ε3/ε4 genotypes were not on the SNP array used, and these APOE SNPs are incompletely tagged by other SNPs in the region. Thus the fact that SNPs in TOMM40 are the SNPs yielding the lowest P value, most likely reflects the strong LD between TOMM40 SNPs and APOE genotypes, and not that TOMM40 is a better candidate for the signal in this region. Two other signals reached the threshold for genome-wide significance (P < 5 × 10−8), one in CLU (P = 1.4 × 10−9, OR = 0.84, CI = 0.79–0.89) and one near PICALM (P = 1.9 × 10−8, OR = 0.85, CI = 0.80–0.90). In stage 2, an additional 2,023 AD cases and 2,340 age-matched cognitively normal controls were genotyped and evidence supporting CLU and PICALM as AD risk genes was obtained.

Another AD GWAS by the EADI [80], published at the same time as GERAD consortium [112], used an independent set of samples also with a 2-stage approach for 2,025 cases and 5,328 controls in stage 1 and 3,803 cases and 3,097 controls in stage 2. Stage 1 yielded an association between late-onset AD and a SNP in CLU that approached genome-wide significance and marginal results for several other loci. Select SNPs were genotyped for the stage 2 cohort and supporting evidence obtained for CLU and a second site at CR1. The combined stage 1 + 2 analysis yielded significant results for CLU (P = 7.5 × 10−9, OR = 0.86, CI = 0.81–0.90), consistent with the findings of Harold et al. [112] for this locus. The signal at CR1 also reached genome-wide significance (P = 3.5 × 10−9, OR = 1.21, CI = 1.14–1.29) in the stage 1 + 2 analysis. Evidence supporting PICALM as an AD risk locus also was reported.

Results from these two GWAS demonstrated that reproducible genetic associations could be obtained with AD as long as sample sizes were sufficiently large and rigorous criteria for genome-wide significance were employed. The association with CR1, CLU, and PICALM has been confirmed by a number of investigators [65, 172], including one GWAS reviewed below that used a very large independent sample [114]. Also, these reports clearly demonstrate that there are no common variants with effect sizes comparable to APOE, and that remaining AD risk loci will be small effect genes.

Bridging integrator protein-1 (BIN1)

A subsequent manuscript, which included population-based cohorts from the CHARGE consortium, reported strong evidence that BIN1 is an AD risk locus [141]. In stage 1, incident cases and normal cohort subjects were analyzed along with an autopsy case–control cohort previously described by Reiman et al. [19, 128] and a second case–control group previously described by Carrasquillo et al. [15]. Stage 1 had 973 incident AD cases and 2,033 prevalent cases from both the population-based and the case–control studies. These were compared to 14,649 dementia-free controls. In stage 1, the only SNPs yielding significant results were those in the APOE region with the most significant SNP being again in TOMM40 (P = 3.18 × 10−68, OR = 2.23, CI = 2.04–2.44). SNPs with P values <10−3 were carried on to stage 2, which used samples from EADI [80]. SNPs with a P value <10−5 were carried on to stage 3, which used samples from GERAD [44]. Meta-analysis of all three stages showed genome-wide significance for two genes: BIN1 (P = 1.59 × 10−11, OR = 1.15, CI = 1.11–1.20) and exocyst complex component 3-like 2 (EXOC3L2; P = 6.45 × 10−9, OR = 1.17, CI = 1.11–1.23). Because EXOC3L2 is about 300 kb from APOE, conditional analysis was performed, using either the APOE ε2/ε3/ε4 or a surrogate SNP in TOMM40. For both analyses, an attenuated marginal signal remained that was not genome-wide significant. Additional work is needed to determine if a gene in the EXOC3L2 region is a risk factor independent of APOE (see below).

ABC transporter member 7 (ABCA7), membrane-spanning 4-domains, subfamily A, member 4 (MS4A4A)/membrane-spanning 4-domains, subfamily A, member 4E (MS4A4E)/membrane-spanning 4-domains, subfamily A, member 6A (MS4A6A), ephrin receptor EphA1 (EPHA1), CD33 antigen (CD33), and CD2-associated protein (CD2AP)

In 2011, two simultaneously published manuscripts described strong evidence for five new AD risk loci. The manuscript of Naj et al. [114] from the National Institute on Aging (NIA) AD Genetics Consortium (ADGC) reported a three stage design. Stage 1 consisted of 8,309 AD cases and 7,366 elderly screened normal. Some subjects (1,604 cases and 675 controls) were from NIA-funded Alzheimer’s Disease Centers (ADC’s) across the US that contributed material and data to the ADGC. The National Alzheimer Coordinating Center (NACC) collects the phenotypes for these subjects and the tissue/DNA is centralized at the National Cell Repository for Alzheimer’s Disease (NCRAD). The remaining subjects were from other published [176] and unpublished case–control and population-based studies. This stage yielded genome-wide significance for the previously associated loci APOE, CR1, and BIN1, and two novel loci: EPHA1 and MS4A4A/MS4A4E/MS4A6A. Loci with at P < 10−6 were carried on to stage 2 that added samples from the NIA ADCs and from several other case–control and population-based studies (3,531 cases, 3,565 controls). Combined analysis of stages 1 + 2 yielded genome-wide significance for two additional loci, PICALM and CLU, confirming previous results. The five novel loci that had suggestive or genome-wide significance were carried on to stage 3 where data were supplied from CHARGE, EADI, and GERAD consortia. Four of the five loci tested in stage 3 yielded genome-wide significance when all three stages were combined in a meta-analysis. These were CD2AP (P < 8.6 × 10−9, OR = 1.11, CI = 1.07–1.15), EPHA1 (P < 6.0 × 10−10, OR = 0.90, CI = 0.86–0.93), MS4A4A (P < 8.2 × 10−12, OR = 0.91, CI = 0.88–0.93), and CD33 (1.6 × 10−9, OR = 0.91, CI = 0.88–0.93). A fifth gene, ARID5B, did not replicate. This work also reported suggestive results for ABCA7 (5.0 × 10−7, OR = 1.15, CI = 1.09–1.21) with the peak signal at a non-synonymous SNP that is potentially the pathogenic variant (p.Gly1527Ala). These data were supplied to the CHARGE, EADI, and GERAD consortia to support evidence for this locus as an AD risk gene. As a summary of this work, the five previous AD risk loci (APOE, CLU, CR1, PICALM, and BIN1) were confirmed with genome-wide significance in a completely independent sample. Note that in this manuscript, and the one described in the following paragraph, investigators from all groups communicated to avoid using identical or overlapping datasets for the final analyses, an important precaution because some samples and datasets are broadly shared.

A second manuscript by Hollingworth et al. [50] reports the findings of the CHARGE, GERAD, and EADI consortia. Again a multi-stage approach was used. In stage 1, there were 6,688 cases and 13,685 controls from the GERAD group, consisting of both screened and un-screened control subjects. Genome-wide significance was obtained in stage 1 for MS4A6A and suggestive results for ABCA7. Select SNPs for these loci were tested in stage 2 (4,896 cases and 4,903 controls) and support was obtained for both. Select SNPs were carried on to stage 3, which used data from the EADI and CHARGE consortia and another case–control series. The combined stage 1 + 2+3 analysis produced results with genome-wide significance for ABCA7 (P = 5.0 × 10−21, OR = 1.23, CI = 1.17–1.28), and MS4A6A/MS4A4E (P = 1.2 × 10−16, OR = 0.91, CI = 0.88–0.93). In addition, in collaboration with the ADGC, the EADI, GERAD, CHARGE groups contributed data to support that CD33, EPHA1, and CD2AP are AD risk loci.

Other GWAS reports

A number of additional GWAS reports have been published that evaluate AD case versus controls as the primary phenotype [1, 85, 92, 113, 172], as well as other endophenotypes [17, 28, 51, 81, 140, 143]. Of the AD case versus control studies, the most noteworthy is the report of Naj et al. [113] who reported a signal (P = 1.9 × 10−10, OR = 2.1, CI = 1.67–2.64) at MTHFD1L, which encodes a component of the tetrahydrofolate synthesis pathway. In another study, SNPs highlighting CUGBP2 produced genome-wide significance in APOE ε4 homozygotes [172]. The findings from both of these studies will require independent replication.

Other AD-related phenotypes that have been used in GWAS include MRI and cognitive measures [140, 143], age-related cognitive decline [17, 28], psychosis [50], and age-at-onset [83]. While none of these studies produced genome-wide significant loci, these phenotypes could be important in defining the risk gene architecture of AD when larger samples are assembled.

Functional implications of late-onset AD genes

The early genetic studies of APP, PSEN1, and PSEN2 led not only to the establishment of Aβ peptide as a critical molecule in AD pathogenesis, but also to therapeutic approaches that are being tested currently. This success came despite the normal role of APP remaining unknown. Also, when initially described, PSEN1 and PSEN2 were essentially orphan sequences with only limited homology with proteins with known functions. In contrast, for most of the late-onset AD genes described above, the normal functions are at least partially known. However, the role of each in AD pathogenesis will require more investigation. A few themes emerge that can focus future work on specific pathways.

Lipid metabolism and APOE, SORL1, CLU, and ABCA7

When APOE was first identified as a late-onset AD susceptibility gene, much was already known about the function of ApoE in cholesterol transport [101]. In brain, ApoE is the main cholesterol transport protein. Despite being an extensively studied protein, the role of ApoE isoforms in AD pathogenesis remains enigmatic [12, 72]. ApoE may influence AD through its role in cholesterol transport, either in the periphery and/or in the brain. In the brain, cholesterol is needed to support synaptogenesis and maintenance of synaptic connections. Alternatively, ApoE might bind Aβ and act as a chaperone to influence the rate of Aβ fibrillogenesis. Another hypothesis is that ApoE lipid particles bind Aβ and clear the extracellular peptide by endocytosis and subsequent degradation. In each of these proposed mechanisms, the functional difference between the ε2-, ε3-, and ε4-encoded isoforms is the basis of risk associated with APOE.

SORL1 was selected as a candidate gene because it encodes a protein, SorLA, that is important in intracellular APP trafficking [133]. When cell surface APP is re-cycled via the endocytic pathway, SorLA directs APP to processing by presenilins to produce Aβ. This hypothesis is supported by studies of SorLA and APP processing in cell-based systems, in vitro studies, and in mouse models [3, 4, 46, 149]. Another function of SorLA related to AD is as a lipoprotein receptor that can bind ApoE [154].

Clu, also known as apolipoprotein J (ApoJ), is another lipoprotein found in both the periphery and the brain [119]. Like ApoE, Clu is involved in lipid transport in both the periphery and the brain where it is highly expressed [174]. Clu also is hypothesized to act as an extracellular chaperone that influences Aβ-aggregation and receptor-mediated Aβ clearance by endocytosis [64, 119]. Unlike APOE, there are no known coding variants that account for the observed genetic association to CLU, suggesting that genetic variation in expression levels may be responsible for altered risk for AD.

ABCA7 encodes a protein belonging to ABC superfamily of transporters. The normal role of the ABCA7 protein is only partially understood. It appears to function in the efflux of phospholipids and possibly cholesterol [74], although this function is not supported by recent work with ABCA7 knockout mice. Another role for this transporter is in phagocytosis. ABCA7 levels are regulated by sterol-responsive/regulatory element binding protein in response to cholesterol levels, which, in turn, promotes ABCA7 regulated phagocytosis [60]. While ABCA7 is clearly involved in lipid metabolism, how this protein is connected to AD will require additional work.

Innate and adaptive immunity

A number of genes identified in GWAS highlight the innate and adaptive immune response. One is CR1, a cell surface receptor that is part of the complement system and plays a role in regulating the complement response. CR1 has binding sites for complement factors C3b and C4b and participates in clearing immune-complexes containing these two proteins. Since Aβ oligomers can bind C3b, CR1 may participate in the clearance of Aβ. CR1 may also play a role in neuroinflammation, which is a prominent feature in AD [24, 147]. Interestingly, Clu may play a role in this process as an inhibitor [108]. Finally, in addition to their other roles in lipid and Aβ trafficking, ApoE isoforms differentially regulate the innate immune response in brain [71].

Another immune system-related gene identified by an AD GWAS is CD33. This gene encodes a protein that is a member of a family of cell surface immune receptors that bind extracellular sialylated glycans and signal via a cytoplasmic domain called the immunoreceptor tyrosine inhibitory motif [14, 25, 168]. CD33 has primarily been studied in peripheral immune system where it is expressed on myeloid progenitors and monocytes and also in the brain. In the periphery, CD33 appears to inhibit proliferation of myeloid cells [167]. The role of CD33 in brain is unknown.

The MS4A4A/MS4A4E/MS4A6E locus is significantly associated with AD. These genes are part of a cluster of 15 MS4A genes on chromosome 11 [56, 94] and encode proteins with multiple membrane-spanning domains that were initially identified by their homology to CD20, a B-lymphocyte cell surface molecule. Little is known about the function of MS4A4A gene products; however, like CD33, MS4A4A is expressed on myeloid cells and monocytes. Based on this expression pattern and the homology to CD20, MS4A4A and possibly other members of this protein family may have an immune-related function. The GWAS signal identifying the MS4A4A/MS4A4E/MS4A6E locus cannot distinguish which of these three genes is responsible for the AD risk at this location (Fig. 6). Note that the top SNP is in an intergenic region, and that SNPs with linkage disequilibrium of r 2 > 0.8 with the top SNP also span several genes.

Fig. 6
figure 6

Manhattan plot for AD association for the MS4A region of chromosome 11. The association plot is for P values determined by a three-stage meta-analysis as reported in Naj et al. [114]. The X axis is the genomic location (NCBI Build 37.1) and the y axis is the −log10 of the P value. The black diamond is the P value for the SNP with the smallest P value in the discovery phase (stage 1). The purple diamond is the P values for the same SNP for the first replication phase (stage 1 + 2). The red diamond is the P value for the same SNP at the second replication phase (stage 1 + 2 + 3). The colors of the small diamonds (orange, yellow, gray and white) indicate the estimated linkage disequilibrium with the top SNP and are r 2 ≥ 0.8, 0.8 > r 2 ≥ 0.5, 0.5 > r 2 > 0.2, and r 2 < 0.2, respectively. Genes in the region are indicated by arrows at the bottom with the direction of the arrow indicating the direction of transcription and the length proportional to the size of the gene

Cell adhesion and EPHA1

This gene, implicated in AD risk, encodes a member of the ephrin family of receptor protein-tyrosine kinases [47, 121, 125]. This group of cell surface receptors interacts with ephrin ligands on adjacent cells to modulate cell adhesion. In brain development, these receptors are important in establishing appropriate connections between different brain regions and act by modulating cell adhesion, migration, and axon guidance. Ephrin receptors also are involved in synapse formation and in plasticity [79]. Another brain development function of ephrin receptors is the regulation of apoptosis of neural progenitor cells [29]. These receptors also function in other systems including epithelial branching morphogenesis, the blood and lymphatic vascular systems, and the peripheral inflammatory response [59].

While there is a substantial body of research on the function of ephrin receptors, less is known about the EPHA1 gene product. Like other ephrin receptors, it regulates cell morphology and motility [175]. Early work implicated this receptor in regulating vascular morphogenesis and angiogenesis [2]. Knockout of the mouse gene results in abnormal tail and reproductive tract development [32], but no defects were detected in other tissues including the brain. In mouse, expression is restricted to epithelial tissue; no expression was detected in the brain or spinal cord. In humans, EPHA1 is expressed by CD4-positive T lymphocytes [49], monocytes [137], intestinal epithelium, and colon [42]. There is no work on EPHA1 and neurodegenerative diseases. The lack of evidence for brain expression and the expression by lymphocytes and monocytes suggests that, like CD33, CR1, and MS4A4/MS4A6E (discussed above), the role of the EPHA1 gene product in AD is mediated though the immune system.

Endocytosis pathways: BIN1, PICALM, and CD2AP

Several genes revealed in GWAS encode proteins involved in intracellular vesicle trafficking. Bin1 belongs to a family of proteins with BAR (Bin1/amphiphysin/RVS167) domains. These proteins are membrane adaptors that form complexes that act in clathrin-mediated endocytosis [31, 126, 127, 130]. The functions of Bin1 may be complex judging by the numerous splice variants that can be produced by this gene. PICALM also is involved in clathrin-mediated endocytosis and recruits clathrin and adaptor protein complex 2 to sites of vesicle assembly [160, 164]. How these endocytic genes are involved in AD pathogenesis is unknown.

The CD2 associated protein gene (CD2AP) encodes a scaffolding protein that binds directly to actin [88], nephrin, a cell–cell adhesion protein [145], and other proteins involved in cytoskeletal organization. This protein is implicated in dynamic actin remodeling during cytokinesis and endocytosis and has been extensively studied in kidney podocytes where it is a component of the slit diaphragm intercellular junction complex [99]. Haploinsufficiency of this protein causes focal segmental glomerulosclerosis [73]. In the immune system, CD2AP is required for synapse formation [33] in a process that involves clathrin-dependent actin polymerization [13]. While expressed in brain, the normal function of CD2AP in the CNS and its role in AD have not been studied.

Clinical implications of AD genetics findings

Genetic findings, once unambiguously identified, have the potential to predict an individual’s disease risk, phenotype, and possibly progression. They also may contribute to clinical management by prompting heightened surveillance, providing indication for preventive intervention, and guiding treatment selection. While there is still a long way to go in our understanding of the genetics of AD, our current knowledge about APP and PSEN1 mutations clearly shows that carriers will develop early-onset AD usually before the age of 60 years. PSEN2 mutations have a broader range of onset ages, limiting but not abrogating risk assessment. Since there are presently no preventive therapies for AD, detection of these autosomal dominant mutations is mostly used medically for making reproductive decisions. In the hopefully near future, when disease suppressing interventions become available, identifying carriers with one of these autosomal dominant mutations might be followed by biomarker or neuroimaging surveillance to determine when disease prevention or disease modifying treatment should begin.

The AD susceptibility genes, APOE, SORL1, and the genes recently identified by GWAS, have mixed utility for disease risk assessment. APOE ε4 is by far the strongest risk factor with the contributions by the other loci being minor by comparison. Of particular interest in terms of predictive testing is the APOE ε4/ε4 genotype. The lifetime risk for AD in individuals with this genotype is high, estimated as 33 or 32 % for men or women by the age of 75 years; by age 85 years, the risk climbs to 52 % for men and 68 % for women [35]. As recently pointed out by Genin et al., this is comparable to the lifetime risk by age 70 years for carcinoma of the breast in women with a BRCA1 mutation [16]. Fortunately, only 2 % of the (Caucasian) population has the APOE ε4/ε4 genotype. Considering the delayed penetrance of “sporadic” AD, lack of preventive therapies, and the potential for psychological harm, genetic testing for APOE is not recommended. However, when AD prevention becomes possible, then this recommendation will need to be reconsidered.

Summary of AD genetics

Tables 1 and 2 summarize findings of AD genetics studies. Several important conclusions can be drawn from these studies. First, APP, PSEN1, and PSEN2 genes are unambiguously established as autosomal dominant causes of AD, with evidence coming from numerous mutations in multiple families with early-onset AD. Second, remarkably, the evidence for the late-onset AD risk loci is the replicable across independent data sets using cohorts ascertained by a variety of approaches and cases diagnosed in numerous research centers. Thus, not only are the results robust, but also the diagnostic and analytic procedures are used across studies. Third, APOE is by far the largest effect locus for late-onset AD. Note that the ORs in Table 1 are from surrogate SNPs that do not completely capture the APOE ε2/ε3/ε4 genotypes. A better estimate of effect size comes from studies where the SNPs responsible for the ε2/ε3/ε4 alleles are directly genotyped and different genotypes considered separately. For example, in Farrer et al. [34], ORs (for Caucasians) are 1.0 for ε3/ε3 (referent genotype), 0.6–0.9 for ε2/ε2, 0.6 for ε2/ε3, 1.2–2.6 for ε2/ε4, 2.7–3.2 for ε3/ε4, and 12.5–14.9 for ε4/ε4. Fourth, it is highly likely that additional small effect risk loci will be identified for late-onset AD. This will require larger sample sizes. To achieve this, an international effort called the International Genomics Alzheimer’s Project (IGAP) is underway with participation of the ADGC, CHARGE, GERAD, and EADI consortia [52].

Table 1 Autosomal dominant AD genes
Table 2 AD susceptibility genes

The remarkable progress in AD genetics that started in 1992 has led to candidate therapeutics currently under evaluation in clinical trials. Second-generation Aβ immunotherapies are now being evaluated in clinical trials, as are γ-secretase inhibitors. While no ApoE-based therapies are in clinical trials, recent work [22] suggests that bexarotene, a compound that induces APOE expression, may have therapeutic potential. The time from gene discovery to candidate therapy ready for testing in clinical trials is discouragingly long, and it is, therefore, not surprising that no therapies yet have been developed from the 10 AD risk genes found over the past 3 years. Also, not every gene discovered will have a product that is approachable therapeutically. Nevertheless, there are examples where GWAS identified small effect risk loci are highly effective drug targets. One quantitative trait GWAS of lipid levels [161], e.g., identified two risk genes, HMGCoA reductase and NPC1L1, that are the target of simvastatin and Zetia (ezetimibe), both drugs used to lower cholesterol and prevent heart disease. The genetic variation in these genes had a small effect on population cholesterol levels, yet the drugs that target the products of these genes are arguably the most important preventive medications in use today. Other diseases for which GWAS hits revealed important drug targets include autoimmune disorders such as multiple sclerosis, ankylosing spondylitis and rheumatoid arthritis [166]. These examples illustrate not only that GWAS can identify potential therapeutic targets, but also that the effect size observed is GWAS is not an indicator of the biologic or medical significance of the gene’s product.

Despite the remarkable progress over the last 3 years, additional approaches need to be applied to completely resolve AD genetics. First, endophenotypes including cerebrospinal fluid biomarker data, quantitative neuropathology measures, and the phenotypes described above need to be used in studies where the sample size is sufficiently large to detect additional AD risk loci. Second, massively parallel DNA sequencing methods need to be used to detect rare large effect alleles in genes already identified by GWAS methods as well as variants that modulate susceptibility to AD in only a small fraction of late-onset cases. Third, genome-wide evaluation of insertion and deletion polymorphisms (indels) and copy-number variation (CNV) needs to be performed. Currently, SNP arrays can be used to detect CNVs that span greater than ~ 5 kb. However, shorter CNVs or indels cannot routinely be detected and genome-wide scans of this type of genomic variation await large-scale whole genome sequencing methods applied to AD.

The new wave of genetic discoveries in AD is just beginning.