Early-Onset Alzheimer’s Disease and Late-Onset Alzheimer’s Disease: Causative and Susceptibility Genes

Multiple genetic defects involving either predictive (mutational) or susceptibility (risk) genes have been linked to the development of Alzheimer’s disease (AD) [1–3]. Rare (1–2 % of all AD cases) and fully penetrant disease-causing mutations in three genes (APP, amyloid-beta protein precursor; PSEN1, presenilin 1; and PSEN2, presenilin 2) lead to early-onset (patients younger than 65 years) Mendelian (familial) forms of AD (EOAD). Of note, mutations in these three genes explain disease in only about 13 % of patients with EOAD [4]. The vast majority of AD cases are so-called sporadic AD, with no apparent familial recurrence, and are defined by an age at onset later than 65 years, that is, late-onset AD (LOAD). LOAD is not caused by Mendelian mutations but is believed to result from multiple risk genes that do not reliably cause the disease but increase an individual’s susceptibility or predisposition to developing AD. Each of these susceptibility genes contributes only a small amount to the risk of LOAD. Twin studies have estimated the heritability of LOAD to be as high as 80 % [5]. Susceptibility genes are identified by genetic association studies in which the allele frequency of single-nucleotide polymorphisms (SNPs) at or near a gene is compared between AD cases and controls; a susceptibility gene is revealed when case and control frequencies differ significantly. There appears to be no overlap between the genes driving the Mendelian and non-Mendelian forms of the disease; that is, common SNPs in APP, PSEN1, and PSEN2 do not seem to contribute to risk for LOAD [6].
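The case-control comparison of allele frequencies described above can be illustrated with a minimal sketch. It assumes a simple allelic test on a 2 × 2 table of allele counts (Pearson chi-square with one degree of freedom, no covariates or stratification adjustment); the function name and the counts are hypothetical and are not taken from any study cited here.

```python
import math

def allelic_association(case_minor, case_major, control_minor, control_major):
    """Return (odds_ratio, chi_square, p_value) for a 2x2 table of allele counts."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    # Allelic odds ratio: odds of the minor allele in cases vs. controls
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value

# Hypothetical counts: 1,000 cases and 1,000 controls, i.e., 2,000 alleles per group
or_, chi2, p = allelic_association(700, 1300, 600, 1400)
print(f"OR = {or_:.2f}, chi-square = {chi2:.2f}, P = {p:.2e}")
```

In practice, association studies typically use regression models that adjust for covariates such as age, sex, and population structure; the allelic test is shown only because it makes the frequency comparison explicit.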

Smaller-Scale Genetic Association Studies: The Candidate Gene Approach

Apolipoprotein E

The candidate gene approach was successful in identifying the ε4 allele of the apolipoprotein E (APOE) gene on chromosome 19q13 as the only gene variant considered to be an “established” LOAD risk factor [7]. Unlike the EOAD mutations, which are fully penetrant, the APOE ε4 allele is a genetic risk factor that is neither necessary nor sufficient for the development of LOAD [1, 3, 4, 8]. Whereas only 24–30 % of the general Caucasian population carries at least one APOE ε4 allele, 40–65 % of LOAD patients have at least one copy of this allele. That is, many LOAD patients have no APOE ε4 allele, and individuals carrying this allele may never develop LOAD, suggesting that there are additional factors modulating the influence of the APOE ε4 allele on the development of LOAD. People with one APOE ε4 allele have a roughly four times increased risk of AD, and those with two APOE ε4 alleles have a roughly 15 times increased risk, compared with those with the most common genotype APOE ε3/3 [4]. Although there is evidence of a risk effect of APOE ε4 in non-Europeans, the estimated effect sizes are smaller with less consistent results in African American and Hispanic subjects, which may suggest different underlying genetic or environmental factors for these ethnic groups. The effect of APOE ε4 appears to be age dependent: the lifetime risk for LOAD in individuals with the APOE ε4/4 genotype is high, estimated as 33 % for men or 32 % for women by the age of 75 years; by age 85 years, the risk climbs to 52 % for men and 68 % for women [9]. These very high risk estimates for APOE ε4 carriers seem similar to those associated with autosomal dominant Mendelian genes. Therefore, APOE has been proposed as a moderately penetrant gene with semidominant inheritance: not all APOE ε4 carriers develop disease (hence, the ε4 allele in this gene is not fully penetrant), and heterozygous APOE ε4 carriers have intermediate risk compared with homozygous carriers [9].
Considering the delayed penetrance of LOAD, lack of preventive therapies, and the potential for psychological harm, genetic testing for APOE is not recommended. However, when LOAD prevention becomes possible, this recommendation will need to be reconsidered, and genetic testing might be indicated either for high-risk groups (e.g., family members of LOAD cases) or for population screening.

Inconsistent Replication: Meta-analyses in the AlzGene Database

Since the original report of APOE as a genetic risk factor for LOAD in 1993, hundreds of genes have been tested for association with LOAD and reported in the literature. Most candidate gene association studies in LOAD have studied a few variants in only one or two genes, and despite positive initial results, inconsistent replication of original association findings has been the rule rather than the exception (except for APOE) and even for candidate genes with convincing functional data and thorough genetic assessment [1]. Multiple testing, population stratification, genotyping errors, and initial small sample size are potential reasons for false-positive findings in the original study. In addition, underpowered studies that are too small to detect a modest effect size can lead to false-negative follow-up studies. Candidate gene association studies have revealed modest estimated effect sizes with odds ratios (ORs) of less than 2.0 for risk alleles or greater than 0.5 for protective alleles. It is estimated that thousands to tens of thousands of subjects are required to have sufficient power to detect such effect sizes, a prerequisite that has typically not been fulfilled for many association studies in LOAD [10]. To address these very large numbers of conflicting results, a database (the AlzGene database) was created which systematically collected, summarized, and meta-analyzed the results for all the genetic variants studied in association with LOAD [11]. As of 18 April 2011, the 10 genes with the strongest signals for association in the database AlzGene included APOE and nine other candidate genes (BIN1, CLU, ABCA7, CR1, PICALM, MS4A6A, MS4A4E, CD33, and CD2AP), all of which came from genome-wide association studies.

Larger-Scale Genetic Association Studies: The Genome-Wide Approach

The HapMap Project and the 1000 Genomes Project

Instead of studying one or two genetic variants, recent advances now make it possible to evaluate essentially all genes and all regions between genes in a single experiment, a method called genome-wide association study (GWAS). The GWAS method represented an important advance compared to “candidate gene” studies, in which sample sizes were generally smaller, the variants assayed were limited to a selected few, and the associations were difficult to replicate. The International HapMap Project [12, 13], launched in October 2002, led to the generation of a database of the common variants (defined as minor allele frequency of greater than 5 %) and the underlying linkage disequilibrium (LD) structure, or correlation between neighboring SNPs, providing the foundation for the GWAS. GWAS uses tagging SNPs, that is, polymorphisms in LD with each other; this means that if one knows the genotype at one locus, one can predict with high accuracy (dependent on the strength of the LD and the allele frequencies) the genotype occurring at linked loci [14]. Understanding LD not only allows the construction of haplotypes but also provides the ability to impute the genotypes of nearby unobserved (not genotyped) SNPs using directly observed genotypes. Imputation facilitates merging data from different genotyping platforms with incomplete overlap [10]. Until 2010, GWAS had almost exclusively employed the HapMap data set, which contained up to two to three million SNPs, as the reference panel for imputation of their genetic data. Using genome-wide sequencing with high-throughput platforms, the 1000 Genomes Project Consortium [15] described the location, allele frequency, and local haplotype structure of approximately 15 million SNPs, 1 million short insertions and deletions, and 20,000 structural variants. Over 95 % of the currently accessible variants found in any individual are present in this data set.
From 2010 onward, the 1000 Genomes Project has increased the power of GWAS to detect genetic influences due to less common variants. Rigorous quality control and statistical methods coupled with sufficient sample size can lead to high reproducibility of GWAS. Disadvantages of GWAS are that signals can be in intergenic regions, making assessment of the functional relevance difficult; that genetic methods often cannot identify which single-nucleotide variant is pathogenic; and that most signals are from small-effect loci [14].
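The notion of LD as a correlation between neighboring SNPs can be made concrete with the standard pairwise measures D and r². The sketch below assumes two biallelic SNPs with known haplotype and allele frequencies; the function name and the frequencies are hypothetical and chosen only for illustration.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B, respectively
    """
    # D measures the departure of the haplotype frequency from independence
    d = p_ab - p_a * p_b
    # r^2 is the squared correlation between the two loci (0 = independent, 1 = perfect tagging)
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared

# Hypothetical frequencies: the two alleles almost always travel together
d, r2 = linkage_disequilibrium(p_ab=0.28, p_a=0.30, p_b=0.30)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

A tagging SNP with r² close to 1 predicts the genotype at the linked locus almost perfectly, whereas a low r² means that an association signal at the tag can substantially understate the effect of the untyped variant.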

International Consortia: Meta-analyses of GWAS

A consensus has emerged that a P value less than 5 × 10⁻⁸ corresponds to genome-wide significance in a non-African population-based GWAS. This is a conservative Bonferroni correction based on roughly one million “effectively independent” common SNPs throughout the genome. Such a stringent threshold carries the risk of rejecting biologically valid hypotheses on purely statistical grounds, that is, false negatives. Therefore, insufficient statistical power is the main threat to GWAS, necessitating the formation of large international consortia that can provide a sufficient number of cases and controls. The four largest LOAD GWAS consortia are the European Alzheimer’s Disease Initiative (EADI) based in France, the US-based Alzheimer’s Disease Genetics Consortium (ADGC), the Genetic and Environmental Risk in Alzheimer’s Disease (GERARD) group from the UK, and the neurology subgroup of the multinational Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) consortium. The first two GWAS were published in 2009 by the GERARD [16] and EADI [17] consortia. In approximately 6,000 LOAD and 10,000 control subjects, in addition to APOE-related SNPs that revealed genome-wide significance (P = 4.9 × 10⁻³⁷ to 1.8 × 10⁻¹⁵⁷), the GERARD consortium found that rs11136000 in clusterin (CLU, P = 8.5 × 10⁻¹⁰, OR = 0.86) and rs3851179 in the phosphatidylinositol-binding clathrin assembly protein (PICALM, P = 1.3 × 10⁻⁹, OR = 0.86) were significantly associated with LOAD. In 6,000 LOAD and more than 8,000 control subjects from the EADI consortium, rs11136000 in CLU and rs6656401 in complement component receptor 1 (CR1, P = 3.7 × 10⁻⁹, OR = 1.21) were significantly associated with LOAD. In 2010, in more than 35,000 persons, the CHARGE consortium reported strong evidence that rs744373 near the bridging integrator 1 gene (BIN1, P = 1.59 × 10⁻¹¹, OR = 1.13) was significantly associated with LOAD [18].
In 2011, two simultaneously published manuscripts reported meta-analyses of the findings of the ADGC, CHARGE, GERARD, and EADI consortia and described strong evidence for five new LOAD risk loci. In nearly 20,000 cases and 40,000 controls, Hollingworth et al. [19] described association with LOAD of rs3764650 in ABCA7 (P = 5.0 × 10⁻²¹, OR = 1.23), rs610932 in MS4A6A (P = 1.2 × 10⁻¹⁶, OR = 0.91), rs9349407 in CD2AP (P = 8.6 × 10⁻⁹, OR = 1.11), rs11767557 in EPHA1 (P = 6.0 × 10⁻¹⁰, OR = 0.90), and rs3865444 in CD33 (P = 1.6 × 10⁻⁹, OR = 0.91). In approximately 19,000 cases and 29,000 controls, Naj et al. [20] confirmed that common variants at the MS4A gene cluster, CD2AP, CD33, and EPHA1 were associated with LOAD.
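The genome-wide significance threshold described in this section follows directly from a Bonferroni correction: a nominal significance level divided by the number of effectively independent tests. The short sketch below assumes the conventional nominal level of 0.05 (not stated explicitly in the text) and checks a few of the P values reported above against the resulting threshold; the variable names are illustrative.

```python
ALPHA = 0.05                 # nominal significance level (assumed convention)
EFFECTIVE_TESTS = 1_000_000  # roughly one million effectively independent common SNPs

# Bonferroni-corrected threshold for genome-wide significance
threshold = ALPHA / EFFECTIVE_TESTS

# P values quoted in this section for selected loci
reported_p = {
    "CLU rs11136000": 8.5e-10,
    "PICALM rs3851179": 1.3e-9,
    "CR1 rs6656401": 3.7e-9,
    "BIN1 rs744373": 1.59e-11,
    "CD2AP rs9349407": 8.6e-9,
}

print(f"Genome-wide threshold: {threshold:.1e}")
for locus, p_value in reported_p.items():
    verdict = "significant" if p_value < threshold else "not significant"
    print(f"{locus}: P = {p_value:.2e} -> {verdict}")
```

Because the threshold is so small, a variant with a modest effect reaches it only in very large samples, which is the statistical motivation for the consortia described above.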

International Genomics of Alzheimer’s Project: Mega-meta-analysis of GWAS

The four LOAD GWAS consortia have joined forces, forming a mega-consortium known as the International Genomics of Alzheimer’s Project (IGAP). The project drew on data from a total of 74,000 people of European ancestry (25,500 LOAD and 48,500 unaffected controls) and conducted a mega-meta-analysis, working with more than 11 million SNPs with a very dense coverage of the genomic map [21]. Table 4.1 depicts the list of genes and variants associated with LOAD in this mega-meta-analysis: in addition to the eight already known GWAS-defined genes (ABCA7, BIN1, CLU, CR1, CD2AP, EPHA1, MS4A4, and PICALM) that were confirmed (the CD33 gene did not reach genome-wide significance here), 11 new susceptibility loci were identified in or near plausible candidate genes (CASS4, CELF1, FERMT2, HLA-DRB5/DRB1, INPP5D, MEF2C, NME8, PTK2B, SLC24A4/RIN3, SORL1, and ZCWPW1). The effects of all these 19 genes on risk for LOAD are exceedingly small (Table 4.1), with allelic ORs between 0.77 (SORL1) and 1.22 (BIN1); in contrast, the ORs for APOE ε4 are approximately 4 or 15 for one or two ε4 alleles, respectively. That is, one or two copies of the APOE ε4 allele increase the risk for LOAD roughly 4-fold or 15-fold, whereas one copy of any of these non-APOE alleles merely increases or decreases the risk by approximately 30 %, at most. However, the findings from this mega-meta-analysis are, for the most part, not based on the true susceptibility variants but are reflective of their tagging markers, which may harbor greater heterogeneity than the former with respect to alleles and extent of LD. Thus, it remains a possibility that the actual functional susceptibility variants may have bigger effect sizes.

Table 4.1 LOAD-associated GWAS loci in the International Genomics of Alzheimer’s Project mega-meta-analysis

Population Attributable Fraction: Understudied Populations

The cumulative population attributable fraction (i.e., the proportion of LOAD cases in a population that would be prevented if an exposure were eliminated) at each of the 19 non-APOE loci identified by the IGAP (Table 4.1) was between 1.1 % (CASS4 and SORL1) and 8.1 % (BIN1), and that of APOE was 27.3 % [21]. The remaining genetic risk for LOAD could be due to new common loci, rare variants, structural variants, and gene-gene and gene-environment interactions. Most large GWAS have identified variants that affect LOAD susceptibility in non-Hispanic whites of European ancestry. African Americans and other minorities are understudied, and it is unclear whether any of the recently identified loci modify risk of LOAD in racial or ethnic groups other than whites. The ADGC consortium [22] performed a GWAS among the largest sample of African Americans ever assembled for genetic study of LOAD (nearly 2,000 cases and 4,000 cognitively normal elderly controls). The APOE ε4 allele, previously shown to be associated with LOAD in whites, was also implicated in African Americans (P = 5.5 × 10⁻⁴⁷, OR = 2.3); more striking was that the effect size for ABCA7 was comparable with that observed for APOE. In fact, variants at the ABCA7 gene increased the risk for LOAD approximately 1.8-fold (P = 2.21 × 10⁻⁹) in individuals of African ancestry, as opposed to the modest 1.15-fold increase in risk in individuals of European ancestry (Table 4.1). A number of variants in other genes (CR1, BIN1, PICALM, CLU, EPHA1, MS4A cluster, CD2AP, and CD33) did not reach the P value cutoff for genome-wide significance in this African American population.
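To show how a population attributable fraction depends on both the frequency and the effect size of a risk factor, the sketch below uses Levin’s classical formula, PAF = p(RR − 1) / [1 + p(RR − 1)]. This is an assumption made for illustration: the estimator actually used in the IGAP analysis [21] is not described here, and the exposure frequency and relative risk in the example are hypothetical.

```python
def population_attributable_fraction(exposure_freq, relative_risk):
    """Levin's formula: fraction of cases attributable to an exposure.

    exposure_freq: proportion of the population carrying the risk factor
    relative_risk: risk in exposed relative to unexposed individuals
                   (an odds ratio is often used as an approximation)
    """
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Hypothetical common variant: carried by 40 % of people, relative risk 1.2
paf = population_attributable_fraction(exposure_freq=0.40, relative_risk=1.20)
print(f"PAF = {paf:.1%}")
```

The formula makes clear why a common variant with a small effect can account for a larger share of cases than a rare variant with a large effect.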

Functional Basis for the LOAD-Associated GWAS Loci

True Functional Variants: Expression Quantitative Trait Loci

The associated SNPs identified through GWAS are unlikely to be functional variants themselves. For any disease-associated SNP, the true variant underlying the phenotype studied may be the GWAS hit itself, a known common SNP in LD with the identified GWAS hit, an unknown common or rare SNP tagged by a haplotype on which the hit occurs, or a linked copy number variant [23]. For all traits studied by GWAS, only 12 % of the associated SNPs are located in, or occur in high LD with, protein-coding regions of genes; the vast majority (80 %) of trait-associated SNPs are located in intergenic regions or noncoding introns [24]. LOAD is no different: taking into account the 19 SNPs reported in the 11 new loci and the 8 previously reported loci associated with LOAD in the IGAP mega-meta-analysis [21], 12 SNPs are located in intronic regions and 7 in intergenic regions (Table 4.1). These findings clearly indicate that follow-up studies should not only examine coding variability but should also pay close attention to the potential roles of these intronic and intergenic regions in the regulation of gene expression. Therefore, GWAS follow-up studies should rely on fine mapping of the associated loci and deep re-sequencing of the associated regions in samples of interest in order to identify all possible functional variants. In addition, it is critical to characterize the novel LOAD candidate variants and genes that are being identified in LOAD-associated GWAS with respect to their influence on gene expression, in what are known as expression quantitative trait loci (eQTL) studies [25]. The underlying premise of these studies is that the level of the expressed gene transcript (mRNA profiling) in LOAD patients will differ from that in controls when measured in tissue affected by the disease (such as the temporal cortex) or in peripheral immune cells [26]. SNPs that influence brain gene expression (eSNPs) constitute an important class of functional variants.
In this respect, SNPs in the CLU (rs11136000) and MS4A4A (rs2304933/rs2304935) genes influenced their expression levels in temporal cortex [27]: the LOAD-protective CLU and the risk MS4A4A alleles both occurred in conjunction with elevated levels of brain expression, implicating regulatory genetic variation for these genes in LOAD risk. In a systematic examination of the CLU, CR1, ABCA7, BIN1, PICALM, and MS4A6A/MS4A6E loci for LOAD, coding variability might explain only the ABCA7 association with LOAD, but common coding variability did not explain any of the other loci; in addition, none of these loci had eQTL effects, and the regional expression of each of the loci did not match the pattern of brain regional distribution in Alzheimer pathology [23].

Pathogenic Pathways Implicated in LOAD from GWAS Loci

The LOAD candidate genes make biological sense and have identified different pathways involved in LOAD pathogenesis [4, 21, 28]. As suggested by Table 4.1, the implicated pathways are:

  • A/Amyloid-beta metabolism (production, degradation, and clearance): APOE, CLU, ABCA7, PICALM, BIN1, CD2AP, SORL1, CASS4, and CD33 [29]

  • B/Immune system function (both innate and adaptive): CLU, CR1, ABCA7, MS4A cluster, CD33, EPHA1, HLA-DRB5/DRB1, INPP5D, and MEF2C

  • C/Cholesterol metabolism: APOE, CLU, and ABCA7

  • D/Synaptic cell functioning mechanisms and cell membrane processes (endocytosis): PICALM, BIN1, CD33, CD2AP, EPHA1, SORL1, CELF1, NME8, MEF2C, and PTK2B

  • E/Tau pathology (microtubule stability, tau phosphorylation/aggregation, and neurofibrillary tangle formation): CASS4, FERMT2, SLC24A4/RIN3, BIN1 [30], and PICALM [31]

Exactly how APOE might cause LOAD is a matter of debate: as well as being the main transporter of cholesterol and other lipids into the brain, it is also thought to remove amyloid-beta from the brain. Ultimately, the validation of the pathogenic mechanisms of all these LOAD GWAS loci will require comprehensive functional studies in in vitro systems, in vivo animal models, and clinical samples.

Examining Genetic Influences on Endophenotypes

Endophenotypes are biologically relevant, quantitative, and heritable phenotypes. There are many endophenotypes that are currently utilized or are excellent candidates for genetic studies of LOAD, including cerebrospinal fluid measures of amyloid-beta, tau and phosphorylated tau, neuroimaging measures in magnetic resonance imaging (MRI) and positron emission tomography (PET) scans (such as hippocampal volume), and cognitive measures [25]. Genetic studies of LOAD endophenotypes are an effective approach for identifying disease risk loci that is complementary to case–control association studies, and the genetic variants identified might be implicated not only in risk for LOAD but also in age at onset or rate of progression. Cognitive endophenotypes (e.g., level of cognitive function and rate of decline in cognition) can help to detect genetic risk factors attributable to the preclinical and subclinical change in cognition in LOAD. For example, the simultaneous consideration of the joint effects of eight non-APOE LOAD-associated GWAS loci (ABCA7, BIN1, CD2AP, CLU, CR1, MS4A4E, MS4A6A, and PICALM), aggregated as a cumulative genetic risk score, predicts accelerated progression from mild cognitive impairment (MCI) to LOAD in those subjects with higher scores [32]. Moreover, MCI patients with the APOE ε4 allele are more likely to convert to LOAD than those without the APOE ε4 allele [33]. No clear profile has emerged from studies of the relation between genotype and amyloid or tau phenotype in cerebrospinal fluid: whereas one study found no evidence for association between variants in the BIN1, CLU, CR1, and PICALM genes and amyloid-beta and phosphorylated tau levels in cerebrospinal fluid [34], another study found that the APOE ε4 allele and CLU and MS4A4A genetic variants were associated with significantly reduced amyloid-beta levels in cerebrospinal fluid [35].
In an investigation of whether LOAD-associated GWAS loci influence MRI measures (hippocampal and amygdala volumes and entorhinal cortex and temporal pole cortex thicknesses), the APOE ε4 allele and PICALM and CR1 genotypes were significantly associated with these neuroimaging measures [36].
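The idea of aggregating several loci into a cumulative genetic risk score can be sketched as a weighted sum of allele counts. The example below is an illustration only: it assumes each locus is weighted by the natural logarithm of the OR quoted earlier in this chapter for the reported allele, which is a common convention but not necessarily the scoring used in the cited study [32]. MS4A4E is omitted because no OR for it is given in the text, and the two genotype profiles are hypothetical.

```python
import math

# Allelic odds ratios quoted in this chapter for the reported allele at each locus
REPORTED_OR = {
    "ABCA7": 1.23,
    "BIN1": 1.13,
    "CD2AP": 1.11,
    "CLU": 0.86,
    "CR1": 1.21,
    "MS4A6A": 0.91,
    "PICALM": 0.86,
}

def genetic_risk_score(allele_counts):
    """Sum of ln(OR) x number of copies (0, 1, or 2) of the reported allele per locus."""
    return sum(math.log(REPORTED_OR[locus]) * copies
               for locus, copies in allele_counts.items())

# Hypothetical individuals
person_a = {"ABCA7": 2, "CR1": 1}   # carries risk-increasing alleles
person_b = {"CLU": 2}               # carries two copies of a protective allele

score_a = genetic_risk_score(person_a)
score_b = genetic_risk_score(person_b)
print(f"Person A: {score_a:+.3f}, Person B: {score_b:+.3f}")
```

On this log scale, a positive score indicates a net excess of risk-increasing alleles and a negative score a net excess of protective alleles; exponentiating the score gives the combined odds ratio under the assumption that the loci act independently.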

The Whole Exome and Whole Genome Sequencing Approach

Common Versus Rare Variants

A proportion of heritability (the portion of phenotypic variance in a population attributable to additive genetic factors) is apparently unexplained by GWAS findings. Explanations for this “missing heritability” include rarer variants (possibly with larger effects) that are poorly detected by available genotyping arrays, which focus on variants present in 5 % or more of the population; structural variants poorly captured by existing arrays, including copy number variants such as insertions and deletions and copy-neutral variation such as inversions and translocations; low power to detect gene-gene interactions; and inadequate accounting for shared environment among relatives [37]. It is likely that a substantial portion of the genetic risk underlying LOAD is actually conferred by rare sequence variants, those occurring with a frequency <1 % in the general population, and possibly of relatively large genetic effect (e.g., with odds ratios >2). Rare variants are much more likely to have functional consequences than the more common variants; in fact, regulatory regions show preferential exclusion of common variants relative to rare ones, just like protein-coding sequence [38]. GWAS are by design powered to detect association with causal variants that are relatively common in the population, and current microarray technology is not designed for de novo identification of rare sequence variants. Thus, the identification of presumed disease-associated rare variants requires deep re-sequencing in suitable data sets, either small scale (e.g., previously associated GWAS regions) or large scale (whole exome or whole genome). Whole exome sequencing is most often chosen for monogenic Mendelian diseases, largely because of its low cost compared with whole genome sequencing (the exome is 1–2 % of the whole genome) and the notion that most sequence variations leading to a severe phenotypic effect are located in the coding part of the genome [4].
Whole exome sequencing is capable of identifying not only very rare Mendelian causes of disease but also low-frequency variability with medium effect sizes modulating disease development. A significant proportion of EOAD is caused by autosomal dominant, fully penetrant mutations. LOAD recurs within families more often than expected by chance alone, and this observed familial recurrence could be attributed to genetic loci with large phenotypic effects and reduced penetrance (possibly recessive loci) [10]. With monogenic recessive contributions to LOAD, one would not necessarily expect to see recurrence of the disease in multiple generations, nor a high recurrence rate among siblings, and the disease would appear sporadic in the population. So far, the role of recessive mutations in LOAD has been considerably overlooked.

Rare Monogenic Forms of LOAD

TREM2

Homozygous loss-of-function mutations in the TREM2 gene, encoding the triggering receptor expressed on myeloid cells 2 protein, have previously been associated with an autosomal recessive form of early-onset dementia presenting with bone cysts and consequent fractures, called polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy or Nasu-Hakola disease. Homozygous TREM2 mutations have recently been identified in three Turkish patients presenting with a clinical phenotype associated with frontotemporal dementia and with leukodystrophy but without any bone-associated symptoms [39]. Whereas severe and early-onset disease is caused by homozygous loss-of-function mutations in TREM2, heterozygous loss-of-function variants are associated with LOAD. For example, in 2,000 LOAD patients and 4,000 controls, a rare missense mutation (rs75932628-T) in TREM2, which was predicted to result in an R47H substitution, showed a strong, highly significant association with LOAD (P = 9.0 × 10⁻⁹, OR = 5.05), with a minor allele frequency among healthy controls of 0.12 % in the United States [40]. Similarly, in 2,261 Icelandic participants, the T allele of rs75932628 in TREM2 was found to confer a significant risk of LOAD (P = 3.42 × 10⁻¹⁰, OR = 2.9), with a minor allele frequency of 0.63 % in healthy controls [41]. Consequently, this R47H variant of TREM2 is a low-prevalence variant that increases LOAD risk with a moderate-to-high effect size, similar to that of the APOE ε4 allele. Neurodegeneration in TREM2-associated LOAD is probably driven by a chronic inflammatory process with dysfunction of microglial phagocytosis [42].

APP

About 25 coding mutations in the APP gene have resulted in EOAD, but until now, mutations in APP had not been implicated in LOAD. In a set of whole genome sequence data from 1,795 Icelanders, the A allele of rs63750847, which results in an alanine to threonine substitution at position 673 in APP (A673T), was found to be significantly more common in the elderly control group aged 85 or greater than in the LOAD group (0.62 % versus 0.13 %, P = 4.78 × 10⁻⁷, OR = 0.189) [43]. In addition, the cognitive function of elderly noncarriers remained poorer than that of carriers of A673T after removing LOAD cases. A673T represents the first example of a rare sequence variant conferring strong protection against LOAD and also protecting against cognitive decline in the elderly without LOAD. The A673T substitution is critical for reducing the production of amyloid-beta. The complete absence of the A673T variant in a large cohort of Asian subjects [44] suggests that this is possibly an ethnicity-specific variant.

MAPT

In a combined analysis of 15,369 subjects, re-sequencing of the gene encoding the microtubule-associated protein tau (MAPT) discovered that the rare substitution A152T within exon 7 of MAPT increases the risk for LOAD (0.69 % in patients versus 0.30 % in controls, P = 4.0 × 10⁻³, OR = 2.3) and also for frontotemporal dementia (0.89 % in patients, P = 5.0 × 10⁻⁴, OR = 3.0) [45]. This study emphasizes the point that statistical evaluation of the role of rare sequence variants poses a challenge, and no thresholds for rare variant significance have been established. Functional studies show that A152T in MAPT causes a pronounced decrease in microtubule stability and might enhance the level of tau oligomers. This is another example showing that rare variants can increase the risk for complex diseases with heterogeneous phenotypes.

FRMD4A

In a meta-analysis of the EADI and GERARD consortia and a combined analysis of five additional case–control studies (10,000 LOAD and 16,000 controls), the AAC haplotype in the FRMD4A locus was associated with increased LOAD risk (P = 1.1 × 10⁻¹⁰, OR = 1.68) when compared with the most frequent GGT haplotype [46]. As the AAC haplotype is rare (with a mean frequency of 2 % in Caucasian populations), this might explain why the locus was not detected in previous GWAS based on single-SNP analyses, as SNPs with low frequency are poorly imputed even when using the 1000 Genomes data set. This example shows that genome-wide haplotype association studies are another approach complementary to GWAS. The protein encoded by FRMD4A is involved in cell structure, transport, and signaling functions.

Gene-Gene Interactions (Epistasis)

Evidence is accumulating that a pronounced part of the still elusive genetic variability in complex diseases could be due to ignored epistatic effects [47]. The term epistasis is conventionally used when an increased risk is only seen in the presence of two genetic factors and not seen when they act apart. In such cases, studies that examine single loci individually, such as most GWAS, will fail to detect an effect. To understand the causes of LOAD, one needs to study not single factors one at a time but interactions between genetic risk factors. In the case of LOAD, epistasis is likely to play a major part, in view of the high heritability of the disease. Epistasis had previously proved hard to demonstrate, mainly because sample sets had been too small and poorly characterized and inappropriate statistical methods had been used. The Epistasis Project [48] was designed to avoid these problems, with a multinational collaboration of 7 LOAD research groups from the UK, Spain, the Netherlands, and Germany, contributing DNA samples from 1,757 LOAD cases and 6,295 controls. A typical GWAS may examine perhaps 500,000 loci, but the number of potential two-way interactions between these 500,000 loci is >100 billion (10¹¹). Therefore, to reduce the number of potential interactions to a manageable figure, a hypothesis-driven approach is required, and gene-gene interactions should be selected according to prior evidence of a statistical interaction and a plausible biological hypothesis [49]. The chosen interactions in the Epistasis Project were involved in various pathogenic networks that contribute to the development of LOAD (lipid metabolism, amyloid-beta metabolism, inflammation, oxidative stress, and insulin metabolism), and the “synergy factor” [50] (equivalent to the interaction term defined by two binary factors in a logistic regression model) was used to measure the gene-gene interaction.
In the inflammation pathway, the Epistasis Project has demonstrated that the interaction between the interleukin-6 proinflammatory cytokine and the interleukin-10 anti-inflammatory cytokine genes [48] and the interaction between the aromatase (a rate-limiting enzyme in the synthesis of estrogens) and the interleukin-10 genes [51] are both associated with increased LOAD risk. In the oxidative stress pathway, the Epistasis Project has revealed an increased LOAD risk due to the interaction between the hemochromatosis and transferrin genes [52] and the interaction between the glutathione S-transferase and the gene cluster of the hematopoietically expressed homeobox, the insulin-degrading enzyme, and the kinesin family member 11 [53]. In the future, to achieve higher power for such gene-gene interaction studies, larger sample sizes are needed, such as that of the IGAP mega-meta-analysis of GWAS [21].
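Two quantitative points in this section lend themselves to a short illustration: the number of possible two-way interactions among the loci of a typical GWAS, and the synergy factor used to measure an interaction. The sketch below assumes the usual definition of the synergy factor for two binary factors, namely the OR for carrying both factors divided by the product of the ORs for carrying each factor alone, with all ORs taken relative to individuals carrying neither factor. The function name and the example ORs are hypothetical.

```python
import math

# Number of distinct pairs among 500,000 loci: n * (n - 1) / 2
n_loci = 500_000
n_pairs = math.comb(n_loci, 2)
print(f"Two-way interactions among {n_loci:,} loci: {n_pairs:,}")

def synergy_factor(or_both, or_first_only, or_second_only):
    """Ratio of the observed joint OR to the OR expected if the two factors
    combined multiplicatively (all ORs relative to carriers of neither factor).
    A value of 1 indicates no interaction; above 1, synergy; below 1, antagonism."""
    return or_both / (or_first_only * or_second_only)

# Hypothetical example: each factor alone has a weak effect, but together they
# double the risk expected from their separate effects
sf = synergy_factor(or_both=3.0, or_first_only=1.2, or_second_only=1.25)
print(f"Synergy factor = {sf:.2f}")
```

The pair count explains why an unrestricted search for epistasis faces a far heavier multiple-testing burden than a single-locus GWAS, and hence why a hypothesis-driven selection of interactions is attractive.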