Introduction

The common late-onset form of Alzheimer disease (AD) has a strong genetic component,1 a portion of which is explained by APOE and several other genes identified by positional mapping, targeted gene analysis and genome-wide association studies (GWAS).2, 3, 4 Together, these loci account for less than one-half of the heritable component in AD susceptibility, of which 20–25% is due to APOE.4, 5 Because many of the known AD loci cluster in biological pathways, including those involved in inflammation, lipid metabolism, and the processing and intracellular trafficking of Aβ, there are likely more AD risk loci that are difficult to detect because of very weak effect sizes, allelic heterogeneity or rare variants. To examine another hypothesis, namely that associations for some loci may be obscured by confounding or interaction with other loci, we conducted a two-stage GWAS in APOE genotype subgroups using the large resources of the International Genomics of Alzheimer’s Project (IGAP).

Methods

Study population

Details of the stage 1 sample from the IGAP Consortium, including subject recruitment, genotyping, imputation, quality control, population substructure and statistical methods for association analyses, were previously described.4 In brief, phenotype and genotype data, including APOE genotypes, for a total of 53 711 subjects were assembled by IGAP from the Alzheimer’s Disease Genetic Consortium (ADGC), the Cohorts for Heart and Ageing Research in Genomic Epidemiology (CHARGE) consortium, the European Alzheimer's Disease Initiative (EADI) and the Genetic and Environmental Risk in Alzheimer’s Disease (GERAD) consortium. Characteristics of this sample are shown in Supplementary Table S1. The stage 2 data set included GWAS and APOE genotype data for 4203 subjects of European ancestry from the ADC4, ADC5, ADC6, MTV, Pfizer and TARCC data sets in the ADGC. These individuals were recruited under protocols approved by the appropriate Institutional Review Boards. Details of the individual data sets are provided in the Supplementary Materials and summarized in Supplementary Table S1.

Procedures

QC, imputation and population substructure in stage 2 data sets. Quality control of the clinical and genotype data in these cohorts was performed using procedures described elsewhere.4 Single-nucleotide polymorphism (SNP) genotypes in each stage 2 data set were imputed with IMPUTE2 using reference haplotypes from the March 2012 release of 1000 Genomes. For selected variants in the original ADGC data sets from stage 1, we compared imputations based on the March 2012 release of 1000 Genomes with prior imputations based on the December 2010 release and found no significant difference in the distribution of genotype probabilities between the old and new imputations for the same samples. We used directly measured APOE genotypes when available, because we previously observed that imputation in this region using the 1000 Genomes reference panel is unreliable.5 Population substructure was evaluated within each data set by principal components analysis using EIGENSTRAT (http://www.hsph.harvard.edu/alkes-price/software/) and a subset of 21 109 SNPs common to all genotyping platforms.
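The principal components analysis used to capture population substructure can be sketched as follows. This is a simplified, EIGENSTRAT-style illustration that assumes numpy and simulated genotypes; it is not the EIGENSTRAT program itself, and the function name and simulation settings are hypothetical. Each SNP is centered by twice its allele frequency and scaled by sqrt(2p(1 − p)) before a singular value decomposition, and the leading sample scores are the values that would enter the association models as covariates.

```python
import numpy as np


def top_principal_components(genotypes, n_components=3):
    """Return sample scores on the leading principal components.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts (hypothetical input).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # per-SNP allele frequency
    keep = (p > 0.0) & (p < 1.0)             # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    # EIGENSTRAT-style normalization: center by 2p, scale by sqrt(2p(1-p))
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Left singular vectors of z are eigenvectors of the sample covariance z z^T
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy example: two simulated subpopulations with diverged allele frequencies
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([rng.binomial(2, freq_a, size=(100, n_snps)),
                  rng.binomial(2, freq_b, size=(100, n_snps))])
pcs = top_principal_components(geno)  # first three PCs, usable as covariates
print(pcs[:100, 0].mean(), pcs[100:, 0].mean())
```

In this toy example the first component separates the two simulated subpopulations, which is the kind of structure that, left unadjusted, could confound case–control comparisons.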

Statistical analysis

Genome-wide association study. Within each stage 1 data set, genome-wide association analyses were conducted separately in subgroups of subjects with and without the APOE ɛ4 allele using a logistic generalized linear model in case–control data sets and a logistic generalized estimating equation in family-based data sets. The potential independent effect of the APOE ɛ2 allele was not examined because carriers of this allele were too few, yielding very small cell sizes, particularly among AD cases and in smaller data sets. Cox proportional hazards models were used to evaluate association with incident AD in three CHARGE cohorts. To incorporate the uncertainty of the imputation estimates, each SNP was coded as a quantitative estimate, between 0 and 2, of the dose of the reference allele. Interaction between a SNP and APOE genotype was evaluated within each data set in the combined APOE genotype subgroups using regression models including age, sex, the first three principal components and terms for the SNP, APOE ɛ4 status and the interaction between the SNP and APOE ɛ4 status. Results for each model were combined across data sets by meta-analysis using the inverse variance method implemented in the software package METAL (http://www.sph.umich.edu/csg/abecasis/Metal/). Effect sizes were weighted by their inverse variance, and a combined estimate was calculated by summing the weighted estimates and dividing by the summed weights. SNPs with a minor allele frequency >5% that were available in at least 50% of the data sets were included in the meta-analysis. The meta-analysis P-value for association was estimated from the combined test statistic, after applying genomic control within each individual study.
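The dosage coding and the inverse-variance combination described above can be sketched as follows. This is a minimal fixed-effect illustration, not METAL itself; the function names, the example effect sizes and standard errors, and the convention of applying genomic control only when λ > 1 are assumptions for illustration. Genomic control is applied here by inflating each study's standard error by sqrt(λ), which is equivalent to dividing its 1-df chi-square statistic by λ.

```python
import math


def expected_dosage(p_aa, p_ab, p_bb):
    """Expected count (0 to 2) of the coded allele B from imputed genotype probabilities.

    p_aa contributes zero copies, so only p_ab and p_bb enter the sum.
    """
    return p_ab + 2.0 * p_bb


def ivw_meta(betas, ses, gc_lambdas=None):
    """Fixed-effect inverse-variance meta-analysis of per-study effects.

    betas, ses: per-study estimates (e.g. log odds ratios) and standard errors.
    gc_lambdas: optional per-study genomic-control factors (assumption: a study is
    corrected only when its lambda exceeds 1).
    """
    if gc_lambdas is None:
        gc_lambdas = [1.0] * len(betas)
    sum_w = 0.0
    sum_wb = 0.0
    for beta, se, lam in zip(betas, ses, gc_lambdas):
        se_gc = se * math.sqrt(max(lam, 1.0))   # genomic control on the SE scale
        w = 1.0 / se_gc ** 2                    # inverse-variance weight
        sum_w += w
        sum_wb += w * beta
    beta_meta = sum_wb / sum_w                  # summed weighted estimates / summed weights
    se_meta = math.sqrt(1.0 / sum_w)
    z = beta_meta / se_meta
    p = math.erfc(abs(z) / math.sqrt(2.0))      # two-sided normal P-value
    return beta_meta, se_meta, p


# Hypothetical per-study log odds ratios for one SNP in three data sets
dose = expected_dosage(0.10, 0.30, 0.60)
log_or, se, p = ivw_meta([math.log(0.80), math.log(0.85), math.log(0.75)],
                         [0.05, 0.08, 0.10], gc_lambdas=[1.02, 0.99, 1.05])
print(f"dosage={dose:.2f}  OR={math.exp(log_or):.3f}  SE={se:.4f}  P={p:.1e}")
```

Because the weights are inverse variances, the most precise studies dominate the combined estimate, and the combined standard error is always smaller than that of any single study.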

Follow-up analysis in stage 2 data sets. SNPs attaining a P-value <10−4 in the stage 1 GWAS were evaluated in each of the stage 2 GWAS data sets, containing a total of 1786 APOE ɛ4+ and 2417 APOE ɛ4− subjects (Supplementary Table S1), using the same approach described above.

Gene expression analysis

The effect of top-ranked SNPs on gene expression was evaluated using an open access database of control brain microarray data (BRAINEAC) made publicly available by the UK Human Brain Expression Consortium (http://caprica.genetics.kcl.ac.uk/BRAINEAC). This data set contains information generated by analysis of tissue samples obtained from 12 different central nervous system regions in 134 individuals. Details of the expression quantitative trait locus (eQTL) analysis are reported elsewhere.6 In this study, the experiment-wise significance threshold for association of a genetic marker with expression was determined to be 1.6 × 10−7 at the gene level and 1.8 × 10−6 for individual exons. The potential functionality of the top-ranked SNPs was assessed using the Regulome database (http://www.regulomedb.org).

Results

We conducted a genome-wide association study for AD using data sets stratified by APOE genotype that were assembled by IGAP from the ADGC, CHARGE consortium, EADI and GERAD consortium. Meta-analyses were performed separately in the APOE ɛ4+ (10 246 cases and 11 924 controls) and APOE ɛ4− (7231 cases and 19 603 controls) subgroups, as well as in the total sample using a model that included a term for the interaction of the SNP with APOE ɛ4 status. Genomic inflation was limited in the GWAS results for the APOE ɛ4+ (λ=1.05) and APOE ɛ4− (λ=1.06) groups and absent in the total sample testing the ɛ4 × SNP interaction (λ=0.98) (Supplementary Figure S1). Genome-wide significant (GWS) association (P<5 × 10−8) with AD was found in five distinct regions (CR1, BIN1, CLU, PICALM and APOE) in the APOE ɛ4+ subgroup (Supplementary Figure S2A, Supplementary Table S2) and five distinct regions (BIN1, HBEGF, MS4A6A/MS4A4A, SLC24A4 and APOE) in the APOE ɛ4− subgroup (Supplementary Figure S2B, Supplementary Table S2). No GWS SNP × APOE interactions were found in the total group (Supplementary Figure S2C). Suggestive association (P<10−6) was observed with SNPs in five novel loci in the APOE ɛ4− subgroup (SOX14/CLDN18, ACSL6, FAM20C, MAPT region and CDR2L; Supplementary Figure S2B, Supplementary Table S3), and suggestive interaction with APOE ɛ4 status was observed for 21 TMEM106B SNPs (top result: rs1595014, P=1.6 × 10−7; Supplementary Figure S2C, Supplementary Table S3).
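The genomic inflation factor λ reported above is conventionally the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). A minimal sketch of that calculation from P-values follows; the function name and the evenly spaced null P-values are illustrative assumptions, not data from this study.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic inflation factor lambda from two-sided 1-df association P-values."""
    # Convert each P-value back to its chi-square statistic, z^2
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced P-values mimic a null GWAS, so lambda should be close to 1
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
print(f"lambda = {lam:.3f}")
```

On this scale, λ=1.05 means the median test statistic is 5% above its null expectation, whereas λ<1 (as in the interaction analysis) indicates no inflation.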

Approximately 1130 SNPs from 38 regions (including 7 previously established AD loci) were tested in stage 2 (Supplementary Table S3). Follow-up analyses of the novel loci replicated association with SNPs in CDC42SE2-ACSL6, KANSL1/LRRC37A and CDR2L in the stage 2 sample (Table 1, Supplementary Table S2), but only SNPs near MAPT and between KANSL1 and LRRC37A (Figure 1a) were GWS after combining results from the stage 1 and stage 2 samples (best SNP: rs2732703, meta-analysis P=5.8 × 10−9). The association was consistent in nearly all data sets that contained rs2732703 information (Figure 1b). To verify the reliability of the association with rs2732703, an imputed SNP, we compared rs2732703 allele dosages obtained directly by genotyping with a TaqMan assay with those derived from imputation among 1010 subjects from the ACT, ADC4, ADC5 and ADC6 data sets. The correlation of these values (0.813 in the entire sample and 0.834 among APOE ɛ4− subjects), together with a genotype misclassification rate of only 3.5% among subjects with imputed probability scores >0.8 for a particular genotype, suggests that our association findings were not substantially influenced by imputation quality.

Table 1 Association results (P<10−6) in novel AD loci among APOE ɛ4− subjects in the combined stages 1 and 2 samples
Figure 1

Association of Alzheimer’s disease with single-nucleotide polymorphisms (SNPs) in chromosome 17q21.31 in the combined stages 1 and 2 samples. (a) Regional Manhattan plot in the APOE ɛ4+ (upper panel) and the APOE ɛ4− (lower panel) subgroups. SNPs with the lowest P-value are indicated with a purple diamond. Computed estimates of linkage disequilibrium (r2) of SNPs in this region with the most significant SNP are shown as red circles for r2⩾0.8, orange circles for 0.6⩽r2<0.8, green circles for 0.4⩽r2<0.6, light blue circles for 0.2⩽r2<0.4, and blue circles for r2<0.2. Unannotated SNPs are shown as grey circles. (b) Forest plot of association results for rs2732703 in the stages 1 and 2 and total samples among APOE ɛ4− subjects. CI, confidence interval; MAF, minor allele frequency; OR, odds ratio.


Further examination of this region in the total sample revealed an association peak spanning >1.25 Mb that contains 15 genes (Figure 1a). Within this region, 17 SNPs were GWS; they have minor allele frequencies ranging from 0.13 to 0.17 and are located in a 10.2-kb segment upstream of both KANSL1 and LRRC37A (Supplementary Table S4). Among ɛ4+ subjects, nominally significant association was observed with only one of these SNPs (rs2732703, P=0.02) (Supplementary Table S3). Although the odds ratios (ORs) for the effect of the minor allele on AD risk were substantially lower for all of the GWS SNPs in the ɛ4− group (0.54<OR<0.86) than in the ɛ4+ group (0.76<OR<1.04), there was no evidence of interaction with APOE genotype (Supplementary Table S3). The minor alleles of these SNPs reduced AD risk by 20–37% in the ɛ4− group. The 350-kb gap in the broad association signal is punctuated at one end by a ‘cliff’ adjacent to the MAPT–KANSL1–LRRC37A association peak (Figure 1). This gap is populated by relatively few SNPs and contains several copy-number variation polymorphisms.7, 8 To explore whether the association observed in the present analysis is explained by the previously identified H1/H2 haplotypes in the MAPT region,8 we evaluated six models in the entire data set, conditioning on rs8070723 (an H1/H2 tagging SNP), rs2732703 or rs199533. Rs2732703 remained significant in models conditioning on rs8070723 (P=0.013) or rs199533 (P=0.0020), and rs8070723 was marginally significant in the model conditioning on rs199533 (P=0.043) (Supplementary Table S5, Supplementary Figure S3). These results suggest that KANSL1/LRRC37A is the only AD risk locus in this region.
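The logic of the conditional analyses can be illustrated on simulated data. The sketch below assumes numpy and uses hypothetical names (logistic_wald, snp_a, snp_b); it fits logistic regressions by Newton–Raphson and, unlike the study models, omits age, sex and principal-component covariates. A marker whose association arises only from LD with a causal SNP loses its significance once the causal SNP is added to the model, whereas a SNP that retains significance after conditioning, as rs2732703 did, carries signal that the conditioning SNP does not explain.

```python
import math

import numpy as np


def logistic_wald(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return coefficients, SEs, Wald P-values.

    X: (n, k) design matrix that already includes an intercept column.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - prob))      # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    pvals = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, pvals


# Simulated region: snp_a is causal, snp_b is a correlated (LD) marker
rng = np.random.default_rng(1)
n = 5000
snp_a = rng.binomial(2, 0.2, n).astype(float)
snp_b = np.where(rng.random(n) < 0.8, snp_a, rng.binomial(2, 0.2, n)).astype(float)
risk = 1.0 / (1.0 + np.exp(-(-0.5 + 0.5 * snp_a)))
y = (rng.random(n) < risk).astype(float)

ones = np.ones(n)
_, _, p_marginal = logistic_wald(np.column_stack([ones, snp_b]), y)
_, _, p_conditional = logistic_wald(np.column_stack([ones, snp_b, snp_a]), y)
print(f"snp_b alone: P={p_marginal[1]:.1e}; conditioned on snp_a: P={p_conditional[1]:.2f}")
```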

We also examined the effect of APOE ɛ4 status on previously established AD loci (Supplementary Table S2). Four of these loci attained genome-wide significance in at least one of the APOE subgroups (Table 2), and the association signal in the MS4A cluster region was evident primarily in the APOE ɛ4− subgroup (Supplementary Figure S4). The association of AD with CR1, BIN1 and CLU was supported in both APOE subgroups.

Table 2 Results (P<10−6) in previously known AD loci showing different patterns of association among APOE ɛ4+ and ɛ4− subjects in the combined data sets

Next we interrogated the BRAINEAC database to determine whether any of the 17 GWS SNPs located between KANSL1 and LRRC37A are cis-eQTLs. Data were available for only one of these SNPs, rs113986870, which is 2461 base pairs from rs2732703 and in high linkage disequilibrium (LD) with it (r2 and D’>0.9). Ten exon probes from four genes (KANSL1, LRRC37A4P, MAPT and C17orf69) were significantly associated with rs113986870 when averaged across all brain regions (Table 3). Rs113986870 was significantly associated with both gene-level expression (Figure 2a) and exon-level expression (Figure 2b) in hippocampus, temporal cortex and cerebellum. In these brain regions, rs113986870 was significantly associated with KANSL1 probes 3760211, 3760212 and 3760213, which measure expression of the first translated exon. Additionally, expression of probe 3760518 (Supplementary Figure S5A), which is present in all three transcripts (NM_001193466, NM_015443 and NM_001193465), and of probe 3760219 in transcript variant 2 (NM_015443) was significantly associated with rs113986870 (Supplementary Figure S5B), whereas expression of probe 3760217 in transcript variant 1 (NM_001193466) was not (Supplementary Figure S5C), suggesting that alternative splicing may be a crucial mechanism for regulating KANSL1 expression. Rs113986870 was also strongly associated with MAPT transcription (Supplementary Figure S6A), in particular with probe 3723712, which targets the alternatively spliced exon 3, in frontal cortex (P⩽9.2 × 10−6) and temporal cortex (P⩽2.6 × 10−6) (Supplementary Figure S6B). The rs113986870 minor allele (A), which is associated with reduced risk of AD (Supplementary Table S4), increased the expression of the target exons in KANSL1 and MAPT (Figure 2, Supplementary Figure S6, Supplementary Figure S7). The association with LRRC37A4P exon probe 3759898 was significant in all three AD-related brain regions (P⩽3.6 × 10−9). The association of rs113986870 with exon probe 3723594 for C17orf69 was significant in hippocampus only (P=1.6 × 10−7). Five of the GWS SNPs, including rs2732703 and rs113986870, are located within a transcription factor-binding site or a DNase sensitivity peak, and two of these five SNPs, including rs2668626, which is only 47 bp from rs2732703, have also been identified within an eQTL (Supplementary Table S4).
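The two LD measures quoted above for rs113986870 and rs2732703, r² and D′, can both be derived from a haplotype frequency and the two allele frequencies. The sketch below uses the standard definitions; the haplotype frequencies are hypothetical and would in practice be estimated from phased data, for example a reference panel.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2.
    p_a, p_b: frequencies of allele A and allele B.
    Returns (D, |D'|, r^2).
    """
    d = p_ab - p_a * p_b                          # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max                      # D scaled to its maximum possible value
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical haplotype and allele frequencies
for p_ab, p_a, p_b in [(0.20, 0.20, 0.20), (0.15, 0.20, 0.30)]:
    d, d_prime, r2 = ld_stats(p_ab, p_a, p_b)
    print(f"D'={d_prime:.2f}, r2={r2:.2f}")
# D'=1.00, r2=1.00  (complete LD, equal allele frequencies)
# D'=0.64, r2=0.24  (D' can be high while r2 is modest)
```

Requiring both r² and D′ to exceed 0.9 is therefore stricter than either alone: high D′ only says the alleles rarely recombine, whereas high r² additionally requires similar allele frequencies, so one SNP can stand in for the other in association tests.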

Table 3 Exon probes covering the region between 43.5 and 45.0 Mb on chromosome 17 that reveal significant rs113986870 allelic expression differences averaged over 10 brain areas
Figure 2

Genotype-specific effect of the expression quantitative trait locus (eQTL) rs113986870 on expression of KANSL1. (a) Gene-level expression of KANSL1 transcript t3760137. Transcript-level expression represents the average across all KANSL1 exon probe sets. (b) Expression of exon probe 3760212. Probes 3760211, 3760212 and 3760213 measure expression of the first translated exon, are present in all three transcript variants and were significantly associated with the eQTL. Expression profiles for probes 3760211 and 3760213 were similar to those for probe 3760212 (Table 3). The distance from 3760212 to rs113986870 is 85 431 base pairs. Log2 scale of expression (y axis) is shown for 10 regions of cognitively normal human brains (x axis) ordered by mean expression level. Rs113986870 genotype counts: AA=0, AG=56 and GG=76. Rs113986870 allele frequencies are 0.21 (A) and 0.79 (G). CRBL, cerebellum; FCTX, frontal cortex; HIPP, hippocampus; MEDU, medulla (specifically inferior olivary nucleus); OCTX, occipital cortex (specifically primary visual cortex); PUTM, putamen; SNIG, substantia nigra; TCTX, temporal cortex; THAL, thalamus; WHMT, intralobular white matter.
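The allele frequencies in the Figure 2 legend follow directly from the genotype counts, because each individual carries two alleles. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def allele_frequencies(n_hom_minor, n_het, n_hom_major):
    """Minor and major allele frequencies from biallelic genotype counts."""
    n_alleles = 2 * (n_hom_minor + n_het + n_hom_major)  # two alleles per person
    minor = (2 * n_hom_minor + n_het) / n_alleles
    return minor, 1.0 - minor


# rs113986870 counts from the legend: AA=0, AG=56, GG=76
freq_a, freq_g = allele_frequencies(n_hom_minor=0, n_het=56, n_hom_major=76)
print(f"A: {freq_a:.2f}, G: {freq_g:.2f}")  # A: 0.21, G: 0.79
```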


Discussion

This study was undertaken to identify loci whose effect on AD risk may be obscured by confounding or interaction with APOE genotype. Our APOE-stratified GWAS is the first to show GWS association of AD with SNPs in the chromosome 17q21.31 region, which includes MAPT, KANSL1 and LRRC37A. MAPT, which encodes the microtubule-associated protein tau found in AD neurofibrillary tangles, was long expected to emerge from GWAS of AD but had never been detected before. The association peak, observed among subjects who lack the APOE ɛ4 allele, is located between KANSL1 and LRRC37A, approximately 200 kb downstream of MAPT. Although the association signal includes MAPT, conditional analysis suggests that the causal variant(s) are more likely located in the DNA segment between the 5′ end of KANSL1 and the 5′ end of LRRC37A than within MAPT or another gene distal to LRRC37A.

The nature of the AD-related functional variant could not be discerned from our genetic association findings. None of the GWS SNPs is within 42.1 kb of the KANSL1 start site or 16.8 kb of the LRRC37A start site, suggesting that the functional variant is not within the promoter region of either gene. KANSL1 is a widely expressed gene encoding a member of the nonspecific lethal complex. The KANSL1 protein is an evolutionarily conserved regulator of the chromatin modifier KAT8, which influences gene expression through histone H4 lysine 16 acetylation.9 Notably, mutations in KANSL1 cause the 17q21.31 microdeletion syndrome, which is associated with a wide range of abnormalities, including intellectual disability and developmental delay; KANSL1 is therefore thought to be involved in neuronal development.10, 11 LRRC37A encodes a member of the leucine-rich repeat containing 37 family. Leucine-rich repeats (LRRs) are protein–ligand interaction motifs found in a large number of proteins with different structures, localizations and functions.12 LRR motifs are important for intermolecular or intercellular interactions with exogenous factors in the immune system and/or with different cell types in the developing nervous system.12

However, expression analysis of exon array data in control brain tissue revealed that rs113986870, which is in high LD with the top-ranked SNP (rs2732703) in the GWAS, is an eQTL for expression of the first translated exon of KANSL1 and of the alternatively spliced exon 3 of MAPT. Previous studies suggest that splicing of MAPT may be a crucial regulatory mechanism in the brain, and in tauopathies in particular,13 and that increased expression of exon 3 protects against neurodegeneration.14 Although rs113986870 is apparently not an eQTL for its adjacent gene LRRC37A, it was significantly associated with expression of a closely related gene, LRRC37A4P, in all three AD-related brain regions. These results suggest that rs113986870 may function as a cis-acting regulatory element for multiple genes in this region. Another confounding feature of this region is the presence of copy-number variations that partly overlap the 5′ end of KANSL1 and may influence its expression.7, 8 Thus, the exon probes targeting the first translated exon of KANSL1 may be tagging these duplications. In addition, interrogation of a database curating information about DNA features and regulatory regions revealed that five of the GWS SNPs, including rs2732703 and rs113986870, may have strong regulatory potential.

The association peak for AD on chromosome 17q21.31 is located in a well-recognized and perplexing genomic region containing a 900-kb inversion.8 Previous GWAS identified associations of variants within and at the edges of this inversion with Parkinson disease15 and progressive supranuclear palsy,16 but the most significant associations were not with SNPs between KANSL1 and LRRC37A (Supplementary Table S6). Multiple studies have identified >40 MAPT deletions, missense mutations and splice site mutations that cause frontotemporal dementia (FTD).17 Although AD is only nominally associated with common variants in MAPT, we previously observed association of a rare MAPT variant (A152T) with increased risk for FTD and AD in a large sample,18 a finding that was supported by a subsequent smaller study.19 Ikram et al.20 identified a GWS association peak for a continuous measure of intracranial volume with a KANSL1 SNP approximately 166 kb from our most significant AD SNP (rs2732703) in a sample of nearly 10 000 community-dwelling elders (Supplementary Table S6). These two SNPs are moderately correlated (r2=0.71), suggesting that they may tag the same functional variant.

Other studies have focused on two divergent extended MAPT haplotypes, H1 and H2, which are in near-complete LD with inversion status and contain independently derived partial duplications of KANSL1.8, 16 The common H1 haplotype is associated with increased risk of FTD,21 Parkinson disease,22 progressive supranuclear palsy23 and corticobasal degeneration,23 whereas H2 is linked to recurrent deletion events associated with the 17q21.31 microdeletion syndrome.10 Among these non-AD disorders, FTD can masquerade clinically as AD, so some FTD cases could be present in our study group. However, any such inadvertent inclusion is expected to be very small, because the minimum age at dementia onset in our study group was 60 years, and onset of FTD after age 69 years is relatively rare, whereas onset of AD in most cases occurs after age 69 years.24 Furthermore, a recent review of almost 5000 autopsied brains from a subset of cases in the ADGC cohort failed to identify any case of FTD.25 Myers et al.26 reported association of AD with H1 and with common MAPT SNPs, but this association is controversial27 and did not reach genome-wide significance in our study or in previous GWAS. Another recent study showed that carriers of at least one H2 allele had a 5.4-fold increased risk of worsening hallucinations, although this result was only marginally significant.28 Previously, we observed in a subset of the sample studied here that the H2-haplotype tagging rs8070723-G allele was associated with reduced risk of AD.29 However, this variant is no longer associated after conditioning on rs2732703 (Supplementary Table S5). In carriers of H2, the ancestral haplotype in both humans and chimpanzees,30 increased expression of MAPT exon 3 has been associated with an eQTL located approximately 1500 bp from rs113986870, which decreases aggregation of microtubules.6, 31 These observations are consistent with our results showing that the rs113986870 minor allele is protective for AD and associated with elevated exon 3 expression.

There is a large body of experimental evidence linking tau protein to AD pathogenesis,32 and some studies show evidence of association of AD with common MAPT SNPs.29, 33 However, analysis of the MAPT coding sequence did not reveal disease-causing variants for early-onset AD,34 and other studies examining association of MAPT SNPs with late-onset AD were negative.27, 35 Recently, Allen et al.29 reported that the rs8070723-G allele was associated with reduced MAPT expression in the cerebellum and temporal cortex of AD subjects. AD has also been robustly associated with several genes in pathways involving the cytoskeleton and axonal transport (including tau) or leading to neurofibrillary tangles, most notably BIN1, EPHA1, RIN3, CASS4 and FERMT2.4

Based on the observation that overexpression of human ApoE4 in transgenic mouse neurons results in hyperphosphorylation of tau,36 it is possible that associations of AD with loci in the chromosome 17q21.31 region are obscured by the much stronger effect of APOE ɛ4 on MAPT expression or function.37 This idea is consistent with the lack of GWS association with 17q21.31 SNPs in the same data set without stratification by APOE genotype,4 and with the absence of evidence for interaction between APOE and any SNP in the MAPT–KANSL1–LRRC37A region in the current study. Another possible explanation for the significant association of 17q21.31 SNPs with AD only among subjects lacking APOE ɛ4 is genetic heterogeneity: variation at the chromosome 17q21.31 locus may be associated with a distinct etiological subtype of AD in which tau is the primary disease activator.38 Finally, the diagnosis of AD for most subjects in this data set was established clinically, raising the possibility of misdiagnosis or of AD accompanied by processes associated with other dementing illnesses. Further studies are needed to determine whether this subtype can be distinguished clinically or neuropathologically.

Our study also showed that the previously established association with the MS4A gene cluster derives almost entirely from subjects lacking APOE ɛ4, suggesting that the contribution of the MS4A locus to AD may be mechanistically different from the AD-related processes associated with APOE ɛ4. Members of the MS4A gene family encode membrane proteins, some of which have known roles in immune cell function;39 however, little is known about the function of MS4A6A, MS4A4A or MS4A6E in humans. Karch et al.40 showed that expression of MS4A6A was upregulated in the brains of AD patients compared with those of controls and was significantly correlated with AD status, expression of AIF1 (a marker for microglia, the immune cells of the brain), Clinical Dementia Rating score and extent of AD neuropathological change.

The observed statistical interaction of TMEM106B genotypes with APOE on AD risk in the stage 1 GWAS is noteworthy (rs1595014, P=1.6 × 10−7), even though it is not supported by results in the comparatively small stage 2 sample. TMEM106B is a glycoprotein predominantly localized at the lysosomal membrane, where it might interact with intracellular progranulin (GRN).41, 42 TMEM106B variants, particularly p.T185S (rs3173615), are risk factors for FTD, especially among persons carrying a GRN mutation.43 TMEM106B variants are also associated with development of cognitive impairment in amyotrophic lateral sclerosis44 and implicated in the pathological presentation of AD.45 Cruchaga et al.46 observed association of the TMEM106B SNP rs1990622 risk allele with younger onset of the frontotemporal lobar degeneration subtype with TAR DNA-binding protein inclusions (FTLD-TDP), a pattern reminiscent of the association of APOE ɛ4 with increased risk and younger onset of AD. The biological underpinning of the interaction of TMEM106B with APOE affecting AD risk is unclear.

Our top findings, including those that are GWS, should be confirmed in independent samples. Functional studies will be needed to understand the relationship between APOE and the causative variant(s) in 17q21.31 once they are identified, as well as with other loci showing much stronger association with AD in particular APOE genotype strata (for example, MS4A6A/MS4A4A/MS4A6E) or through interaction with APOE (for example, TMEM106B). Our study provides a firm genetic connection of AD to several other pathologically distinct disorders in which dementia is a cardinal or common characteristic.