Introduction

Alzheimer’s disease (AD) is an age-related neurodegenerative disease of complex etiology and is characterized by a progressive decline of memory and cognitive faculties. Although late-onset AD risk has a strong genetic component including the APOE locus [10] and several other loci identified through genome-wide association studies (GWAS) [32], disease risk is also influenced by lifestyle and environmental factors such as diet [61], sleep [35], education and literacy [29], history of head trauma [14], and level of physical activity [53]. Epigenetic marks such as DNA methylation (DNAm) represent molecular regulatory mechanisms through which environmental and lifestyle factors may modulate AD risk, including through interaction with underlying genetic risk. A better understanding of the epigenetic regulation of gene expression in late-onset Alzheimer’s disease (LOAD) could facilitate the discovery of more viable preventive and therapeutic strategies for this devastating disease.

While previous studies have identified global DNAm alterations in Alzheimer’s disease [6, 9, 44], recent landmark epigenome-wide association studies (EWAS) [3, 12, 40] have identified DNAm changes around specific genes. In a study using postmortem brain tissue from the dorsolateral prefrontal cortex (DLPFC), DNAm was robustly associated with AD neuropathology at 71 genome-wide sites, ultimately implicating seven genes within 50 kb with dysregulated expression: CDH23, DIP2A, RHBDF2, RPL13, SERPINF1, SERPINF2, and ANK1 [12]. A parallel study found that DNAm of a region near ANK1 strongly correlated with neuropathological measures in three cortical brain regions (entorhinal cortex, superior temporal gyrus, and prefrontal cortex), but not cerebellum, suggesting that some epigenetic perturbations in AD occur across multiple cortical regions [40]. Likewise, a 48 kb region within the HOXA gene cluster was differentially methylated in AD across multiple cortical regions [65]. Subsequent studies [59, 62, 64, 66, 69] have integrated DNAm and genetic evidence to highlight AD-relevant genes. Thus far, however, these approaches have implicated only a few genes associated with DNAm variation in AD.

While these previous studies have successfully identified DNAm differences associated with AD, they have been limited in their ability to connect these epigenetic differences to corresponding gene expression changes. Previous analyses often used targeted qPCR for a small number of transcripts and did not comprehensively survey gene expression changes among all implicated DNAm loci. Moreover, the relationship between DNAm and gene expression has remained unclear from earlier studies because each data type originated from non-overlapping subjects. Integrating DNAm with RNA sequencing (RNA-seq), an unbiased method for surveying the transcriptome, may therefore offer deeper insight into epigenetically mediated transcriptional dysregulation associated with AD [20].

We, therefore, performed multi-stage analyses incorporating paired DNA methylation and gene expression data. Given the unique cellular composition and potentially distinct susceptibilities to AD neuropathology of different brain regions, we chose to study four regions from each brain donor: dorsolateral prefrontal cortex (DLPFC), entorhinal cortex (ERC), and hippocampus (HIPPO), all three previously implicated in AD, as well as the cerebellar cortex (CRB). In our first stage, we compared the DNAm landscape of neurotypical controls to that of late-onset AD cases using the Illumina Human Methylation 450k (HM450k) array. Then, for genes adjacent to DNAm loci identified from this analysis, we analyzed RNA-seq data for differential expression between AD cases and controls, as well as for correlation between DNAm and gene expression levels. This undertaking represents one of the most comprehensive integrations of epigenetic and transcriptomic data for late-onset AD in postmortem human brain tissue to date.

Results

Clinical characteristics of postmortem brain donors

The dataset used for our epigenome-wide scan consisted of 73 postmortem brain donors with DNA methylation (DNAm) data generated from four brain regions: entorhinal cortex (ERC), dorsolateral prefrontal cortex (DLPFC), hippocampus (HIPPO), and cerebellum (CRB) (Table S1). Our discovery cohort includes 49 neurotypical controls and 24 AD donors with a neuropathological diagnosis (see “Methods”) as estimated by standard Braak staging and CERAD scoring. Compared to controls, AD donors were older (p = 2.95 × 10−8, two-tailed t = 7.67) and had reduced overall brain mass (p = 9.33 × 10−6, two-tailed t = 5.24, Figure S1). APOE risk, defined here as the number of ε4 alleles, is a strong genetic factor underlying AD clinical risk [13, 38], and ε4 alleles were indeed more common in our AD samples than in our control samples (p = 2.99 × 10−5, Fisher’s exact test). Alzheimer’s disease was not significantly associated with differences in DNAm-estimated NeuN+ (neuronal) composition in any of the four brain regions in our sample (p > 0.05, Results S1, Figure S2a), in line with previous large epigenome-wide association studies [12]. These DNAm cell-type composition estimates reflect the majority of variance in the dataset (principal component 1, ~67%; Pearson’s r = − 0.833, p < 2.2 × 10−16, Fig. S2b). Unlike DNAm-derived estimates, those based upon our transcriptome data showed reduced neuronal proportions in the ERC and HIPPO regions in AD samples compared to controls (Fig. S2c). Overall, there was moderate correlation between cell-type estimates derived from DNAm data and those derived from transcriptomic data (r = 0.283, p = 2.88 × 10−6, Fig. S2d). There was no significant association between epigenetic age acceleration and AD in any of the brain regions (p > 0.05, Results S2, Fig. S3a, b).

CpG-site DNAm differences between AD subjects and unaffected controls

To identify differentially methylated sites with “shared effects” across multiple brain regions [40, 65], we adopted a powerful cross-brain region strategy (Results S3, Fig. S4, Table S2–S3, see “Methods”) in which we analyzed all of our samples jointly (N = 269 total DNAm samples; Table S1). With this cross-region analysis, we identified 858 DNAm sites differentially methylated between AD and controls (FDR < 5%, Table S4). Collectively, these sites were within 10 kb of 1156 Gencode-annotated genes (v25), had a small median absolute difference in DNAm of 3.98%, and were relatively more methylated in subjects with AD compared to controls (N = 491, 57.2%, p = 2.30 × 10−5, Fig. 1a). The most prominent association between AD and DNAm within 10 kb of a protein-coding gene was with cg23703062 near ANKRD30B, a gene not previously implicated in AD (Fig. 1b). Other highly significant methylation sites replicated previously reported associations with cg19803550 (within 10 kb of WDR81 and SERPINF2, p = 3.48 × 10−11, 4th most significant) and cg05066959 (ANK1, p = 9.52 × 10−11), among several others, suggesting that DNAm alterations in AD are replicable [12, 40]. However, the majority of our findings were novel associations (830 DNAm sites not presented in either of the landmark studies), indicating that there are several more DNAm loci associated with AD beyond those previously identified (Fig. S5). When we hierarchically clustered adjusted beta values of these 858 significant DMPs for all of our samples (N = 269), we found that samples clustered primarily by cerebellar vs. non-cerebellar brain region, then secondarily by AD diagnosis (Fig. 1c). Post hoc region-specific statistics were highly correlated across all four brain regions at these sites, confirming our expectation that AD has similar associations with DNAm at many of these sites across the different brain regions (Fig. S6).
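The excess of hypermethylated sites (491 of 858) can be checked with a simple test of proportions. The text does not name the test behind p = 2.30 × 10−5, so the sketch below assumes a chi-square goodness-of-fit test against an even 50/50 split, with no continuity correction; the function name is illustrative only.

```python
from math import erfc, sqrt

def sign_chisq_pvalue(n_positive, n_total, p0=0.5):
    """Chi-square goodness-of-fit test (1 df) of an observed split against p0.

    Assumption: no continuity correction is applied.
    Returns (chi-square statistic, two-sided p value).
    """
    n_negative = n_total - n_positive
    expected_pos = n_total * p0
    expected_neg = n_total * (1.0 - p0)
    chi2 = ((n_positive - expected_pos) ** 2 / expected_pos
            + (n_negative - expected_neg) ** 2 / expected_neg)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value

# 491 hypermethylated sites among 858 cross-region DMPs (counts from the text)
chi2_hyper, p_hyper = sign_chisq_pvalue(491, 858)
print(f"chi2 = {chi2_hyper:.2f}, p = {p_hyper:.3g}")
```

Under these assumptions the p value is on the order of 10−5, in line with the value reported above; an exact binomial test would give a slightly different number.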

Fig. 1

a Histogram of effect sizes for significant differentially methylated probes (DMPs). 858 DNAm sites are significantly differentially methylated with the cross-region model (at false discovery rate, FDR < 5%). Of these 858 sites, 367 are less methylated (hypomethylated) and 491 are more methylated (hypermethylated) in AD patients compared to the unaffected controls. The greater number of hypermethylated sites constitutes statistically significant enrichment (p = 2.30 × 10−5). b ANKRD30B differentially methylated probe (DMP) hypermethylation. Black points represent unaffected control samples and red points represent samples with a diagnosis of symptomatic Alzheimer’s disease (AD). Under the cross-region model, a DNAm site near ANKRD30B (cg23703062) was significantly more methylated in AD samples (red) than unaffected controls (black). Plotted beta values were adjusted for age, sex, ancestry, and estimates of technical variation using a linear model with logit transformation. c Heatmap of cross-region differentially methylated probes (DMPs). We adjusted beta values of the 858 differentially methylated probes identified by the cross-region model by regressing out covariates (age, sex, ancestry, and estimates of technical variation), then Z scaling the columns (DNAm sites). For hierarchical clustering and visualization purposes, Z values less than the 1st percentile or greater than the 99th percentile were set equal to these percentile Z values (i.e., thresholded) to limit the effect of extreme values. Brain samples (rows) and DNAm sites (columns) were clustered using Euclidean distance (dendrogram not shown for columns). d ANKRD30B differentially methylated region (DMR). A series of neighboring probes overlapping the transcription start site (TSS) of ANKRD30B are more methylated in Alzheimer’s disease (red) than in unaffected controls (black). ANKRD30B ankyrin repeat domain 30B
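The Z scaling and percentile thresholding described for panel c can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the percentiles are computed per DNAm site with linear interpolation, which the caption does not specify, and the function names are hypothetical.

```python
from statistics import mean, stdev

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def zscale_and_threshold(values, lower_q=1.0, upper_q=99.0):
    """Z scale one DNAm site across samples, then clip extreme Z values.

    Assumption: thresholds are taken per site; the caption may instead
    refer to percentiles of the whole matrix.
    """
    mu, sd = mean(values), stdev(values)
    z = [(v - mu) / sd for v in values]
    z_sorted = sorted(z)
    z_lo = percentile(z_sorted, lower_q)
    z_hi = percentile(z_sorted, upper_q)
    # Values beyond the percentile bounds are set equal to the bounds
    return [min(max(x, z_lo), z_hi) for x in z]

# Toy adjusted beta values for one site: 100 typical samples and one outlier
toy_betas = [0.50 + 0.001 * i for i in range(100)] + [0.99]
toy_z = zscale_and_threshold(toy_betas)
```

Clipping keeps a single extreme sample from dominating the Euclidean distances used for clustering and the heatmap color scale.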

We assessed the robustness of these findings using a series of sensitivity analyses. We found little effect of DNAm-estimated neuronal composition on these results, as adjusting for the estimated proportion of neurons resulted in almost all DNAm sites (sensitivity model 1, 842/858, 98.1%) remaining FDR significant with highly correlated global t statistics between both models (r = 0.954, p < 2.2 × 10−16, Fig. S7a). However, these results were somewhat expected given the lack of association between estimated neuronal composition and diagnosis (p > 0.068) in our dataset. Furthermore, differential DNAm does not appear to be driven by the greater burden of APOE risk in our AD samples: t statistics were highly correlated with a model adjusting for APOE ε4 dosage (sensitivity model 2, r = 0.910, p < 2.2 × 10−16, Fig. S7b). Moreover, a slightly greater proportion of previously reported DMPs were consistent with the cross-region model than with the region-specific models of AD (Table S5). These analyses suggest that cross-region DMP findings from our discovery model are not confounded by differences in cell-type composition or APOE risk burden, and that previously reported DMPs are reasonably consistent with our results.

Regional DNAm differences between AD subjects and unaffected controls

Next, we tested for regional differences in DNAm that could relate to additional regulatory mechanisms [24] via differentially methylated regions (DMRs) using a bump hunting strategy that jointly tests neighboring DNAm sites for differential methylation [26]. We found four DMRs between AD cases and controls under a cross-region model (at family-wise error rate, FWER < 0.05, Table S6). The most significant DMR was a hypermethylated region 1136 base pairs (bp) long that overlapped an exon 57 bp downstream from the DUSP22 transcription start site (TSS) and spans nine probes (FWER = 0, Fig. S8a). Post hoc region-specific analyses revealed DUSP22 hypermethylation in ERC (FWER = 0.053), HIPPO (FWER = 0.08), and DLPFC (FWER = 0.088) but not in CRB (FWER = 0.455). Another DMR overlaps the TSS of ANKRD30B (FWER = 0.027, 511 bp, nine contiguous probes, Fig. 1d). The third most significant region was hypomethylated and overlaps the promoter of JRK (384 bp from TSS, FWER = 0.029, 5 bp long, two contiguous probes, Fig. S8b) and the fourth DMR overlaps NAPRT (FWER = 0.047, 615 bp long, seven contiguous probes, Fig. S8c). These results suggest a more limited role of regional changes in DNAm associated with AD.

Brain region-dependent differential methylation

Given that some brain regions, such as cerebellar cortex, are putatively less susceptible to AD pathology than other brain regions, we next assessed whether Alzheimer’s disease is associated with DNAm in a brain region-dependent manner (i.e., an interaction model, brain region by diagnosis). We found 11,518 DNAm sites with region-dependent AD effects (at FDR < 5%, Table S7, Fig. S9). The most significant region-dependent effect is seen for a DNAm site near ANK1, which is more methylated in AD subjects for DLPFC, ERC, and HIPPO but less methylated for CRB (cg11823178, p = 3.41 × 10−21, Fig. 2a). Another example of a region-dependent effect can be seen for a CpG near CSNK1G2 (cg01335597, p = 3.04 × 10−15, Fig. 2b). Region-dependent sites were mostly distinct from the cross-region DMPs identified above, though a subset of 130 sites was significant in both models (Fig. 2c). These results demonstrate the power of combining data across multiple brain regions into joint statistical models and help further partition DNAm differences in AD by brain region.

Fig. 2

a Example region-dependent DMP at cg11823178. Black points represent unaffected control samples and red points represent samples with a diagnosis of symptomatic Alzheimer’s disease (AD). The DNAm site cg11823178 is within 10 kb of protein-coding gene ANK1. The effect of Alzheimer’s disease (AD) upon DNAm at this site is brain region dependent, with cortical brain regions, DLPFC, hippocampus (HIPPO), and entorhinal cortex (ERC), having a large difference between AD cases and unaffected controls, whereas little difference is observed in cerebellum (CRB). ANK1 ankyrin 1. b Example region-dependent DMP at cg01335597. The DNAm site cg01335597 is within 10 kb of CSNK1G2, and is less methylated in CRB but more methylated in DLPFC, HIPPO, and ERC, in AD cases relative to unaffected controls. CSNK1G2 casein kinase 1 gamma 2. c Venn diagram of cross-region and region-dependent sites. The number of overlapping and distinct cross-region and region-dependent DNAm sites is shown (at FDR < 5%)

Normal aging vs. AD-associated DNAm changes

Older age is a strong risk factor for Alzheimer’s disease [36] and indeed our AD donors were older than our controls (p = 2.95 × 10−8). We, therefore, sought to disentangle the effects of normal aging on DNAm from the effects of AD on DNAm by assessing the association between DNAm and age in the subset of neurotypical controls (N = 187, age range 52–83 years) and comparing these age-related associations with AD associations. Cross-region AD t statistics were significantly associated with age-related t statistics, but the absolute correlation was weak (Pearson’s r = 0.089, Fig. S10a). On the genome-wide scale, cross-region DNAm differences in AD are therefore generally distinct from the DNAm signature of normal aging, which points to some Alzheimer’s disease-specific epigenetic mechanisms. Nevertheless, 17.9% of the significant AD-associated DNAm sites (154/858) exhibited similar age-related changes in unaffected controls compared to only 1.66% of DNAm sites not associated with AD (6967/419,994; at p < 1 × 10−3 and same directionality; odds ratio, OR = 13.0, p < 2.2 × 10−16). DNAm changes in normal aging, therefore, further support the biological relevance of some of the most promising DNAm associations, including CpG sites near the genes WDR81 (cg19803550, p_aging = 1.31 × 10−13), ANK1 (cg05066959, p_aging = 4.85 × 10−13), MYO1C (cg14462670, p_aging = 4.39 × 10−4), and BIN1 (cg22883290, p_aging = 1.00 × 10−10). In contrast, p values from the region-dependent model were highly correlated on a global scale with p values from a region-dependent normal aging model (r = 0.752, p < 2.2 × 10−16, Fig. S10b). We, therefore, focused further analyses on cross-region differential methylation, which appeared to be less susceptible to confounding by age (see Results S4 for additional region-dependent analyses).

Biological relevance of genes near differential DNAm

We sought to better characterize the genomic and biological correlates of the DNAm changes identified under our cross-region model. Differentially methylated sites were enriched for CpG shelves and shores (Fig. S11), suggesting that these genomic features may be more dynamic in AD. Then, we tested whether genes near differentially methylated sites were enriched in known pathways and ontologies while accounting for the biased distribution of CpGs per gene. Overall, DMPs were preferentially in close proximity to genes involved in processes related to cell adhesion, immunity, and calcium binding (at FDR < 5%, Table S8a). Further analysis suggests that hypermethylated DNAm sites are driving these enrichments rather than hypomethylated sites, which did not appear to be enriched in any particular process (Tables S8b, c). Thus, the gain of DNA methylation at these cross-region sites implicates genes involved in cell adhesion, immunity, and calcium binding in AD.

AD genetic risk loci are enriched for differential methylation

Given that AD risk has a strong genetic component, we investigated whether differential methylation was located preferentially within AD risk loci from a large GWAS meta-analysis [32]. While only a small number of differentially methylated sites were present in AD risk loci (5/858, 0.58%), this represents statistically significant enrichment when compared to the background overlap with risk loci (562/419,432, 0.13%; OR = 4.37, p = 0.00655, Fisher’s exact test, Table S9). This enrichment was fully driven by hypermethylated sites (N = 5, OR = 7.74, p = 0.000588). When we relaxed our stringency for differential methylation (nominal p < 0.01), the enrichment in GWAS loci remained statistically significant (OR = 1.90, p = 0.00423). Of note, probes within AD risk loci were more often dropped during quality control than those outside the loci (192 probes dropped, OR = 2.21, p < 2.2 × 10−16; see Appendix for more details). Dropped probes were located mostly within the HLA-DRB5 locus (index SNP: rs9271192; 131/192, 68.2%), which was previously reported [69] to be differentially methylated, so our analysis may underestimate the relative enrichment in GWAS loci. These results indicate that differentially methylated probes overlap putative AD genetic risk loci more than expected, a finding that can hopefully be expanded upon and further refined through larger GWAS efforts in AD.
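The risk-locus enrichment can be approximately reproduced from the counts quoted above. The sketch below is an illustration under stated assumptions rather than the analysis code: it reads 5 and 853 as the DMPs inside and outside risk loci, and 562 and 419,432 as the background probes inside and outside risk loci (this reading reproduces OR = 4.37). It computes the upper tail of the hypergeometric distribution with log-gamma arithmetic; the function names are hypothetical.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    log_denom = log_comb(total, row1)
    k_max = min(row1, col1)
    return sum(
        exp(log_comb(col1, k) + log_comb(total - col1, row1 - k) - log_denom)
        for k in range(a, k_max + 1)
    )

# Assumed 2 x 2 table: rows = DMP / background, columns = in / out of risk loci
dmp_in, dmp_out = 5, 853
bg_in, bg_out = 562, 419_432

odds_ratio = (dmp_in * bg_out) / (dmp_out * bg_in)
p_upper = fisher_upper_tail(dmp_in, dmp_out, bg_in, bg_out)
print(f"OR = {odds_ratio:.2f}, one-sided p = {p_upper:.3g}")
```

Because the expected overlap is only about one probe, the lower tail contributes nothing to a two-sided test here, so the one-sided value should be close to the reported p value.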

Replication with an independent cohort

To assess the replicability of these cross-region DNAm differences, we re-analyzed processed, public AD DNAm data from four brain regions, DLPFC, ERC, CRB, and superior temporal gyrus (STG), under a similar model (N_control = 94 samples, N_AD = 239 samples from Lunnon et al. [40]; see “Methods” for further details). We found 28.0% of cross-region differentially methylated sites were consistent in this independent dataset (with p < 0.05 and the same direction of effect; 240/858, OR = 3.84, p < 2.2 × 10−16, Table S10a). Notably, a site near ANKRD30B was more methylated in AD samples than controls (cg23703062, p_replication = 0.000355). These moderate replication rates are similar to those we observed previously (Table S5), suggesting reasonable concordance between AD DNAm differences from different brain cohorts.

Using gene expression to functionally validate differential methylation

We aimed to functionally validate differences in DNAm associated with AD and, therefore, tested for AD case–control gene expression differences. We generated transcriptome data for mostly the same samples used in the DNAm analysis, from the same brain regions: HIPPO, DLPFC, ERC and CRB (N_donors: 50 controls, 26 AD cases; N_samples: 196 controls, 92 AD cases). Then, we investigated expression of genes near differentially methylated sites (i.e., 645 DNAm sites corresponding to 772 genes within 10 kb; the remaining 213 DNAm sites were not within 10 kb of a Gencode v25 annotated gene and were not considered for this analysis). The majority of these genes were nominally differentially expressed between AD cases and controls in at least one brain region (p < 0.05, 52.7% of genes, 407/772, Table S11). Only two genes, PLEC and CNPY2, survived stringent correction for multiple testing (threshold p_Bonferroni = 1.62 × 10−5, N_tests = 772 × 4 = 3088), but at a slightly less conservative threshold, six additional genes remained significant: ANKRD30B, ATP9B, SYTL2, PCDHGB1, ENSG00000242687.2, and ADAMTS2 (threshold p_Bonferroni = 6.48 × 10−5, N_tests = 772). Interestingly, ANKRD30B, which was hypermethylated in AD at both spatial resolutions (DMP and DMR), was under-expressed in AD patients compared to controls in ERC (log2 fold change, LFC = − 1.50, p = 3.71 × 10−5) and HIPPO (LFC = − 1.70, p = 0.00242) but not DLPFC (LFC = − 0.434, p = 0.198) or CRB (LFC = − 0.304, p = 0.665; Fig. 3a). Other genes such as WDR81 and MYO1C that were previously [12] implicated via differential DNAm were differentially expressed here (Fig. 3b, c).
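The two Bonferroni thresholds quoted above follow directly from the number of tests. A minimal sketch, assuming a family-wise alpha of 0.05 (implied by the reported thresholds but not stated explicitly in this section):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

ALPHA = 0.05          # assumed family-wise error rate
N_GENES = 772         # genes within 10 kb of a cross-region DMP
N_REGIONS = 4         # DLPFC, ERC, HIPPO, CRB

strict = bonferroni_threshold(ALPHA, N_GENES * N_REGIONS)  # gene-by-region tests
relaxed = bonferroni_threshold(ALPHA, N_GENES)              # gene-level tests only

print(f"strict threshold:  {strict:.3g}")
print(f"relaxed threshold: {relaxed:.3g}")

# ANKRD30B in ERC (p = 3.71e-5) passes the relaxed but not the strict threshold
ankrd30b_erc_p = 3.71e-5
print(ankrd30b_erc_p < relaxed, ankrd30b_erc_p < strict)
```

The strict threshold corrects for every gene in every region, whereas the relaxed one corrects for the number of genes only, which is why ANKRD30B appears in the second list but not the first.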

Fig. 3

a ANKRD30B differential expression. ANKRD30B is significantly less expressed in AD samples compared to controls in hippocampus and entorhinal cortex brain regions (ERC: p = 3.71 × 10−5, LFC = − 1.50; HIPPO: p = 0.00242, LFC = − 1.70), but not DLPFC or cerebellum (DLPFC: p = 0.198, LFC = − 0.434; CRB: p = 0.665, LFC = − 0.304). ANKRD30B ankyrin repeat domain 30B. b WDR81 differential expression. WDR81 is significantly more expressed in AD cases than in unaffected controls within ERC (CRB: p = 0.915, LFC = 0.0148; DLPFC: p = 0.111, LFC = 0.215; ERC: p = 0.00754, LFC = 0.295; HIPPO: p = 0.101, LFC = 0.293). WDR81 WD repeat domain 81. c MYO1C differential expression. MYO1C is significantly more expressed in AD cases than in unaffected controls within cortical brain regions (CRB: p = 0.826, LFC = 0.0299; DLPFC: p = 0.00836, LFC = 0.503; ERC: p = 0.0448, LFC = 0.382; HIPPO: p = 0.0324, LFC = 0.628). MYO1C myosin IC. d Greater DNAm at cg23703062 associates with reduced ANKRD30B expression in DLPFC, ERC, and HIPPO. Hypermethylation of DNAm site cg23703062, annotated to protein-coding gene ANKRD30B, associates with reduced gene expression in DLPFC, ERC, and HIPPO but not in CRB (CRB: p = 0.910, β = 0.00755; DLPFC: p = 0.0710, β = − 0.719; ERC: p = 0.0233, β = − 1.25; HIPPO: p = 0.101, β = − 0.792). ANKRD30B ankyrin repeat domain 30B

To further assess the functional link between DNAm and gene expression, we directly tested whether DNAm associates with gene expression for the subset of our samples that had both DNAm and RNA-seq data (N_donors: 49 controls, 24 AD; N_samples: 182 controls, 82 AD). We focused on the 407 genes with a nearby cross-region DMP and evidence of possible differential expression. We found DNAm associated with expression levels of 130 genes in at least one brain region (31.9%, at nominal p < 0.05, Table S12). This integrated approach refined our list of candidate genes implicated by a nearby differentially methylated site (within 10 kb), improving the biological resolution of our epigenome-wide scan (before: 39.1% of CpGs with multiple genes, after: 8.23% of CpGs with multiple genes; χ2 = 55.069, non-parametric p value = 1.6 × 10−8, 1 × 109 bootstraps). Overall, DNAm was inversely correlated with gene expression (79/130, 60.8%, p = 0.01406; using the strongest CpG-gene association), in line with the promoter-focused design of this microarray platform. Of note, hypermethylation of a site corresponding to ANKRD30B was associated with reduced gene expression in ERC, HIPPO, and DLPFC but not CRB (Fig. 3d). Other associations between DNAm probes and local expression of MYO1C (Fig. S12a) and DUSP22 (Fig. S12b) illustrate the heterogeneous patterns of gene expression associated with DNAm and suggest that additional brain region-specific factors contribute to gene expression regulation. Together, these findings further implicate DNAm as a potential mechanism underlying dysregulated gene expression in AD.

Discussion

DNA methylation (DNAm) is an epigenetic factor that is disrupted in Alzheimer’s disease (AD), and the effects of this epigenetic disruption have already been linked to several genes. To further validate previously reported genes and identify novel AD-relevant genes, we undertook a cross-brain region analysis of DNAm in AD. Differential methylation analysis implicated biological processes previously hypothesized to underlie AD pathology, namely cell adhesion [39] and calcium ion homeostasis [41], and differentially methylated sites were enriched within AD genetic risk loci [32]. Then, by linking DNAm data with corresponding transcriptome data, we were able to prioritize 130 genes that were differentially expressed between cases and controls and for which DNAm associated with expression. Together these findings further validate several previous, prominent DNAm associations such as those with ANK1 [12, 40] and DUSP22 [56] (Discussion S1), and implicate novel epigenetically dysregulated genes.

One promising gene implicated by our DNAm and gene expression evidence is ANKRD30B. Although little is known about the function of this protein-coding gene, it is expressed most strongly in breast, testis, and brain tissue. Furthermore, ANKRD30B is expressed in several cortical tissues affected by AD (DLPFC, ERC, HIPPO), and not cerebellum. Thus, while ANKRD30B is hypermethylated in all four brain regions, its expression is associated with DNAm and was reduced only in cortical regions and not cerebellum. The ANKRD30B protein product is predicted to contain an ankyrin repeat domain, a motif important for protein–protein interactions [27]. Interestingly, other genes previously implicated by DNAm evidence also encode proteins that contain ankyrin repeat domains (ANK1, ANKRD11).

Studying multiple brain regions is crucial to convert knowledge of epigenetic changes into insight into molecular risk mechanisms, because each region has a distinct cellular composition and regulatory landscape that contributes to neurophysiological function [11, 19, 25, 47, 57]. By studying the epigenome and transcriptome of four brain regions in parallel, three cortical regions susceptible to AD, and cerebellum, a region that is relatively protected, we were able to cast a wide net to capture AD-related differences. While we focused upon “cross-region” differential methylation (changes concordant across multiple brain regions), we also identified and characterized a large number of region-dependent differentially methylated sites. Moreover, even if differential methylation is shared across cerebellar and cortical brain regions, the gene expression correlates may not be. By studying the unique epigenetic profiles and their transcriptional correlates of multiple brain regions, we may glean deeper insight into the molecular mechanisms underlying differential susceptibility of brain regions to AD neuropathology.

However, our study has several limitations. First and foremost, we cannot distinguish DNAm and gene expression differences that are causal in AD from those that relate to epiphenomena and secondary disease processes, such as neurodegeneration (Discussion S2). Indeed, the DNAm signature of other neurodegenerative diseases such as Lewy body dementia and Parkinson’s disease resembles that of Alzheimer’s disease at both the genome-wide [60] and single-gene level [63]. Nevertheless, some of the DNAm differences reported here may relate to AD etiology, because a subset of the differences was enriched within known late-onset AD genetic risk loci and correlated with gene expression changes. Further mechanistic studies to better untangle the timing of the co-occurring DNAm and expression changes are necessary to more confidently determine which of these DNAm changes are causative versus an epiphenomenon in the brains of patients with AD.

In a similar vein, these DNAm changes can be regulated by other components of the genome and epigenome. For example, DNAm differences in AD can reflect underlying genetic risk, as was the case with a differentially methylated region overlapping the PM20D1 promoter [59]. Likewise, DNAm changes may interact with other epigenetic factors that regulate gene expression [31]. Current human brain epigenome datasets now include post-translational histone modifications and higher order chromatin interactions, which have been shown to significantly alter gene expression both within specific genetic loci and across the genome in human postmortem brain tissue [4, 50, 67]. Recent studies implicate widespread changes in the distribution of histone modifications in AD [16, 17]. While in some cases histone modifications may act independently of DNAm changes to generate AD risk [43], in other scenarios there is likely epigenetic cross-talk between DNAm and histone modifications [58]. Intriguingly, whereas a study of H4K16 acetylation, an active chromatin-associated modification, identified AD-related differences weakly inversely associated with age-related changes [49], we observed the opposite correlation with DNAm, albeit the association was weak. The canonical roles of DNAm as a transcriptionally repressive epigenetic mark and histone acetylation as an activating one may reconcile these conflicting patterns. Disruption of multiple epigenetic factors may converge via regulating transcriptional networks [46, 48, 70] that are disrupted in AD. Our study suggests that disrupted DNAm plays a role within this complex gene regulatory framework. We identified 130 genes that had evidence of differential expression between cases and controls and for which expression associated with nearby DNAm that differed in AD. These differentially expressed genes may involve cell type-specific dysregulation. For instance, a recent study of cell-sorted cortical tissue found that DNAm differences associated with AD neuropathology (Braak stage) existed in both neurons and glia [15]. Likewise, a study of laser-captured brain tissue suggests that microglia are responsible for differential expression of the epigenetically dysregulated gene ANK1 [45]. Understanding the transcriptional output of these epigenetic changes in AD across multiple brain regions, which have different susceptibilities to AD neuropathology, will be a challenging but rewarding endeavor. Our publicly available dataset contributes unique matched RNA-seq and DNAm data that can be used for development of genomic integration statistical methods.

The concordance between prior differential methylation studies and ours was modest (~28%), albeit similar to previous levels of concordance among epigenome-wide association studies. One factor potentially limiting concordance with prior studies is our use of neuropathological diagnosis as an outcome, as opposed to Braak staging and CERAD scores. Additionally, our study used a modest number of AD cases (N = 24), as it included four brain regions and corresponding RNA-sequencing data. Aside from increasing sample sizes through additional data collection, jointly analyzing data from multiple existing studies via a meta-analysis framework may yield new, robust clues into the epigenetic fingerprint of AD. Meta-analyses of other genomic data types have offered valuable insights into the genetic [28, 32, 42] and transcriptional [51] correlates of Alzheimer’s disease. Likewise, a meta-analysis of DNAm studies would improve power to detect DNAm differences associated with AD and may also help reveal factors underlying study heterogeneity.

Although our study provides novel evidence supporting the role of DNAm in gene dysregulation, the extent to which DNAm contributes to AD pathogenesis remains unclear. While the DNAm array used here (HM450k) offers base-level resolution of DNAm differences, it is a biased sample of the human methylome and focuses on CpG DNAm in a promoter context. Indeed, DNAm in non-CpG contexts has been recently suggested to be more dynamic than DNAm in CpG contexts within human brain tissue and is likely relevant to neuropsychiatric diseases [52, 55]. Furthermore, we found that CpGs differentially methylated between AD cases and controls were preferentially located within regions of the genome sparsely sampled in the HM450k array. Thus, further studies with complementary methodologies (e.g., whole genome bisulfite sequencing) and meta-analysis of existing publicly available HM450k data will better resolve the relative contribution of DNAm to AD pathogenesis.

In summary, we used paired DNAm and transcriptome data from four brain regions to link methylation differences in AD to local gene dysregulation. These results support the role of DNAm as an epigenetic mechanism underlying gene dysregulation and point to novel genes involved in AD. More generally, they illustrate the value of integrating epigenetic and transcriptomic data to study complex disease.

Methods

Postmortem brain tissue dissections

DLPFC (Brodmann areas 9 and 46), hippocampal formation, entorhinal cortex at the level of the anterior hippocampus, and cerebellar cortex were dissected from frozen postmortem brains using a hand-held visually guided dental drill (Cat #UP500-UG33, Brasseler, Savannah, GA) as previously reported [37]. In addition to demographic matching between AD and control subjects, the subjects included in the control group had no clinical or neurological history, or history of alcohol or substance abuse, or positive toxicology screens for illicit substances.

Alzheimer’s neuropathology diagnosis

All postmortem brains of subjects (controls and AD) were sampled for neuropathology and specific Alzheimer’s lesions: beta-amyloid plaques, neurofibrillary tangles and tau-positive neurites. Samples were also genotyped for the apolipoprotein E gene (APOE). The sampled tissue sections were fixed in 10% buffered formalin and paraffin embedded into blocks for microscopic analysis (10 micron thickness). Sections included the superior frontal gyrus, middle and superior temporal gyri, inferior parietal cortex, occipital cortex, amygdala, hippocampus and entorhinal cortex, anterior thalamus, midbrain, pons, medulla, and cerebellum (including cerebellar cortex and deep cerebellar nuclei). Sections were silver-stained using the Hirano method [68] and immunostained using antibodies against ubiquitin, phosphorylated tau (PHF-1) and beta-amyloid protein (6E10). Microscopic preparations were examined using conventional light microscopy. Alzheimer’s neuropathology ratings include the Braak [5] staging schema evaluating tau neurofibrillary tangle burden, and the CERAD scoring system as a measure of senile plaque burden (neuritic and diffuse). An Alzheimer’s likelihood diagnosis was then performed based on the published consensus recommendations for postmortem diagnosis of Alzheimer’s disease [23] as in prior publications [8].

Processing DNA methylation (DNAm) data

Human Methylation 450k (HM450k) arrays were run as specified by the manufacturer on DNA extracted from each brain region. After generating array data for 398 postmortem brain tissue samples, the resulting idat files were imported and then rigorously preprocessed with minfi [2]. We did not observe batch effects with several different quality control metrics. After removing low-quality samples (N = 5), we normalized the data with stratified quantile normalization [2]. We removed 64,660 probes that met one or more of the following exclusion criteria: (a) poor quality, (b) cross-reactive, (c) included common genetic variants, (d) mapped to a sex chromosome, or (e) did not map to hg38. We dropped 13 samples for one or more of the following reasons: (i) DNAm-predicted sex did not match phenotypic sex, (ii) 450k genotype clustered inappropriately, (iii) 450k genotype did not match SNP-chip genotype, or (iv) clustered inappropriately on principal component analysis. After these conservative quality control steps, 420,852 high-quality probes available for 377 samples remained for further analysis. Postmortem donors not meeting our diagnostic criterion for controls or AD were not included for analysis, leaving 269 tissue samples (number of donors: 24 AD, 49 controls). We estimated DNAm cell-type compositions using a deconvolution algorithm [22] with a flow-sorted DLPFC reference [18] and estimated age acceleration with Horvath’s clock [21]. For additional details about DNAm processing, see Supplemental Methods (Figs. S13–S18).
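
The "one or more criteria" rule for probe exclusion can be illustrated with a minimal Python sketch. This is not the pipeline used here (which relied on minfi in R); the probe IDs, flag names, and toy flag values are hypothetical, and only the two probe counts are taken from the text.

```python
# Hypothetical names for the five exclusion criteria (a)-(e) listed in the text.
EXCLUSION_CRITERIA = (
    "poor_quality",
    "cross_reactive",
    "common_variant",
    "sex_chromosome",
    "unmapped_hg38",
)


def keep_probe(flags):
    """Keep a probe only if it meets none of the exclusion criteria."""
    return not any(flags.get(criterion, False) for criterion in EXCLUSION_CRITERIA)


def filter_probes(probe_flags):
    """Return the IDs of probes that pass every criterion, in input order."""
    return [probe_id for probe_id, flags in probe_flags.items() if keep_probe(flags)]


# Toy input (invented): a probe failing several criteria is still removed once.
toy_flags = {
    "probe_A": {},
    "probe_B": {"cross_reactive": True},
    "probe_C": {"poor_quality": True, "sex_chromosome": True},
    "probe_D": {"unmapped_hg38": False},
}
kept = filter_probes(toy_flags)

# Counts reported in the text: removed plus retained gives the pre-filter total.
N_REMOVED = 64_660
N_RETAINED = 420_852
n_before_filtering = N_REMOVED + N_RETAINED
```

Because the criteria are combined with a logical OR, a probe that fails several of them counts once toward the 64,660 removed; the reported counts imply 485,512 probes before filtering.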

Brain-region stratified analyses

We tested for differences between AD cases and controls in predicted neuronal proportion and age acceleration, stratified by brain region, with linear models. We included age, sex, and ancestry as covariates in these models. Likewise, we tested for case-control differences in RNA-seq estimated neuronal proportions using a separate linear model for each brain region while adjusting for age, sex, race, mitochondrial mapping rate, gene assignment rate, and RIN.

Cross-region differential methylation analysis

In our “cross-region” model for AD case–control differences (N = 269 samples), we tested beta values from 420,852 probes for differential methylation between Alzheimer’s disease cases and controls using limma [54] with the model:

$${\text{DNAm}}_{ij} = \alpha_{i} + \beta_{i} \;{\text{Diagnosis}}_{j} + \theta_{i} \;{\text{Region}}_{j} + \delta_{i} \;{\text{Age}}_{j} \; + \; \zeta_{i} \;{\text{Sex}}_{j} + \iota_{i} \;{\text{MDS}}_{j} + \omega_{i} \;{\text{negControlPCs}}_{j} + \varepsilon_{ij}$$

for DNAm site i and donor j and binary disease status $\text{Diagnosis}_{j}$, adjusting for brain region, age, sex, the first multidimensional scaling (MDS) component of genotype data (ancestry), and the first two negative control principal components (PCs). Afterward, we tested an interaction model, i.e., an F test of the estimated AD effect from each brain region, while adjusting for the same covariates as above. Both of these models incorporated multiple brain regions from each donor, and thus repeated measures, so we used the generalized least squares technique to estimate statistical parameters to appropriately account for the repeated biological measures. We estimated a consensus correlation with limma’s duplicateCorrelation function in which donors are blocks and different brain regions are repeated observations. Then, we incorporated this consensus correlation while estimating statistics for each DNAm site using the generalized least squares technique via limma’s lmFit function. We performed sensitivity analyses to assess how robust our results were to potential confounds (see Supplemental Methods for additional details).
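
As a sketch of the generalized least squares step, under our reading of the blocked model and with notation introduced here for illustration only: let $X$ be the design matrix of the model above, $\gamma_{i}$ the stacked coefficient vector for site i, $y_{i}$ the vector of DNAm values at site i across samples, $\rho$ the consensus correlation, $D$ the number of donors, and $n_{d}$ the number of brain regions available for donor d. The estimator then takes the form:

$$\hat{\gamma }_{i} = \left( {X^{ \top } V^{ - 1} X} \right)^{ - 1} X^{ \top } V^{ - 1} y_{i} ,\quad V = {\text{blockdiag}}\left( {V_{1} , \ldots ,V_{D} } \right),\quad V_{d} = \left( {1 - \rho } \right)I_{{n_{d} }} + \rho \,{\mathbf{1}}_{{n_{d} }} {\mathbf{1}}_{{n_{d} }}^{ \top }$$

Here each $V_{d}$ has ones on the diagonal and $\rho$ off the diagonal, so samples from the same donor are treated as equally correlated and samples from different donors as uncorrelated. The site-specific residual variance scales $V$ by a constant and therefore cancels in $\hat{\gamma }_{i}$; setting $\rho = 0$ recovers ordinary least squares.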

Differentially methylated region analysis

To assess whether regions of the genome were differentially methylated between Alzheimer’s disease cases and controls, we used bumphunter with the same models as above [26]. We implemented the bumphunter method with the minfi Bioconductor package’s bumphunter function using 1000 bootstrap iterations, a cutoff of 0.05 for defining continuous probes, smoothing done with the locfitByCluster function, and otherwise default parameters [2]. We controlled for multiple testing using the family-wise error rate (FWER) to determine statistical significance.
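
The idea behind region finding can be sketched in a few lines of Python. This is a simplified illustration, not the bumphunter implementation: it assumes probes are already sorted by genomic position, grouped into clusters, and assigned smoothed coefficients, and it assumes the FWER of a region is the fraction of resampling iterations whose largest null region is at least as extreme as the observed one. All function names and toy values are hypothetical.

```python
from itertools import groupby


def find_bumps(cluster_ids, coefs, cutoff=0.05):
    """Find runs of adjacent probes in one cluster that exceed the cutoff.

    Returns (first_index, last_index, direction) tuples, where direction is
    +1 for coefficients above the cutoff and -1 for those below -cutoff.
    """
    def direction(coef):
        if coef > cutoff:
            return 1
        if coef < -cutoff:
            return -1
        return 0

    keyed = [(cluster, direction(coef)) for cluster, coef in zip(cluster_ids, coefs)]
    bumps = []
    index = 0
    # groupby splits the sequence whenever the cluster or the direction changes.
    for (_cluster, sign), run in groupby(keyed):
        length = len(list(run))
        if sign != 0:
            bumps.append((index, index + length - 1, sign))
        index += length
    return bumps


def fwer(observed_area, null_max_areas):
    """Fraction of resampling iterations whose maximum null area is at least
    as large as the observed area (assumed definition, for illustration)."""
    return sum(area >= observed_area for area in null_max_areas) / len(null_max_areas)


# Toy data (invented): seven probes in two clusters.
clusters = [1, 1, 1, 1, 2, 2, 2]
smoothed = [0.01, 0.08, 0.09, -0.02, 0.07, -0.06, -0.10]
bumps = find_bumps(clusters, smoothed)
toy_fwer = fwer(0.5, [0.1, 0.6, 0.2, 0.3])
```

In this toy example the cutoff of 0.05 yields three candidate regions: probes 1 to 2 (hypermethylated), probe 4 alone, and probes 5 to 6 (hypomethylated). A region never spans two clusters or mixes directions.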

Processing RNA sequencing (RNA-seq) data

RNA sequencing was performed with protocols described in [7]. In summary, TruSeq Stranded Total RNA Library Preparation kit with Ribo-Zero ribosomal RNA depletion (Illumina) libraries were generated and sequenced on an Illumina HiSeq 2000 at the LIBD Sequencing Facility, producing a mean of 130.6 million 100-bp paired-end reads per sample. Raw sequencing reads were quality checked with FastQC [1]. Quality checked reads were mapped to the hg38/GRCh38 human reference genome with splice-aware aligner HISAT2 version 2.0.4 [30]. Gene-level quantification based on GENCODE release 25 (GRCh38.p7) annotation was run on aligned reads using featureCounts (subread version 1.5.0-p3) [34] with a mean of 34.9% (SD = 4.6%) of mapped reads assigned to genes. Quality control metrics such as gene assignment rate, mitochondrial mapping rate, and RNA integrity number (RIN) were significantly associated with brain region and case–control status (Fig. S19a–c). These quality control metrics were, therefore, included as covariates in downstream analyses (see below).

Differential expression of differentially methylated genes and association with DNAm

DNAm sites were annotated to all Gencode v25 genes within 10 kb upstream or downstream (20 kb region) using the GenomicRanges Bioconductor package [33]. These genes were then tested for corresponding AD case–control differential gene expression. Analyses were stratified by brain region and used 196 control and 92 AD RNA-seq samples processed as described in the previous section. We normalized counts for 25,587 Gencode annotated genes that were expressed in at least one brain region with limma’s voom function, then tested for AD case–control differences under an empirical Bayesian framework [54] while adjusting for RNA Integrity Number (RIN), age, sex, race, mitochondrial mapping rate, and gene assignment rate.
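
The 10 kb annotation rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the GenomicRanges-based code used here: it assumes a gene is annotated to a site when any part of the gene body overlaps the window from 10 kb upstream to 10 kb downstream of the site, it ignores strand, and the gene names and coordinates are invented.

```python
def genes_near_site(site_chrom, site_pos, genes, window=10_000):
    """Return names of genes overlapping [site_pos - window, site_pos + window].

    `genes` is an iterable of (name, chrom, start, end) tuples with start <= end.
    A gene overlaps the window exactly when start - window <= site_pos <= end + window.
    """
    hits = []
    for name, chrom, start, end in genes:
        if chrom == site_chrom and start - window <= site_pos <= end + window:
            hits.append(name)
    return hits


# Hypothetical gene coordinates for illustration only.
toy_genes = [
    ("GENE_A", "chr1", 100_000, 120_000),
    ("GENE_B", "chr1", 135_000, 150_000),
    ("GENE_C", "chr2", 100_000, 120_000),
]

near_first = genes_near_site("chr1", 128_000, toy_genes)
near_second = genes_near_site("chr1", 131_000, toy_genes)
```

A single site can map to several genes (the first toy site lies within 10 kb of both GENE_A and GENE_B), which is why the downstream expression tests are run per site-gene pair rather than per site.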

We then tested the association between DNAm at each differentially methylated site and log-transformed, normalized gene expression (log2[RPKM + 1]) for all Gencode (v25) annotated genes within 10 kb with at least partial evidence of differential expression (nominal p < 0.05 in at least one brain region). This analysis used 182 control and 82 AD samples with matching DNAm and RNA-seq data. We tested for these associations in R using a linear model (lm function) that adjusted for the covariates above and was stratified by brain region.
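
For reference, the expression transform can be written out as a short Python sketch. It uses the standard RPKM definition (reads per kilobase of gene length per million reads); which read total serves as the library size in this study (for example, all mapped reads or only reads assigned to genes) is not stated above, so `library_size` is a placeholder, and the numbers in the example are invented.

```python
import math


def rpkm(count, gene_length_bp, library_size):
    """Reads per kilobase of gene length per million reads in the library."""
    return count * 1e9 / (gene_length_bp * library_size)


def log2_rpkm_plus_one(count, gene_length_bp, library_size):
    """The log2(RPKM + 1) transform; the +1 keeps zero counts finite."""
    return math.log2(rpkm(count, gene_length_bp, library_size) + 1.0)


# Invented example: 300 reads on a 2 kb gene in a library of 50 million reads.
example_rpkm = rpkm(300, 2_000, 50_000_000)
example_log = log2_rpkm_plus_one(300, 2_000, 50_000_000)
zero_log = log2_rpkm_plus_one(0, 2_000, 50_000_000)
```

In this example the gene has an RPKM of 3, so the transformed value is log2(4) = 2, and an unexpressed gene maps to 0 rather than to negative infinity.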