Introduction

Alzheimer’s disease (AD) is the most common neurodegenerative disease in the elderly [1]. Genome-wide association studies (GWAS) have recently been used to investigate AD pathogenesis and have yielded important new insights into its genetic mechanisms [2]. However, newly identified AD susceptibility loci exert very small risk effects and cannot fully explain the underlying genetic risk [3], so a large proportion of AD heritability remains unexplained. Fortunately, the existing large-scale GWAS datasets provide strong support for investigating AD mechanisms using pathway analysis methods [4–9].

Lambert et al. [4] used two different pathway analysis tools to analyze a large GWAS in a French population (French GWAS) and identified significant pathways related to the immune system. Jones et al. [6] performed multiple pathway analyses of two large GWAS datasets, the French GWAS and the Genetic and Environmental Risk for Alzheimer’s Disease Consortium (GERAD) GWAS, and observed significantly enriched pathways related to metabolism and the immune system. We previously performed multiple pathway analyses of two publicly available AD GWAS datasets (French and Pfizer GWAS datasets) and identified cell adhesion molecules (CAM) as a consistent signal in AD [7]. More recently, Ramanan et al. [8] performed a pathway enrichment analysis on GWAS data from 742 Alzheimer’s Disease Neuroimaging Initiative (ADNI) participants and confirmed the involvement of CAM in AD (P = 5.60E−04).

Based on the above findings, we hypothesized that CAM may be a consistent signal in multiple AD GWAS. However, it was unclear whether CAM involvement was present in the GERAD GWAS dataset. Evidence also shows that genetic variants that modify gene expression in the brain may influence AD risk [10]. To verify that CAM is a consistent signal in multiple AD GWAS, we conducted a pathway analysis of the GERAD GWAS dataset and of brain expression GWAS datasets. We also reasoned that integrating the AD GWAS and AD brain expression datasets may provide complementary information for identifying important pathways involved in AD. Here, we conducted a systems analysis using (1) KEGG pathways, (2) a large-scale AD GWAS from GERAD (n = 11,789), (3) two brain expression GWAS datasets (n = 399) using expression quantitative trait loci from the human cerebellum and temporal cortex, and (4) previous results from pathway analyses of AD GWAS.

Materials and Methods

AD GWAS Dataset

The GERAD GWAS dataset included 11,789 samples from individuals of European ancestry (3941 AD cases and 7848 controls) [11]. A total of 529,205 autosomal single nucleotide polymorphisms (SNPs) passed quality control checks. SNPs were tested for association with AD using logistic regression under an additive model. Here, we selected 761 significant SNPs with P ≤ 1.00E−03. For more detailed information, please refer to the original study [11].

Two AD Brain Expression GWAS Datasets

The brain expression GWAS datasets were originally analyzed by Zou et al. [10]. A total of 773 brain samples from the cerebellum and temporal cortex were available for analysis. The samples were divided into four datasets: 177 non-AD cerebellar samples, 197 non-AD temporal cortex samples, 197 AD cerebellar case samples, and 202 AD temporal cortex case samples. Zou et al. analyzed 213,528 cisSNPs within ±100 kb of the 24,526 tested transcripts. Levels of 24,526 transcripts for 18,401 genes were measured using WG-DASL assays. False discovery rate (FDR)-based P values were used to correct for multiple testing. More detailed results are described in Supplementary Tables [10].

Previous Results from Pathway Analysis of AD GWAS

Prior to this study, six pathway analyses of AD GWAS had been reported, five of which used the KEGG database [4–9]. We compared our findings with these previous pathway analyses. All of the pathway analysis results were publicly available from the original studies [4–9].

AD GWAS Dataset Preprocessing

ProxyGeneLD was used to assign SNPs to specific genes [12]. This software takes into consideration the complex linkage disequilibrium (LD) patterns in the human genome and corrects for the inflation of significance caused by gene length. In brief, ProxyGeneLD begins by retrieving LD structures from the HapMap genotyping data (Utah residents with ancestry from northern and western Europe (CEU) samples of HapMap phase II, release 22) [12]. If a group of markers is in high LD in HapMap (r² > 0.8), the markers are tied to a “proxy cluster” and taken as a single signal [5]. Next, each marker in the AD GWAS with statistically significant evidence of association is evaluated to see whether (a) it belongs to any proxy cluster and (b) the marker itself or any marker in its cluster is located in a genetic region [5]. If a marker or cluster overlaps a region extending across a gene, it is assigned as showing possible association with that gene. For more detailed algorithms, please refer to the original study [12].
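The clustering and assignment steps described above can be illustrated with a short sketch. It is not the ProxyGeneLD implementation: it assumes that proxy clusters are the connected components of the graph of marker pairs with r² > 0.8, it omits the gene-length correction entirely, and the function names, input layouts, and toy identifiers are hypothetical.

```python
from collections import defaultdict


def build_proxy_clusters(markers, ld_pairs, r2_threshold=0.8):
    """Group markers into proxy clusters (assumption: connected components
    of the graph whose edges are marker pairs with r2 > r2_threshold)."""
    parent = {m: m for m in markers}

    def find(m):
        # Union-find root lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b, r2 in ld_pairs:
        if r2 > r2_threshold and a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for m in markers:
        clusters[find(m)].add(m)
    return list(clusters.values())


def assign_genes(significant, clusters, positions, genes):
    """Map each significant marker, or any marker in its proxy cluster,
    to the genes whose region it overlaps.

    positions: marker -> (chromosome, base position)
    genes:     gene   -> (chromosome, start, end)
    Returns gene -> set of clusters, so markers in high LD count once.
    """
    cluster_of = {m: c for c in clusters for m in c}
    hits = defaultdict(set)
    for snp in significant:
        group = cluster_of.get(snp, {snp})
        for m in group:
            if m not in positions:
                continue
            chrom, pos = positions[m]
            for gene, (g_chrom, start, end) in genes.items():
                if chrom == g_chrom and start <= pos <= end:
                    hits[gene].add(frozenset(group))
    return dict(hits)


# Toy data (hypothetical identifiers and coordinates).
toy_markers = ["rs1", "rs2", "rs3"]
toy_ld = [("rs1", "rs2", 0.9), ("rs2", "rs3", 0.5)]
toy_positions = {"rs1": ("1", 100), "rs2": ("1", 500), "rs3": ("1", 5000)}
toy_genes = {"GENE_A": ("1", 400, 600), "GENE_B": ("1", 4000, 6000)}

toy_clusters = build_proxy_clusters(toy_markers, toy_ld)
toy_hits = assign_genes(["rs1"], toy_clusters, toy_positions, toy_genes)
```

In this toy case rs1 lies outside every gene, but its proxy rs2 falls inside GENE_A, so the signal is attributed to GENE_A through the cluster.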

AD Brain Expression GWAS Datasets Preprocessing

Here, we selected the 197 AD cerebellar case samples and 202 AD temporal cortex case samples from Zou et al. [10]. For each dataset, we selected cisSNPs associated with gene expression with P < 1.0E−4 and with AD risk with P < 1.0E−3 in the meta-analysis of the Alzheimer’s Disease Genetic Consortium (ADGC) dataset [10]. In the end, we selected 3660 cisSNPs associated with the expression of 298 genes in the cerebellum and 3659 cisSNPs associated with the expression of 298 genes in the temporal cortex. More detailed results are described in Supplementary Tables from the original study [10].
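The two-threshold selection described above amounts to a simple filter. The sketch below is illustrative only: the record layout and field names are hypothetical and do not reflect the format of the Supplementary Tables in [10].

```python
from typing import NamedTuple


class CisSnp(NamedTuple):
    # Hypothetical record layout for one cisSNP-transcript association.
    snp_id: str
    gene: str
    p_expression: float  # P value for association with gene expression
    p_ad_risk: float     # P value for association with AD risk (ADGC meta-analysis)


def select_cis_snps(records, p_expr_max=1.0e-4, p_risk_max=1.0e-3):
    """Keep cisSNPs that pass both thresholds stated in the text."""
    return [
        r for r in records
        if r.p_expression < p_expr_max and r.p_ad_risk < p_risk_max
    ]


def genes_of(records):
    """Unique genes regulated by the selected cisSNPs."""
    return sorted({r.gene for r in records})


# Toy records (hypothetical values).
toy_records = [
    CisSnp("rs1", "GENE_A", 5.0e-6, 2.0e-4),  # passes both thresholds
    CisSnp("rs2", "GENE_A", 5.0e-6, 5.0e-2),  # fails the AD risk threshold
    CisSnp("rs3", "GENE_B", 3.0e-3, 1.0e-5),  # fails the expression threshold
]
selected = select_cis_snps(toy_records)
```

Applying the filter separately to the cerebellar and temporal cortex datasets and taking the union of `genes_of(...)` would give the gene list passed to the pathway test.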

Pathway-Based Testing for AD Genes

The KEGG pathways in WebGestalt were used in our research (May 10, 2014) [13]. For a given pathway, a hypergeometric test was used to detect the overrepresentation of AD-related genes among all of the genes in the pathway [13]. The P value of K AD-related genes in the pathway was calculated using

$$ P = 1 - \sum_{i=0}^{K} \frac{\binom{S}{i}\binom{N-S}{m-i}}{\binom{N}{m}} $$

where N is the total number of genes of interest, S is the number of all of the AD-related genes, m is the number of genes in the pathway, and K is the number of AD-related genes in the pathway. The FDR method was used to correct for multiple testing. Any pathway with an adjusted P < 0.05 and at least five upregulated or downregulated AD genes was considered significant. In order to reduce the multiple-testing issue and to avoid testing overly narrow or broad pathways, we selected pathways that contained at least 20 and at most 300 genes for subsequent analysis.
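The test and the filtering rules above can be sketched in a few lines of standard-library Python. Several points are assumptions rather than statements about the original analysis: the text says only "the FDR method", and the Benjamini-Hochberg procedure is assumed here; all function and parameter names are hypothetical. Note also that the sum as printed runs to K, which gives P(X > K), whereas a sum to K − 1 gives P(X ≥ K), a common convention for overrepresentation tests. The sketch exposes both through a flag and makes no claim about which one WebGestalt applied.

```python
from math import comb


def hypergeom_p(N, S, m, K, include_observed=False):
    """Upper-tail hypergeometric probability for K AD genes in a pathway.

    N: genes of interest, S: AD-related genes, m: genes in the pathway.
    include_observed=False sums i = K+1.., i.e. 1 - sum_{i=0}^{K} as printed.
    include_observed=True  sums i = K..,   i.e. 1 - sum_{i=0}^{K-1}.
    The tail is summed directly to avoid cancellation in 1 - CDF.
    """
    lower = K if include_observed else K + 1
    tail = sum(comb(S, i) * comb(N - S, m - i)
               for i in range(lower, min(S, m) + 1))
    return tail / comb(N, m)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (assumed FDR procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def enriched_pathways(pathways, ad_genes, background,
                      min_size=20, max_size=300, min_hits=5, alpha=0.05):
    """Test pathways with 20-300 genes; keep adjusted P < 0.05 and >= 5 AD genes."""
    background = set(background)
    ad = set(ad_genes) & background
    tested = []
    for name, members in pathways.items():
        members = set(members) & background
        if not (min_size <= len(members) <= max_size):
            continue  # skip overly narrow or broad pathways
        k = len(members & ad)
        p = hypergeom_p(len(background), len(ad), len(members), k)
        tested.append((name, k, p))
    adjusted = benjamini_hochberg([p for _, _, p in tested])
    return [(name, k, p, q)
            for (name, k, p), q in zip(tested, adjusted)
            if q < alpha and k >= min_hits]
```

Restricting the size range before adjustment matters because the number of tested pathways enters the FDR correction directly.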

Pathway-Based Meta-analysis of AD GWAS and Brain Expression GWAS

We used Fisher’s method to combine the P values for each pathway identified in AD GWAS and AD brain expression GWAS [14]. For a given pathway, the formula for the statistic is

$$ x^2 = -2 \sum_{i=1}^{k} \ln\left(p_i\right) $$

where p_i is the P value of the pathway in the ith study and k is the total number of studies. The statistic x^2 follows a chi-square distribution with 2k degrees of freedom [14]. The pathway-based meta-analysis was carried out using the program R (http://www.r-project.org/).
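As an illustration of the combination step, the following sketch implements Fisher's method in standard-library Python. The original analysis was run in R; this is not the authors' code, and the function names are hypothetical. Because the degrees of freedom are always even (2k), the chi-square upper-tail probability has a closed form, so no statistics library is needed.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def fisher_combine(pvals):
    """Return (statistic, combined P value) for one pathway across k studies."""
    if not pvals or any(p <= 0.0 or p > 1.0 for p in pvals):
        raise ValueError("P values must lie in (0, 1]")
    stat = -2.0 * sum(log(p) for p in pvals)
    return stat, chi2_sf_even_df(stat, 2 * len(pvals))


# Example: a pathway with P = 0.05 in each of two studies.
stat, p_combined = fisher_combine([0.05, 0.05])
```

For two studies the combined P value reduces to p1·p2·(1 − ln(p1·p2)), which is a convenient hand check. The chi-square reference distribution holds under the null hypothesis when the combined P values are independent.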

Results

Pathway Analysis of GERAD GWAS

Applying ProxyGeneLD to the GERAD GWAS, we obtained 320 AD genes. After FDR correction for multiple testing, we identified 14 significant KEGG pathways (P < 0.05) with at least five AD genes, among which CAM (hsa04514) was the most significant (Table 1). Nine of the 133 genes in the CAM pathway were among the AD genes. We also observed a significant overrepresentation of KEGG pathways related to metabolism (metabolic pathways and purine metabolism), cardiovascular diseases (dilated cardiomyopathy, arrhythmogenic right ventricular cardiomyopathy (ARVC), and hypertrophic cardiomyopathy (HCM)), and neurological disorders (Alzheimer’s disease and Huntington’s disease) (Table 1).

Table 1 Significant KEGG pathways with P < 0.05 in the AD GWAS dataset

Pathway Analysis of Brain Expression GWAS Datasets

We obtained 425 AD genes regulated by cisSNPs in the cerebellum or temporal cortex. After FDR correction, we identified 20 significant KEGG pathways (P < 0.05) that included at least five AD genes (Table 2). We found a significant overrepresentation of KEGG pathways related to the immune system and diseases, neurodegenerative diseases, cardiovascular diseases, metabolism, and genetic and environmental information processing. The antigen processing and presentation pathway and the CAM pathway were the most and second most significant signals, respectively (Table 2).

Table 2 Significant KEGG pathways with P < 0.05 in AD brain expression GWAS datasets

Pathway-Based Meta-analysis of AD GWAS and Brain Expression GWAS

We combined the findings from the AD GWAS and the AD brain expression datasets. Using pathway-based meta-analysis and FDR correction for multiple testing, we identified CAM (hsa04514) and antigen processing and presentation as the most and second most significant signals, respectively (Table 3). Here, we list the top 20 significant pathways (Table 3). Based on the KEGG classifications, these pathways can be divided into seven main classes: immune system and diseases (n = 8), cardiovascular disease (n = 4), environmental information processing (n = 2), metabolism (n = 2), neurodegenerative system and disease (n = 2), cellular processes (n = 1), and infectious diseases (n = 1) (Table 3).

Table 3 Top 20 significant KEGG pathways by meta-analysis of AD GWAS and AD brain expression GWAS datasets

Comparison with Previous Studies Using Single GERAD GWAS and Other GWAS

Jones et al. [6] analyzed the GERAD GWAS and reported six immune system-related pathways: asthma (hsa05310), hematopoietic cell lineage (hsa04640), graft-versus-host disease (hsa05332), allograft rejection (hsa05330), autoimmune thyroid disease (hsa05320), and type I diabetes mellitus (hsa04940). Here, we confirmed these six pathways, all of which ranked among the top 20 significant signals (Table 3). We also identified two further immune pathways, antigen processing and presentation (hsa04612) and systemic lupus erythematosus (hsa05322). A previous pathway analysis using the French AD GWAS reported a weak association between antigen processing and presentation (hsa04612) and AD with P = 0.02 [4]. Here, we identified it as the second most significant signal with P = 1.87E−07 (Table 3).

We previously integrated three large-scale AD GWAS, including the GERAD GWAS, using a gene-based meta-analysis and subsequently conducted a pathway analysis [9]. That study highlighted, for the first time, the involvement of cardiovascular disease-related pathways in AD [9]. The KEGG database contains four pathways related to cardiovascular disease: viral myocarditis (hsa05416), dilated cardiomyopathy (hsa05414), hypertrophic cardiomyopathy (hsa05410), and arrhythmogenic right ventricular cardiomyopathy (hsa05412). In the present study, we identified all four pathways as significantly associated with AD, among which viral myocarditis was the third most significant signal with P = 7.79E−07 (Table 3). After careful comparison with previous pathway analyses of AD GWAS [4–9], we report, for the first time, the involvement of the purine metabolism (hsa00230) pathway in AD, which is the eighth most significant signal in Table 3. More detailed comparison results are described in Table 3.

Discussion

Multiple AD GWAS have recently been conducted, and we hypothesized that CAM is a consistent signal across them. Pathway analysis of the GERAD GWAS and of the AD brain expression GWAS identified the involvement of CAM in AD in both. Taken together with previous pathway analyses of AD GWAS, CAM is a consistent signal in the French [4], Pfizer [4], ADNI [8], GERAD [11], and AD brain expression GWAS [10]. We also combined the findings from the AD GWAS and the AD brain expression GWAS datasets using a pathway-based meta-analysis, which further identified CAM (hsa04514) as the most significant signal (Table 3). In addition to CAM, we identified significant pathways related to immune system and diseases, cardiovascular disease, metabolism, neurodegenerative system and disease, cellular processes, and infectious diseases. We confirmed previous findings and highlighted the purine metabolism pathway (hsa00230) in AD for the first time; it is the eighth most significant signal in Table 3.

The exact pathogenetic role of purine metabolism in AD remains unknown. Kaddurah-Daouk et al. [15] used a targeted metabolomics platform to profile cerebrospinal fluid from 40 AD, 36 mild cognitive impairment (MCI), and 38 control subjects. Levels of 71 metabolites, including 24 known compounds quantified by the liquid chromatography electrochemical array (LCECA) platform, were measured. Six of the known compounds belong to purine metabolism: guanosine, hypoxanthine, uric acid, xanthine, xanthosine, and paraxanthine. AD subjects had elevated xanthosine versus controls [15], and MCI subjects had elevated hypoxanthine and uric acid versus controls. Metabolite ratios revealed changes in uric acid/xanthine, xanthine/hypoxanthine, and xanthine/xanthosine within the purine pathway. A partial correlation network showed that total tau was most directly related to the purine pathway [15]. These findings indicate that AD is associated with an overlapping pattern of perturbations in the purine pathway [15].

In addition to AD, the involvement of purine metabolism has been reported in other neurodegenerative diseases. Johansen et al. [16] compared plasma profiles from people with idiopathic Parkinson’s disease (PD), people whose disease was due to a mutation in LRRK2, and nonrelated control subjects. Although the two PD categories had much in common, the profile of a dozen mostly unknown metabolites was sufficient to distinguish them. Levels and ratios of some purine metabolites, such as uric acid, hypoxanthine, and xanthine, were significantly decreased in both kinds of PD patients compared with control subjects [16].

LeWitt et al. [17] compared cerebrospinal fluid (CSF) concentrations of homovanillic acid (the major catabolite of dopamine) and the purine compound xanthine in 217 unmedicated PD subjects and 26 healthy controls. The xanthine/homovanillic acid ratio differed between PD cases and controls. The mean xanthine/homovanillic acid quotient in controls was 13.1 ± 5.5, compared with a PD value of 17.4 ± 6.7 at an initial lumbar CSF collection (P = 0.0017) and 19.7 ± 8.7 (P < 0.001) at a second CSF collection up to 24 months later. These observations provide further neurochemical evidence linking purine metabolism to PD [17].

Here, a hypergeometric test was used for pathway analysis. We set a stringent threshold (P ≤ 1.00E−03) for the inclusion of SNPs or cisSNPs, similar to Lambert et al. and Jones et al. [4, 6]. Evidence suggests that using a stringent P value (P ≤ 1.00E−03) to define associated SNPs allows an accurate test of whether highly associated SNPs are enriched in a pathway [18]. Jia et al. [19] suggested that a hypergeometric test performs better, and with higher power, than gene set enrichment analysis (GSEA) or a SNP ratio test (SRT) for gene sets consisting of markers that are highly associated with the disease (P ≤ 1.00E−03).

In this study, we used a pathway-based meta-analysis to integrate the AD GWAS and expression datasets, as this approach has been widely used in previous studies. Arasappan et al. [20] applied a pathway-based meta-analysis to four independent gene expression datasets and identified a 37-gene expression signature for systemic lupus erythematosus in human peripheral blood mononuclear cells. To combine the results from different analyses, Kaever et al. [21] introduced a methodical framework for the meta-analysis of P values obtained from pathway enrichment analysis (set enrichment analysis based on pathways) of multiple dependent or independent datasets from different omics platforms. Shen et al. [22] proposed two approaches to meta-analysis for pathway enrichment, combining statistical significance across studies at the gene level or at the pathway level, and applied these methods to real data on drug response in breast cancer cell lines and lung cancer tissues [22].

Here, we selected pathways from the KEGG database rather than the GO database for the following reasons. The KEGG database is manually compiled on the basis of biological evidence and does not have a hierarchical structure [4, 6]. The GO database, in contrast, is based mainly on computer predictions as well as human annotation and has a hierarchical structure [4, 6]. GO analysis typically assumes that each functional category is independent, and less than 1 % of the GO annotations have been confirmed experimentally [4, 6].

Despite these interesting results, we recognize some limitations in our study. First, we selected expression datasets from the human cerebellum and temporal cortex. We will further verify our findings using expression data from other human brain tissues such as the frontal cortex, hippocampus, prefrontal cortex, and visual cortex in subsequent studies. Second, for many long genes in the CAM pathway, multiple-testing corrections may not be sufficient to account for all biases. The results from the GERAD GWAS should be adjusted using a permutation test. However, the original SNP genotype data are not available to us, so future replication studies using genotype data are required to replicate our findings.

Collectively, our integrated analysis (1) shows that CAM is a consistent signal in five AD GWAS, (2) identifies CAM as the most significant signal in the pathway-based meta-analysis, (3) confirms previous findings, and (4) highlights the purine metabolism pathway (hsa00230) in AD for the first time. We believe that our results may advance the understanding of AD mechanisms and will be informative for future genetic studies of AD.