Introduction

Parkinson’s disease (PD) is characterized by resting tremor, rigidity, and bradykinesia associated with loss of dopaminergic neurons in the substantia nigra (SN), and by the neuronal presence of Lewy bodies. PD affects about 1% of people above the age of 60 years, making it the second most common neurodegenerative disease. Identification of genes for familial PD, viz. SNCA, PARKIN, PINK1, DJ-1, VPS35, LRRK2, and TMEM230, has provided insights into disease pathogenesis (Hernandez et al. 2016, Deng et al. 2016). Recently, several large genome-wide association studies (GWAS) have been conducted to identify loci for sporadic PD. Yet the genetic basis of the majority of sporadic PD remains elusive, with common variants explaining less than a quarter of the PD heritability (Keller et al. 2012). This implies that genetic variants with modest effects on PD susceptibility remain to be discovered. GWAS is a powerful tool to map genes but is hampered by the large number of positive associations and multiple testing correction issues. Differentiating signal from noise by eliminating false-positive and false-negative associations is a huge challenge.

True genetic associations are accompanied by signals from surrounding markers due to linkage disequilibrium (LD). Recently, this observation was expanded upon by the development of a locus-based clustering algorithm for GWAS (Saeed, 2017). OASIS (Objective Assimilation of SNPs Interacting in Synchrony) is an algorithm that identifies and assimilates SNPs of even modest significance into clusters based on the LD principle (Saeed, 2017). Analogous to gene-based association tests, OASIS parses the GWAS data into physical proximity clusters. Yet OASIS is unique, in that it takes into account the entire association signal at each locus rather than assigning the most significant or a weighted p value to a gene (Saeed, 2017; Li et al. 2010). This enhances the possibility of reducing statistical skewing by low-frequency alleles and consequent false-positive associations (type I error) as well as false negatives due to stringent corrections such as Bonferroni (type II error). OASIS has previously been used to identify genes for lupus and diabetic nephropathy (Saeed 2017; Saeed 2018).

Here, OASIS was applied to two independent PD database of Genotypes and Phenotypes (dbGAP) GWAS datasets (Simón-Sánchez et al. 2009; Pankratz et al. 2009) in a meta-analysis to identify significant PD loci. Candidate genes in OASIS loci were further evaluated using gene-based testing on both PD datasets. Significantly associated genes were then tested for gene expression in brain tissue using GEO and BRAINEAC datasets. These composite genomic and functional analyses led to the identification of nine novel PD candidate genes.

Methods

Datasets

GWAS datasets were obtained online from the publicly available dbGAP repository. Meta-analysis of two PD datasets, pha002865 (db1) (Simón-Sánchez et al. 2009) and pha002868 (db2) (Pankratz et al. 2009), was conducted using OASIS. The screening dataset, db1, consisted of 1713 PD cases and 3978 controls genotyped for 463,185 single nucleotide polymorphisms (SNPs) (Simón-Sánchez et al. 2009). The second dataset, db2, consisted of 857 familial PD cases and 867 controls (Pankratz et al. 2009) genotyped for 318,551 SNPs. The most significant SNPs in overlapping OASIS loci in the two datasets (db1, db2) were tested for single-variant replication in three separate datasets. These replication datasets included the dbGAP dataset pha002840 (db3), consisting of 443 discordant PD sib pairs genotyped for 174,945 SNPs (Maraganore et al. 2005), the PDGene meta-data (db4) (Lill et al. 2012), and the publicly available screening dataset of a recent large GWAS meta-analysis (db5) (Chang et al. 2017). Figure S2 shows the flow chart for the study methodology.

Locus-based analysis

The OASIS algorithm and software have been described in detail previously (Saeed 2017). Briefly, the OASIS algorithm sums the −log [P] values of all significant SNPs (P ≤ 0.05) in a locus cluster of 200 kb to generate an OASIS score. This process is iterated across each GWAS dataset. The 200-kb cut off has previously been used to define a locus cluster and is the default value in the OASIS software, though it can be altered to yield a variety of analyses (Saeed 2017). OASIS calculates the 3-sigma (3σ; three standard deviations or a value ≥ 99.7% of the data) cut offs for the −log [P] values and the OASIS scores, structuring the GWAS meta-analysis data into two axes (−log [P] and OASIS) and four groups, viz. quadrants A–D (Figure S1). Quadrant A loci crossed the 3σ cut offs on both axes, quadrant B loci only on −log [P], and quadrant C loci only on OASIS scores. Quadrant D loci failed to meet the 3σ cut offs on either axis. OASIS compiles the overlapping (within 2 Mb distance) 3σ significant regions in two GWAS datasets into a single table (Saeed 2017). The SNIPPER software was used to map genes on genome build hg19 to the OASIS loci. Positional information for each OASIS locus was used to define the chromosomal location for the gene search. The gene list for all OASIS loci was used as a “seed list” for subsequent gene-based analysis.
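The scoring and quadrant assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the OASIS software itself: the fixed, non-overlapping 200-kb bins, the base-10 logarithm, the use of the largest −log P in a cluster as the locus value on the −log [P] axis, the mean plus three standard deviations form of the 3σ cut off, and all function names are hypothetical choices made for the example.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

WINDOW_BP = 200_000   # locus cluster size described in the text
P_THRESHOLD = 0.05    # SNP-level significance threshold described in the text


def oasis_like_scores(snps):
    """snps: iterable of (chrom, position_bp, p_value).

    Returns {(chrom, bin_index): (score, max_neg_log_p)}, where score is the
    sum of -log10(P) over SNPs with P <= 0.05 in each fixed 200-kb bin.
    Fixed bins are an assumption; the real software may cluster differently.
    """
    clusters = defaultdict(list)
    for chrom, pos, p in snps:
        if p <= P_THRESHOLD:
            clusters[(chrom, pos // WINDOW_BP)].append(-math.log10(p))
    return {key: (sum(vals), max(vals)) for key, vals in clusters.items()}


def three_sigma_cutoff(values):
    """Mean plus three standard deviations (assumed form of the cut off)."""
    return mean(values) + 3 * pstdev(values)


def quadrant(max_neg_log_p, score, p_cut, score_cut):
    """Assign quadrant A-D from the two cut offs, as defined in the text."""
    over_p = max_neg_log_p >= p_cut
    over_score = score >= score_cut
    if over_p and over_score:
        return "A"
    if over_p:
        return "B"
    if over_score:
        return "C"
    return "D"
```

In this sketch a locus with many modestly significant SNPs can reach quadrant C through its summed score even when no single SNP crosses the −log [P] cut off, which is the behavior the text attributes to OASIS.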

Gene-based testing

Candidate genes in the OASIS seed list were evaluated for gene-based association using GATES as implemented in the KGG software (Li et al. 2010). SNPs were mapped onto genes according to positional information from the NCBI GRCh37 database, and SNPs within 10 kb upstream and 10 kb downstream of each gene were also included. This analysis is distinctly different from the locus-based clustering approach incorporated in OASIS (Saeed 2017; Li et al. 2010). Most gene-set enrichment analysis (GSEA) algorithms map SNPs to genes, assign a weighted or the most significant P value to each gene, and then iteratively process the genes for association with disease (Li et al. 2010).
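The positional SNP-to-gene mapping described above can be illustrated with a short sketch. It covers only the mapping step with the 10-kb flanks and does not reproduce how GATES combines SNP P values into a gene-level P value. The function name and the tuple layouts are hypothetical and are not taken from KGG.

```python
FLANK_BP = 10_000  # 10 kb upstream and downstream, as described in the text


def map_snps_to_genes(snps, genes, flank=FLANK_BP):
    """snps: list of (snp_id, chrom, pos); genes: list of (name, chrom, start, end).

    Returns {gene_name: [snp_id, ...]} for SNPs inside the gene or its flanks.
    A SNP may map to more than one gene when extended gene intervals overlap.
    """
    mapping = {name: [] for name, _, _, _ in genes}
    for snp_id, s_chrom, pos in snps:
        for name, g_chrom, start, end in genes:
            if s_chrom == g_chrom and start - flank <= pos <= end + flank:
                mapping[name].append(snp_id)
    return mapping
```

The nested loop is adequate for a seed list of a few hundred genes; a genome-wide mapping would more likely use sorted intervals or an interval tree.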

Gene expression analysis

Genomic convergence of results from multiple data types, such as genetic association and expression data, implies biological relationship of genes with disease. Gene expression in SN of PD patients (n = 16) versus controls (n = 9) was assessed using Gene Expression Omnibus (GEO) dataset GDS2821. The Affymetrix Human Genome U133 Plus 2.0 Array was used for gene expression analysis in this dataset. Genes that showed significant gene-wide association in GATES analysis were tested for expression differences in the GEO dataset. One-tailed Student’s t test of unequal variance was used to calculate statistical significance of expression levels for each gene between PD cases and controls. The genes that showed statistical significance were evaluated for gene expression in SN versus white matter (WM) in normal brains using the BRAINEAC datasets. BRAINEAC integrates whole genome genotype and transcript expression data from 134 normal human brain samples of 10 brain regions. Fold change was calculated as 2^SN/2^WM, and the statistical significance of expression levels was assessed using the one-tailed Student’s t test of unequal variance. Expression quantitative trait locus (eQTL) analysis was done using BRAINEAC for all significant SNPs in db1 and db2 at locus 22, which contains the gene AXIN1. Corresponding LD data was obtained from LDlink.
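The fold-change and unequal-variance t test calculations described above can be sketched as follows. The sketch assumes that SN and WM in the fold-change expression denote mean log2-scale expression values, which the form of the expression suggests but the text does not state. The function names are hypothetical, and only the test statistic and its degrees of freedom are computed here.

```python
import math
from statistics import mean, variance


def fold_change_from_log2(sn_log2, wm_log2):
    """Fold change 2^mean(SN) / 2^mean(WM) for log2-scale expression values."""
    return 2 ** mean(sn_log2) / 2 ** mean(wm_log2)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Converting the statistic to a one-tailed P value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False, alternative="greater")` performs the whole test in one call.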

Protein network analysis

All genes that were significantly overexpressed or repressed in PD SN in the GEO dataset GDS2821 were used as input for STRING. To test the interaction of these candidate genes with known PD pathways, familial PD genes were also entered (PARKIN, PINK1, DJ-1, VPS35, LRRK2, TMEM230, SNCA, MAPT, ATP13A2, FBXO7, and PLA2G6). Gene pairs that were either co-expressed or involved in experimentally validated protein–protein interactions with at least a low score (0.15) were considered.

Results

Initial OASIS analyses of db1 and db2 led to the identification of 289 loci that were significant (crossed the 3-sigma cut off) in these datasets (Fig. 1). Of these, 36 loci replicated in both datasets and overlapped into 24 unique loci (Table S1). The highest association signal was for rs2736990 (P = 5.7 × 10⁻⁹) located at 4q22 containing SNCA and rs12185268 (P = 1.9 × 10⁻⁷) at 17q21 containing MAPT. This was followed by rs11648673 (P = 4.8 × 10⁻⁷) at a novel locus 16p13 (Table S1). All three loci were found in quadrant A, suggestive of highly significant association with PD. In contrast, the known PD locus 4p16 containing GAK/DGKQ/TMEM175 was found in quadrant B (rs11248060; P = 3.4 × 10⁻⁶). Two further loci, 8p23 (locus 15) and 14q22 (locus 21), were also found in quadrant A in at least one dataset (Table S1).

Fig. 1

a (db1), b (db2) OASIS genome-wide association in the two PD datasets. The −log [P] values for variants across the genome according to their categorization in quadrants (A, B, or C) in the two PD datasets (db1, db2) are shown. This is based on the intersection of −log [P] and OASIS score for each variant

Positional mapping using SNIPPER generated a list of 379 genes in these 24 overlapping OASIS loci (Table S2). These genes were used as a seed list for KGG gene-wide association analysis of db1 and db2, which identified 88 genes that were significantly associated with PD in at least one dataset (Table S1). Genes in three OASIS loci (6, 22, and 23) remained significant after Benjamini and Hochberg (BH) correction in db1. SNCA (P = 1.0 × 10⁻⁷; P_BH = 0.0012) in locus 6 (4q22), AXIN1 (P = 1.0 × 10⁻⁵; P_BH = 0.0176) in locus 22 (16p13), and several genes in locus 23 (17q21) including MAPT (P = 2.1 × 10⁻⁶; P_BH = 0.0048) were significant (Table S3). No gene remained significant after BH correction in db2 (Table S4).
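For readers unfamiliar with the BH correction referred to above, the standard step-up adjustment can be written in a few lines. This is a generic sketch of the Benjamini and Hochberg procedure, not the KGG implementation, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene is then declared significant when its adjusted P value falls below the chosen false discovery rate, for example 0.05.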

KGG analysis identified 57 genes in db1 and 52 genes in db2 which were nominally significant (Tables S3 and S4). Of these, 88 genes were unique and 21 overlapped between the two datasets. SNCA in locus 6 had the highest gene-wide significance in db1, and DGKQ (P = 6.78 × 10⁻⁶) in locus 4 (4p16) had the highest in db2. AXIN1 was significant in db1 but not in db2. The reason is that in OASIS analysis locus 22 containing AXIN1 was found in quadrant A in db1, whereas it was found in quadrant C in db2. As indicated earlier, OASIS analysis is completely different from GATES analysis incorporated in KGG as well as other GSEA algorithms. OASIS has been shown to identify known disease susceptibility genes in quadrant C while they were missed by standard analysis (Saeed 2017).

Single-variant replication of SNPs with the highest −log [P] values (Max_SNP) in OASIS-identified loci was carried out in db3, db4, and db5. In db4, 24 SNPs in nine OASIS loci replicated (Table S5), whereas in db3 and db5, SNPs in only three loci (4, 6, and 11) could be validated (Table S5). PDGene (db4) is a much larger database and all 60 OASIS-identified Max_SNPs were found, whereas only 15 SNPs in db3 and 9 SNPs in db5 were found. Significantly replicated SNPs mapped on to known PD loci including SNCA on 4q22, DGKQ/TMEM175 on 4p16, and the MAPT locus on 17q21. Max_SNPs in locus 22 (16p13) containing AXIN1 were not found to be associated with PD in the replication datasets, indicating that this and 14 other OASIS loci could potentially harbor novel PD genes.

Using GEO dataset GDS2821, the KGG-identified genes (n = 88) were further investigated for gene expression changes in SN of PD patients and controls. Of these, 21 genes were found to be nominally significant and 4 genes crossed the false discovery rate (FDR) (Table 1). Two-thirds of these genes were found in three known PD loci (4p16, 4q22, and 17q21) and one novel locus on 16p13. The most significant change in gene expression in PD SN was for AXIN1 located in 16p13. In normal brains, AXIN1 was repressed in SN compared to WM (P = 4.55 × 10⁻¹⁹); however, in PD brains, it was upregulated in SN compared to SN in normal brains (P = 2.97 × 10⁻⁵; FDR = 2.35 × 10⁻³) (Table 1). CLDN1 was 2.3-fold upregulated in SN of normal brains compared to WM, but significantly repressed in PD SN (Table 1). ZNF141 and ZNF721 normally had lower expression in SN but were upregulated in PD (Table 1).

Table 1 Genomic convergence of gene-wide association and gene expression analysis of genes in OASIS loci

Protein interactions of genes in Table 1 were tested with familial PD genes (Fig. 2). There was experimentally determined interaction between LRRK2 and AXIN1, ZNF141 and ZNF721. RHOT2, also located in 16p13 and repressed in PD SN (Table 1), significantly interacted with PARK2 and PINK1. RICTOR interacted with PINK1 and was co-expressed (0.221) with LRRK2. RICTOR, located on 5p13, is normally repressed in SN but overexpressed in PD (Table 1).

Fig. 2

STRING protein–protein interactions of PD candidate genes. The interactions of proteins, whose genes had significantly altered expression in PD SN, with familial PD proteins are shown. The gene names are shown and thickness of the connecting lines represents the strength of interaction. The numbers represent STRING calculated strength of experimentally determined interaction between a protein pair

To identify the variant(s) modulating AXIN1 expression, eQTL analysis was performed on all significant db1 and db2 SNPs (Table S6). The most significant eQTL SNPs reported by BRAINEAC were also included. Variants in AXIN1 predominantly modulated its expression in the basal ganglia (Table S6). The Max_SNP at locus 22, rs11648673 (P = 4.77 × 10⁻⁷), was not an eQTL for AXIN1. It is in strong LD with rs11644916 (r² = 0.892), which showed only minimally significant eQTL association (P = 0.043). SNPs 17–22 (Table S6) are in strong LD with each other (r² > 0.98) and, though they were reported as the top eQTL in BRAINEAC datasets for AXIN1, the lack of association of three of these SNPs in db1 and db2 makes them unlikely candidates in PD (Table S6). The intronic SNP rs13337493 (eQTL P = 1.4 × 10⁻⁴) significantly modulated AXIN1 expression in SN; however, it was not genotyped in either db1 or db2. This SNP is in LD with rs758033 (r² = 0.962) and rs2361988 (r² = 0.962), which showed significant associations in db1 (Table S6).

Discussion

In this study, four major loci for PD were identified, of which three were previously known (4q22, 17q21, and 4p16). The novel PD locus 16p13 harbors the gene AXIN1 which is a critical component of Wnt/β-catenin signaling and functions as its negative regulator, inducing apoptosis (Clevers and Nusse 2012). OASIS showed significant association of the 16p13 locus with both sporadic and familial PD (db1 and db2). Moreover, the strength of association was nearly as much as the MAPT locus on 17q21 and higher than the TMEM175 locus, 4p16. AXIN1 was the most significant gene at the 16p13 locus on GATES analysis. Gene expression in SN of PD patients was found to be most significantly altered for AXIN1 from amongst 88 PD associated genes across the genome. AXIN1 was also shown to have a modest protein interaction with LRRK2. These findings demonstrate a significant genetic association of AXIN1 with PD and implicate it in its pathobiology.

Wnt signaling is mediated by β-catenin, a nuclear factor that promotes cell cycling (Clevers and Nusse 2012). Concentration of β-catenin is controlled by a degradation complex whose rate-limiting component is Axin (coded by AXIN1) (Clevers and Nusse 2012). Wnt/β-catenin signaling has been shown to play a crucial role in the neurogenesis of midbrain dopaminergic neurons (L’Episcopo et al. 2014). Moreover, β-catenin accumulates specifically in mature thalamic neurons and is an intrinsic feature of these cells (Misztal et al. 2011). Normally, levels of Axin and of other components of the β-catenin degradation complex are low in the thalamic neurons, which allows the nuclear localization of β-catenin, independent of Wnt signaling (Misztal et al. 2011). Axin is inhibited by tankyrases whose inhibitors, by stabilizing Axin, promote apoptosis (Huang et al. 2009). Similarly, overexpression of Axin induces β-catenin degradation (Nakamura et al. 1998). It can therefore be postulated that upregulation of AXIN1 expression in SN of PD patients, as shown here, may be involved in neurodegeneration typical of PD.

SNCA was the obvious gene at 4q22 (Table S1); however, the association signal was more widespread at the 4p16 and 17q21 loci. Consequently, the specific gene at the 4p16 locus is as yet unclear. At the 17q21 locus, MAPT is a known PD gene; however, several nearby genes have consistently demonstrated association and it is possible that this locus may harbor additional PD susceptibility genes (Lill et al. 2012; Nagle et al. 2016). Here, two genes at each of these loci showed altered gene expression. At 4p16, ZNF141 and ZNF721, though modestly associated with PD in GATES analysis, showed significant upregulation of expression in SN of PD patients (Table 1). It is possible that rare variants in one or both of these genes are linked to common variants at this locus, generating the association signal in GWAS. Deep sequencing of ZNF141 and ZNF721, together with other candidate genes at 4p16 (Table S1), will provide a definite answer to this intriguing association signal. Interestingly, zinc finger protein genes, which function as transcriptional regulators, have been previously associated with PD in GWAS (Chang et al. 2017). At 17q21, KANSL1 and ARHGAP27 showed significant gene-wide association (Tables S3 and S4) and were upregulated in PD SN (Table 1). ARHGAP27 is involved in the clathrin-mediated endocytosis of receptor tyrosine kinases (Sakakibara et al. 2004). KANSL1 is involved in cell cycling by microtubule stabilization (Meunier et al. 2015) and its haploinsufficiency causes intellectual disability (Zollino et al. 2012). Of the six consistently associated genes at this locus (Chang et al. 2017), KANSL1 and ARHGAP27 appear to be strong candidates for further investigation including sequencing and biological studies.

Other notable loci that this study identified include 8p23 (locus 15) and 14q22 (locus 21), both of which were found in quadrant A in at least one dataset (Table S1). Only two genes at 8p23 were significant on GATES analysis (Table S1). ARHGEF10 is involved in cell cycling and a mutation has been shown to result in slowed nerve conduction velocity (Verhoeven et al. 2003; Aoki et al. 2009). The other candidate gene at 8p23 was CSMD1 (gene-wide P = 0.003). CSMD1 is involved in complement activation and inflammation in the developing CNS (Kraus et al. 2006). Using whole exome sequencing, CSMD1 was recently shown to harbor mutations for familial PD (Ruiz-Martínez et al. 2017). This major finding helps validate the methodology described in the present study of composite analysis with OASIS and GATES to map disease susceptibility genes in GWAS.

On 14q22, an interesting candidate gene is WDHD1 (Table 1), a nucleoplasmic DNA-binding protein that influences cell cycling as it is a key regulator of centromere function (Hsieh et al. 2011). CLDN1 at 3q28 (Table 1) is involved in formation of tight junctions during neuroinflammation (Horng et al. 2017). CLDN1 repression in PD (Table 1) is suggestive of an increased propensity for inflammation which may help the spread of PD pathology (Sampson et al. 2016). RHOT2 is a critical component of the mitochondrial trafficking complex together with Pink1 (Fig. 2) (Wang et al. 2011). Overexpression of RHOT2 increases Pink1 mitochondrial localization preventing mitochondrial dysfunction (Weihofen et al. 2009). Here, RHOT2 expression was decreased in PD SN (Table 1). Though the gene-wide and expression associations are modest, RHOT2 should be considered an important PD candidate gene. Pink1 is not only cytoprotective in the mitochondria but also in the cytosol where it induces Akt phosphorylation that is mediated by Rictor (encoded by RICTOR) (Murata et al. 2011). Therefore, the locus 5p13 containing the gene RICTOR (Table S1) is another interesting candidate locus for PD identified in this study. Further studies are needed to experimentally determine the causal genes in the PD loci described here (Table S1). This study, however, provides a paradigm for the design of such functional studies. MAPT expression remained unaltered between PD SN and controls (data not shown), indicating that expression analysis may not suffice as the only type of functional study for genes identified by composite analysis described here.

In summary, composite analysis of existing GWAS datasets with OASIS, a locus-based clustering algorithm, and gene-based association testing with GATES, combined with expression studies, has led to the identification of several novel PD candidate genes and loci. This gene-mapping strategy has immense potential to mine genes of low to moderate effect sizes in GWAS for complex diseases such as PD (Saeed 2017; Saeed 2018). The genes thus identified can be subsequently validated by deep sequencing and functional studies including expression in affected tissues. Convergence of genomic and functional data then provides robust evidence for disease susceptibility. The most noteworthy novel PD-associated gene identified here is AXIN1 located at 16p13. Based on the eQTL data, the SNPs rs13337493, rs758033, and rs2361988 seem to modulate AXIN1 expression in the basal ganglia. This finding needs further investigation using comprehensive AXIN1 genotyping and eQTL studies. It is also possible that a haplotype modulates AXIN1 expression in SN rather than a single variant, as in the case of APOL1 (Larsen et al., 2013). Other important PD candidate genes that this study also identified include CLDN1 at 3q28, ZNF141 and ZNF721 at 4p16, RICTOR at 5p13, CSMD1 at 8p23 recently shown to be a familial PD gene, RHOT2 also located at 16p13, and KANSL1 and ARHGAP27 located at 17q21. Biological studies including proteomic, immunohistopathologic, in vitro, and in vivo studies would help confirm the pathogenic relevance of these genes and may provide therapeutic targets for PD modulation.