Introduction

Amyotrophic lateral sclerosis (ALS) is the most common motor neuron disease and mainly affects the motor neurons controlling voluntary muscles. ALS leads to muscle stiffness, twitching, and progressive weakness (Bonifacino et al. 2016). Most patients with advanced ALS die of respiratory failure or pneumonia. Although ALS can occur at any time during adulthood, onset is usually in the mid-fifties (Eisen 2009). The incidence of ALS has been estimated at 1.2–4 cases per 100,000 people per year in developed countries (Gordon et al. 2013). Most patients die within 2–5 years of disease onset (Xiao et al. 2016).

Genetic factors play an important role in the pathogenesis of ALS. It is estimated that known candidate genes account for 25–30% of familial ALS (Orr 2011). A group of susceptibility genes or loci have been identified for ALS, such as SOD1 (Rosen 1993), FUS (Vance et al. 2009), OPTN (Maruyama et al. 2010), VCP (Johnson et al. 2010), and C9ORF72 (DeJesus-Hernandez et al. 2011; Majounie et al. 2012; Renton et al. 2011). Recently, a large genome-wide association study (GWAS) of ALS reported MOBP and SCFD1 as risk loci and verified that ALS is a complex genetic trait with a polygenic architecture (van Rheenen et al. 2016). However, the identified loci explain only a limited part of the genetic risk, suggesting that additional genetic factors are implicated in the development of ALS.

GWAS are a powerful approach and have achieved great success in mapping susceptibility genes for complex diseases and traits, including ALS. However, GWAS have several limitations. For instance, because of the strict statistical significance threshold, GWAS usually focus on a few loci with the most significant association signals. Without considering the joint effects of multiple functionally related genes, GWAS have limited power to detect causal loci with moderate or weak genetic effects. Because most genes do not work individually and complex diseases are generally determined by complicated biological processes, identifying several disease-associated genes is often insufficient to reveal the genetic architecture of complex diseases. Inspired by the gene set enrichment analysis (GSEA) of microarray data, GWAS-based pathway association studies were proposed (Wang et al. 2007). By integrating GWAS summary data with known biological pathways, pathway association studies can provide more pathogenetic information by considering the joint biological effects of multiple functionally related genes (Wang et al. 2010).

Expression quantitative trait loci (eQTLs) are genomic loci that can affect gene expression levels. Recent studies have confirmed the important roles of eQTLs in the pathogenesis of human complex diseases (Ertekin-Taner 2011; Gibson et al. 2012; Murphy et al. 2010; Nicolae et al. 2010). The disease-associated SNPs identified by GWAS are also significantly enriched in eQTLs (Nicolae et al. 2010). However, because most eQTLs lie outside genes, disease-associated eQTLs were easily overlooked by previous GWAS. Recently, a new method named summary data-based Mendelian randomization (SMR) was proposed, which integrates GWAS summary statistics with eQTL annotation information to detect novel genes whose expression levels are associated with complex diseases (Zhu et al. 2016). SMR showed high power for identifying novel causal genes of complex diseases (Zhu et al. 2016).

In this study, using the latest published ALS GWAS and eQTL data, we first applied SMR for single gene expression association analysis. To reveal the biological significance of the single gene expression association analysis results, we further extended SMR to pathway association analysis. The SMR single gene analysis results were subjected to GSEA to identify novel ALS-associated pathways with known function.

Materials and Methods

GWAS Summary Data of ALS

A large-scale GWAS meta-analysis of ALS was used in this study (van Rheenen et al. 2016). Briefly, this meta-analysis combines two published GWAS: a recent GWAS of 15 cohorts (7763 cases and 4669 controls) and a previous GWAS of 26 cohorts (7028 cases and 22,229 controls). In total, 14,791 cases and 26,898 controls from 41 cohorts were analyzed. Quality control (QC) was first performed per cohort to remove low-quality SNPs and individuals. After QC, 12,577 cases and 23,475 controls were included (van Rheenen et al. 2016). For genotype imputation, prephasing was first performed for each stratum using SHAPEIT2 with the 1000 Genomes Project phase 1 haplotypes as a reference panel. Subsequently, strata were imputed up to the merged reference panel in 5-Mb chunks using IMPUTE2. An inverse-variance-weighted, fixed-effect meta-analysis was performed using METAL. The summary data of the meta-analysis were used in this study. Detailed information on the cohorts, genotyping, imputation, meta-analysis, and quality control approaches can be found in the published study (van Rheenen et al. 2016).
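
The inverse-variance-weighted, fixed-effect pooling mentioned above can be illustrated for a single SNP with a minimal Python sketch. The function name and the per-cohort effect sizes are hypothetical and serve only as an illustration; this is not the METAL implementation, and the actual meta-analysis was performed by van Rheenen et al. (2016).

```python
import math


def ivw_fixed_effect(betas, ses):
    """Pool per-cohort effect estimates for one SNP.

    betas: per-cohort effect sizes (e.g. log odds ratios)
    ses:   per-cohort standard errors
    Returns the pooled effect, its standard error, and the z statistic.
    """
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical effect sizes for one SNP in two cohorts.
beta, se, z = ivw_fixed_effect([0.10, 0.30], [0.10, 0.10])
print(f"pooled beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

Cohorts with smaller standard errors contribute more to the pooled estimate, which is why large cohorts dominate a fixed-effect meta-analysis.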

eQTL Datasets

The eQTL dataset obtained from peripheral blood was used in this study (Westra et al. 2013). In brief, a genome-wide eQTL scan was first conducted using 5311 individuals and replicated in an independent sample of 2775 subjects. Illumina whole-genome Expression BeadChips were used for mRNA expression profiling. SNP genotyping was conducted using commercial platforms, such as Illumina 610K quad arrays and Illumina HumanHap300 arrays. Imputation was conducted using IMPUTE (Marchini et al. 2007) or MACH (Li et al. 2010) against the HapMap 2 reference panels. In total, 923,021 cis-eQTLs and 4732 trans-eQTLs were identified (Westra et al. 2013).

SMR Single Gene Analysis of ALS

The GWAS summary data of ALS were analyzed by SMR to detect associations between gene expression levels and ALS. SMR resembles a Mendelian randomization (MR) analysis, in which genetic variants are used as instrumental variables to assess the effects of gene expression levels on disease phenotypes (Zhu et al. 2016). SMR has shown good power to evaluate the impact of gene expression variation on complex diseases by integrating GWAS summary data and eQTL annotation information (Zhu et al. 2016). In total, 5366 genes with both GWAS summary data and eQTL data were analyzed in this study. Genome-wide significant genes were identified at P_SMR < 0.05/5366 = 9.3 × 10^−6. Suggestive association signals were identified at P_SMR < 1 × 10^−4. The heterogeneity in dependent instruments (HEIDI) test was also conducted by SMR. The genes with P_HEIDI > 0.05 were further subjected to GSEA.
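
To make the instrumental-variable logic concrete, the following minimal Python sketch computes an SMR-style estimate for one gene from the summary statistics of its top cis-eQTL. It assumes the approximate test statistic described by Zhu et al. (2016), in which the effect of expression on the trait is the ratio of the SNP-trait and SNP-expression effects. The function name and input values are hypothetical, and the sketch does not reproduce the SMR software or the HEIDI test.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR-style test for one gene using its top cis-eQTL as instrument.

    b_zx, se_zx: SNP effect on gene expression (from the eQTL study)
    b_zy, se_zy: SNP effect on the disease (from the GWAS)
    Returns the estimated effect of expression on disease, the
    approximate chi-square statistic (1 df), and its P value.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_zy / b_zx
    # Approximate test statistic, assumed chi-square with 1 df under the null.
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-square with 1 df, written with the error function.
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_smr


# Hypothetical summary statistics for a single gene.
b_xy, t_smr, p_smr = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.1, se_zy=0.02)
print(f"b_xy = {b_xy:.2f}, T_SMR = {t_smr:.1f}, P_SMR = {p_smr:.2e}")
```

Because the statistic is bounded by the smaller of the two squared z scores, a gene can reach significance only when both the eQTL and the GWAS association at the instrument are strong.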

Pathway Enrichment Analysis of ALS

To reveal the biological significance of the SMR single gene expression association analysis results for ALS, we extended SMR to pathway association analysis implemented with the GSEA approach (Wang et al. 2007). A total of 162 biological pathways with known biological function from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/) were analyzed in this study. During the pathway enrichment analysis, 5000 permutations were conducted to calculate the empirical P value for each KEGG pathway.
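
The permutation-based empirical P value can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure of Wang et al. (2007): the gene-level score (for example, the SMR chi-square statistic), the gene-label permutation scheme, the add-one correction, and all function and gene names are assumptions made for the example.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style running-sum statistic (maximum positive deviation)."""
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(scores[g]) ** weight
                    for g, h in zip(ranked_genes, hits) if h)
    if n_miss == 0 or hit_total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g, h in zip(ranked_genes, hits):
        if h:
            # Step up in proportion to the score of a pathway gene.
            running += abs(scores[g]) ** weight / hit_total
        else:
            # Step down by a constant for every non-pathway gene.
            running -= 1.0 / n_miss
        best = max(best, running)
    return best


def empirical_p(scores, gene_set, n_perm=5000, seed=0):
    """Empirical P value from random gene sets of the same size."""
    rng = random.Random(seed)
    genes = list(scores)
    ranked = sorted(genes, key=scores.get, reverse=True)
    gene_set = set(gene_set) & set(genes)
    observed = enrichment_score(ranked, scores, gene_set)
    exceed = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(genes, len(gene_set)))
        if enrichment_score(ranked, scores, random_set) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical gene-level scores; the "pathway" holds the three top genes.
toy_scores = {f"gene{i}": float(20 - i) for i in range(20)}
es, p = empirical_p(toy_scores, {"gene0", "gene1", "gene2"})
print(f"ES = {es:.3f}, empirical P = {p:.4f}")
```

With 5000 permutations, the smallest empirical P value this sketch can report is 1/5001, so the resolution of the test is set by the number of permutations.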

Results

SMR Single Gene Expression Association Analysis

After strict Bonferroni correction, SMR identified one significant gene, C9ORF72 (P_SMR = 7.08 × 10^−6, P_HEIDI = 2.71 × 10^−2), for ALS. We also identified four genes with suggestive association signals: NT5C3L (P_SMR = 1.33 × 10^−5, P_HEIDI = 8.88 × 10^−2), GGNBP2 (P_SMR = 1.81 × 10^−5, P_HEIDI = 7.23 × 10^−2), ZNHIT3 (P_SMR = 2.94 × 10^−5, P_HEIDI = 8.55 × 10^−2), and KIAA1600 (P_SMR = 9.97 × 10^−5, P_HEIDI = 9.90 × 10^−2) (Table 1). The original GWAS of ALS identified three top genes associated with ALS (Table 2).

Table 1 List of significant genes identified by SMR
Table 2 List of top genes identified by the GWAS of ALS
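
As a worked check of the significance thresholds, the following minimal Python sketch applies the Bonferroni and suggestive cut-offs from the Methods to the SMR P values reported above. The variable and function names are hypothetical; the numbers are those given in the text.

```python
N_GENES = 5366                 # genes tested by SMR
BONFERRONI = 0.05 / N_GENES    # genome-wide threshold, about 9.3e-6
SUGGESTIVE = 1e-4              # suggestive threshold

# SMR P values as reported in the text.
P_SMR = {
    "C9ORF72": 7.08e-6,
    "NT5C3L": 1.33e-5,
    "GGNBP2": 1.81e-5,
    "ZNHIT3": 2.94e-5,
    "KIAA1600": 9.97e-5,
}


def classify(p_value):
    """Label a gene according to the two thresholds."""
    if p_value < BONFERRONI:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


for gene, p_value in P_SMR.items():
    print(f"{gene}: {classify(p_value)}")
```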

Pathway Enrichment Analysis

Pathway enrichment analysis identified 7 significant pathways for ALS, including PEROXISOME (empirical P value = 0.006), CITRATE_CYCLE_TCA_CYCLE (empirical P value = 0.025), TIGHT_JUNCTION, PPAR_SIGNALING_PATHWAY (empirical P value = 0.025), SNARE_INTERACTIONS_IN_VESICULAR_TRANSPORT (empirical P value = 0.027), ARACHIDONIC_ACID_METABOLISM (empirical P value = 0.040), and GLYCOLYSIS_GLUCONEOGENESIS (empirical P value = 0.043).

Discussion

Extensive GWAS have been conducted and have identified a large number of genetic variants associated with complex diseases. However, it remains a challenge to reveal the biological significance of GWAS-identified loci, many of which lie outside known genes. Recent studies have confirmed the implication of eQTLs in the development of complex diseases (Ertekin-Taner 2011; Gibson et al. 2012; Murphy et al. 2010; Nicolae et al. 2010). Integrating GWAS with eQTL studies has the potential to discover novel susceptibility genes for complex diseases. In this study, using the recently developed SMR approach and published GWAS data, we conducted a genome-wide single gene association analysis and pathway enrichment analysis for ALS. We identified several genes and KEGG pathways associated with ALS, providing novel clues for pathogenetic studies of ALS.

SMR analysis detected a significant association between the C9ORF72 gene and ALS, confirming the important role of C9ORF72 in the pathogenesis of ALS. C9ORF72 plays an important role in the regulation of endosomal trafficking (Farg et al. 2014). Previous studies found that an expanded GGGGCC repeat within a non-coding region of C9ORF72 could increase the risk of ALS (Gijselinck et al. 2012; Mizielinska et al. 2014; Renton et al. 2011). Besides the well-known ALS-associated C9ORF72 gene, SMR also identified four genes with suggestive association signals for ALS: NT5C3L, GGNBP2, ZNHIT3, and KIAA1600. The biological functions of these four genes remain largely unknown. To the best of our knowledge, no study has investigated the possible roles of these four genes in the development of ALS.

In the pathway enrichment analysis, the most significant association was observed between ALS and the PEROXISOME pathway. Peroxisomes are essential organelles for redox signaling and lipid homeostasis and are involved in many crucial metabolic processes. The PEROXISOME pathway plays a key role in the detoxification of reactive oxygen species (ROS) (Zhang et al. 2014). ROS have been suggested to contribute to the regulation of apoptosis, notably in inflammatory cells (Gu et al. 2017). Increased ROS production could induce apoptosis in endothelial cells and then initiate mitochondrial damage (Walford 2003). Previous studies have found that mitochondrial dysfunction and overproduction of ROS play a crucial role in the pathogenesis of neurodegenerative diseases such as AD, PD, and ALS (Calabrese et al. 2005; Emerit et al. 2004; Federico et al. 2012; Zarkovic 2003).

Pathway enrichment analysis also identified several novel candidate pathways for ALS, such as GLYCOLYSIS_GLUCONEOGENESIS and ARACHIDONIC_ACID_METABOLISM. Previous studies have provided some evidence supporting the implication of the identified candidate pathways in the development of ALS. For instance, Valbuena et al. observed abnormal aerobic glycolysis in a cellular model of ALS (Valbuena et al. 2016). Arachidonic acid is implicated in the COX-2-driven inflammatory pathway in ALS (Kiaei et al. 2005). Further studies are warranted to reveal the roles of the identified pathways in the pathogenesis of ALS.

The eQTL dataset from peripheral blood was used for SMR analysis in this study. This dataset should have good power for genes with consistent eQTL effects across tissues, but it may lose power for genes with brain tissue-specific effects. Zhu et al. also evaluated the impact of tissue specificity on the performance of the SMR approach. They compared the SMR results for schizophrenia obtained using eQTL datasets from brain and peripheral blood and found that the results using the peripheral blood eQTL dataset were highly consistent with those using the brain eQTL dataset, suggesting that a peripheral blood eQTL dataset performs well in SMR analysis (Zhu et al. 2016).

In summary, using the latest published ALS GWAS and eQTL study data, we conducted a genome-wide eQTL-based single gene expression association analysis and pathway association analysis for ALS. We identified several genes and pathways associated with ALS. Our results provide novel clues for pathogenetic studies of ALS. This study also illustrates the good performance of the SMR approach and extends it to pathway association analysis for complex diseases.