Introduction

Gliomas account for approximately 80 % of all primary malignant brain tumors (Kohler et al. 2011) and, despite improvements in clinical care over the last 20 years, remain associated with considerable morbidity, with the most common histological subtype, glioblastoma (GBM), having a median survival of only 15 months (CBTRUS 2012). To date, the only established environmental risk factor is exposure to moderate-to-high doses of ionizing radiation (Bondy et al. 2008). A heritable component of glioma is supported by: the twofold elevated risk in individuals with a positive family history (Hemminki et al. 2009; Malmer et al. 2003; Scheurer et al. 2007; Wrensch et al. 1997); an increased risk observed in rare genetic syndromes (Farrell and Plotkin 2007); a possible moderately penetrant risk locus in the 3′ untranslated region of TP53 (Stacey et al. 2011); and recent identification by genome-wide association studies (GWAS) of common susceptibility variants at 5p15.33 (TERT), 8q24.21 (CCDC26), 9p21.3 (CDKN2A-CDKN2B), 20q13.33 (RTEL1), 11q23.3 (PHLDB1), and two independent signals at 7p11.2 (EGFR) (Sanson et al. 2011; Shete et al. 2009; Wrensch et al. 2009).

To search for additional common genetic variants, we conducted a new independent GWAS in 1,856 cases and 4,955 controls ascertained from 14 cohort studies, 3 case–control studies, and 1 population-based case-only study (Table 1). Previous GWAS were based on case–control samples only. Our study was designed to include a large number of incident cases from cohort studies (556 out of 1,856, i.e., 30 % of all cases) to minimize potential bias toward glioma with longer survival.

Table 1 Studies included in the GliomaScan genome-wide association study (GWAS) and replication meta-analysis

Results

Study-specific population characteristics are summarized in Table 1. The mean age of cases ranged from 48.7 years in the NIOSH Upper Midwest Health Study to 73.5 years in the Multi-Ethnic Cohort. 55.1 % of glioma cases were of the glioblastoma subtype, with a larger percentage of high-grade tumors (WHO III or IV) observed in the cohort (74.7 %) versus case–control (64.5 %) studies (Supplementary Table 1).

After quality control metrics were applied to the scan data, 559,977 SNPs were available for analysis in 1,856 cases and 4,955 controls (details in “Materials and methods”). Concordance between known duplicates was greater than 99.95 %. The main effect model was adjusted for sex, age, study, and seven eigenvectors (to account for small differences in population substructure). Examination of the Q–Q plot indicated the likelihood of additional loci associated with glioma risk (Fig. 1). The genomic control lambda for the study was estimated at 1.006, suggesting little inflation of the test statistics from differences in the underlying population substructure.
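As an illustration of how a genomic control lambda such as the 1.006 reported here can be obtained, the sketch below divides the median observed 1-df chi-square statistic by its expected median under the null (about 0.455). It assumes that only two-sided P values from 1-df tests are available; the function names are hypothetical and this is not the authors' code.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a 1-df chi-square distribution (about 0.455),
# obtained from the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p_value):
    """Convert a two-sided P value (0 < p <= 1) into a 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p_value / 2.0)  # lower-tail quantile; sign is irrelevant
    return z * z


def genomic_control_lambda(p_values):
    """Median observed chi-square divided by the median expected under the null."""
    observed = [p_to_chi2_1df(p) for p in p_values]
    return median(observed) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical input: evenly spaced P values, mimicking a null distribution.
    null_like = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(genomic_control_lambda(null_like), 3))
```

A lambda close to 1 indicates that the bulk of the test statistics follows the null distribution, which is how the value of 1.006 is interpreted above.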

Fig. 1 Quantile–quantile (Q–Q) plot of observed versus expected P values in the glioma GWAS. The analysis was adjusted for sex, age, study, and seven eigenvectors. The genomic control lambda is 1.006

The results of this genome-wide association scan supported the seven previously reported regions as risk susceptibility loci for glioma (Fig. 2). Specifically, we replicated three of the seven previously reported associations, at 20q13.33 (RTEL1), 5p15.33 (TERT), and 9p21.3 (CDKN2BAS) (Table 2). Associations for the remaining loci were consistent with reported findings with respect to the direction of the odds ratios, but were not statistically significant at the genome-wide level (i.e., p < 5.0 × 10⁻⁸). When results were examined separately for samples from the cohort versus case–control studies, the direction and magnitude of the signal were generally consistent. However, the strength of the association was more pronounced for rs6010620 (20q13.33; RTEL1) and rs2736100 (5p15.33; TERT) in the cohort studies despite the smaller number of cases in this group (Table 3). Conversely, the strength of the association for loci at 11q23.3 (PHLDB1) and 9p21.3 (CDKN2BAS) was higher in case–control studies.

Fig. 2 Manhattan plot of the association results. Chromosomal locations of p values derived from 1-df trend tests from a logistic regression model adjusted for study site, age, gender, and seven eigenvectors in 1,856 cases and 4,955 controls

Table 2 Risk estimates for Glioma for previously reported Glioma GWAS signals
Table 3 Risk estimates for Glioma for previously reported Glioma GWAS signals, separately for cohort studies and case–control studies

We further examined associations for previously reported loci by gender and tumor subtype (Tables 4, 5). In analyses by gender, the signals at 8q24.21 (CCDC26) rs4295627 and 7p11.2 (EGFR) rs2252586 were stronger in women compared with men in our data (p value for heterogeneity 0.0037 and 0.057, respectively). However, this effect modification by gender was not observed in the joint data from the UK, US-MDA, French, and German replication groups. By tumor subtypes, the three regions most strongly associated with glioma risk overall at 5p15.33 (TERT), 9p21.3 (CDKN2B), and 20q13.33 (RTEL1) were mainly associated with glioblastoma. Associations with the marker at 8q24.21 (CCDC26) appeared more pronounced for oligodendroglioma, while the signal at 11q23.3 (PHLDB1) was preferentially associated with low-grade glioma.

Table 4 Risk estimates for Glioma for previously reported Glioma GWAS signals by gender
Table 5 Risk estimates for Glioma for previously reported Glioma GWAS signals by tumor subtype

In addition to previously reported loci, we identified 85 previously unreported loci with associations of p for trend ≤ 4.0 × 10⁻⁴ after removing probable genotyping artifacts, known associations, and highly correlated SNP markers (r² > 0.6). We performed an in silico replication by meta-analysis with data from three previously reported GWAS, which provided data on a total of 5,015 cases and 11,601 controls (Table 1) (Sanson et al. 2011; Shete et al. 2009; Wrensch et al. 2009). Summary measures (odds ratios and 95 % confidence intervals) were obtained from each study and a meta-analysis was performed using an inverse-variance fixed-effect model. However, none of these associations reached statistical significance at the genome-wide association level (Supplementary Table 2). A similar exercise was undertaken for 85 promising loci identified in combined data from the UK, US-MDA, French, and German replication groups, but again, none of these associations reached statistical significance at p < 5.0 × 10⁻⁸ (Supplementary Table 3).

Discussion

In this study, we present the data from a new independent GWAS of glioma based on 1,856 cases and 4,955 controls. While we did not observe any novel locus that reached genome-wide significance, the new scan provided further evidence for confirmation of the established loci. Similar to previously published reports, we note that TERT rs2736100, CDKN2B rs4977756, and RTEL1 rs6010620 were most strongly associated with glioblastoma, CCDC26 rs4295627 with oligodendroglioma, and PHLDB1 rs498872 with low-grade glioma (Egan et al. 2011; Jenkins et al. 2011; Simon et al. 2010). These results suggest different genetic etiologies for different subtypes of glioma and underscore the importance of considering tumor heterogeneity in GWAS studies.

Although we observed differential associations for the two loci on 8q24.21 (CCDC26) and 7p11.2 (EGFR) by gender in our data, effect modification by gender was not observed for these loci in the joint data from the UK, US-MDA, French, and German replication groups, suggesting that the observed gender differences in our data could have been due to chance. However, it will be important to re-examine potential effect modification by gender in larger datasets, along with consideration of potential risk covariates of interest such as allergy or smoking (Lachance et al. 2011; Schoemaker et al. 2010).

Previous GWAS of glioma were based on case–control studies only, which would generally not include rapidly fatal gliomas. One concern about results from these studies is that associations may be influenced by survival and therefore potentially biased toward glioma with longer survival. It is noteworthy that in our GWAS scan, the strength of the association was more pronounced for rs6010620 (20q13.33; RTEL1) and rs2736100 (5p15.33; TERT) in the cohort studies despite the smaller number of cases in this group. These regions have been particularly associated with high-grade glioma in other studies (Egan et al. 2011; Jenkins et al. 2011; Simon et al. 2010), and the differences in cohort versus case–control results in our scan likely reflect the fact that a higher proportion of highly fatal tumors (WHO Grade III and IV) were captured by the cohort studies as compared to the case–control studies. Similarly, stronger results for the CCDC26 and PHLDB1 variants in the case–control studies are consistent with previous associations of these loci with low-grade tumors. Nonetheless, the overall results from GliomaScan, which comprised a large number of incident gliomas from cohort studies, support GWAS associations based on previous case–control studies. Our data thus suggest that previously reported associations are generalizable to incident glioma cases.

Our study had adequate power to detect variants of moderate effect sizes for common allele frequencies. However, we did not observe additional signals with in silico analysis in three previously reported scans totaling 5,015 cases and 11,601 controls. This suggests that the underlying architecture of genetic susceptibility to glioma may not include as large a proportion of common variants as has been seen for other cancers to date. Alternatively, the underlying heterogeneity of glioma may limit our ability to identify more highly significant variants. For example, recent advances in understanding of glioma subtypes (e.g., proneural, neural, mesenchymal) based on gene expression (Cancer Genome Atlas Research Network 2008; Phillips et al. 2006), somatic mutations (e.g., IDH1) (Yan et al. 2009), and global patterns of methylation (glioma CpG island methylator phenotype; G-CIMP) (Noushmehr et al. 2010) suggest that there are important subgroups of glioma which may represent distinct pathological entities. Still, given the relatively small sizes of the glioma scans to date, and in order to comprehensively define the catalog of common variants associated with risk for glioma (Park et al. 2010), further genome-wide association studies will need to involve sufficiently large study populations along with analysis of tumor subtypes to assess these risks.

Materials and methods

Study participants

Studies participating in GliomaScan are described in Table 1 and comprise 1,856 glioma cases and 4,955 controls from 14 cohort studies, 3 case–control studies, and 1 community-based case-only study. Cases were newly diagnosed glioma (ICDO-3 codes 9380-9480 or equivalent), and controls were cancer free at the time of glioma diagnosis. Cases and 2,429 newly genotyped controls (pre-QC) were scanned with the Illumina 660W chip. Newly genotyped controls for this project were selected in a 2:1 ratio, frequency matched on age, sex, and race/ethnicity. GWAS data were already available on 2,591 controls and 12 cases from cohorts that had participated in the PANSCAN study (pancreatic cancer GWAS), CGEMS studies (Hunter et al. 2007; Landi et al. 2009; Yeager et al. 2007), and the NCI lung cancer GWAS (Landi et al. 2009). These were scanned with the commercial HumanHap 550 or HumanHap 610 Illumina SNP arrays.

Study design

We conducted a new genome-wide association scan of glioma (GliomaScan) to validate previously reported risk regions and to attempt to identify additional novel risk loci. Details of the 19 studies participating in GliomaScan are provided in Table 1. We evaluated 85 additional loci of potential interest by conducting a fixed-effects meta-analysis using in silico data from three previously reported genome-wide association scans in a total of 5,015 cases and 11,601 controls (Sanson et al. 2011; Shete et al. 2009; Wrensch et al. 2009).

Genome-wide SNP genotyping

All GliomaScan samples were genotyped at the NCI Core Genotyping Facility (CGF, Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute, Bethesda, USA). Samples from the UK, MD Anderson Cancer Center, France, and Germany were genotyped as described previously (Sanson et al. 2011; Shete et al. 2009; Wrensch et al. 2009). Summary estimates were provided from previously genotyped studies for the purpose of meta-analysis.

Quality control assessment

Genotyping was attempted for a total of 5,084 GliomaScan samples on Illumina 660W arrays at the CGF. After excluding 6 samples due to laboratory processing error, 5,078 samples remained (2,215 cases, 2,859 new controls, and 4 QC samples). Genotype clusters were estimated with high-performing samples having overall completion rates greater than 98 %, and genotype calls for the rest of the samples were based on the clusters defined by the high-performing samples only. Additionally, 2,591 previously scanned (on 550 or 610 chips) controls and 12 previously scanned individual cases from ATBC, CLUE, CPSII, HPFS, NHS, NYUWHS, PHS, PLCO, SMWHS, and WHS were included.

SNP assays were excluded if they had a completion rate below 90 % or showed extreme deviation from Hardy–Weinberg proportions (p < 1 × 10⁻¹⁰). Participants were excluded based on: (1) completion rates lower than 94–96 % as per the QC groups (n = 420 samples); (2) abnormal heterozygosity values of less than 25 % or greater than 35 % (n = 45), where some samples failed both criteria (1) and (2), giving a total of 438 unique samples excluded under either criterion; (3) unexpected duplicates (n = 8 forming 4 pairs) and one sample that also failed due to low completion rate; (4) discordance between self-reported sex and sex imputed from X chromosome heterozygosity (n = 9); (5) one sample from each unexpected inter-study duplicate pair (n = 20); and (6) phenotype exclusions (due to ineligibility or incomplete information) (n = 27). Utilizing a set of 12,000 unlinked SNPs (pair-wise r² < 0.004) common to all GWAS chips (Yu et al. 2008), 215 subjects with less than 80 % European ancestry were excluded from downstream analyses based on STRUCTURE analysis (Falush et al. 2007) and PCA (Price et al. 2006). For the planned 154 duplicate pairs, concordance was 99.96 %.
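The SNP-level and sample-level thresholds above can be written as simple filters, sketched below. The thresholds come from the text, but the function names are hypothetical, and the Hardy–Weinberg check uses a 1-df chi-square goodness-of-fit test as an assumption, since the text does not state which test was applied. The sample completion threshold is a parameter because it varied (94–96 %) across QC groups.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg proportions.

    Assumes at least one genotyped sample. The chi-square test is an
    illustrative choice, not necessarily the test used in the study.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def snp_passes_qc(completion_rate, genotype_counts,
                  min_completion=0.90, hwe_p_min=1e-10):
    """Keep a SNP with completion >= 90 % and no extreme HWE deviation."""
    return (completion_rate >= min_completion
            and hwe_chi2_p(*genotype_counts) >= hwe_p_min)


def sample_passes_qc(completion_rate, heterozygosity,
                     min_completion=0.94, het_range=(0.25, 0.35)):
    """Keep a sample with adequate completion and heterozygosity in 25-35 %."""
    het_low, het_high = het_range
    return (completion_rate >= min_completion
            and het_low <= heterozygosity <= het_high)


if __name__ == "__main__":
    # Hypothetical genotype counts and sample metrics, for illustration only.
    print(snp_passes_qc(0.99, (250, 500, 250)))
    print(sample_passes_qc(0.97, 0.31))
```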

The final participant count for the association analysis was 1,856 cases and 4,955 controls. A total of 559,977 SNPs were available for analysis in one or more studies. Each participating study obtained informed consent from study participants and approval from its institutional review board (IRB) for this study and obtained IRB certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). The dbGaP data portal provides access to individual-level data from the NCI scan ONLY to investigators from certified scientific institutions after approval of their submitted Data Access Request.

Statistical analysis

The association between the 559,977 SNPs and risk of glioma was estimated by the odds ratio (OR) and 95 % confidence interval (CI) using unconditional logistic regression assuming a trend effect genetic model with 1 degree of freedom. PCA revealed seven significant (p < 0.05) eigenvectors when included in the null model (logistic regression with dummy variables for sex, age, and study). The main effect model was adjusted for sex, age, study, and seven eigenvectors. In addition to overall analyses of SNP associations, models were also examined by gender and stratified by the following tumor subtypes: glioblastoma (ICDO-3 codes 9440, 9441, 9442, 9443), oligodendroglioma/mixed glioma (ICDO-3 codes 9382, 9450, 9451, 9460), low-grade glioma (grade I or II according to current WHO classifications), or high-grade glioma (grade III or IV according to current WHO classifications) (ICD-O 2000; Louis et al. 2007). Top-ranked SNPs for further follow-up were selected based on the p value for additive trend, after known hits and loci in high linkage disequilibrium (pairwise r² > 0.6) were removed.
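To make the 1-df trend model concrete, the sketch below implements the Cochran–Armitage trend test on genotype counts. This is an unadjusted analogue of the additive logistic regression described above; it does not include the covariates (sex, age, study, eigenvectors) used in the study, and the counts in the usage example are made up for illustration.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (chi2, p) with 1 degree of freedom.

    Counts are ordered by the number of copies of the tested allele (0, 1, 2).
    No covariate adjustment is performed in this sketch.
    """
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_total = n_cases + n_controls
    columns = [r + s for r, s in zip(case_counts, control_counts)]

    # Trend statistic: weighted difference between case and control counts.
    t_stat = sum(w * (r * n_controls - s * n_cases)
                 for w, r, s in zip(weights, case_counts, control_counts))

    # Variance of the trend statistic under the null hypothesis.
    squares = sum(w * w * c * (n_total - c) for w, c in zip(weights, columns))
    cross = sum(weights[i] * weights[j] * columns[i] * columns[j]
                for i in range(len(columns))
                for j in range(i + 1, len(columns)))
    variance = (n_cases * n_controls / n_total) * (squares - 2 * cross)

    chi2 = t_stat ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, chi-square 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
    chi2, p = armitage_trend_test((50, 200, 150), (200, 400, 200))
    print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```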

Meta-analysis

For the 85 loci of interest, each participating center provided the results of logistic regression analysis for individuals of European ancestry (CEU) adjusted for age and study-specific factors (e.g., study site). The following information was provided for each SNP: minor allele frequency (MAF), genotype counts for both cases and controls, risk allele, per-allele odds ratio (OR), associated 95 % confidence intervals, and the associated p value of the 1-degree-of-freedom (df) test of the trend effect for the SNP. Summary estimates for each center were combined using a fixed-effect meta-analysis.
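A minimal sketch of an inverse-variance fixed-effect meta-analysis, starting from the per-allele ORs and 95 % confidence intervals that each center provided, is given below. It assumes the intervals are symmetric on the log scale (Wald-type), so that the standard error can be recovered from the interval width; the function names and the example values are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95 % interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from an OR and 95 % CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def fixed_effect_meta(study_estimates):
    """Pool (OR, ci_low, ci_high) tuples with inverse-variance weights.

    Returns (pooled_or, ci_low, ci_high, p_value).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, ci_low, ci_high in study_estimates:
        beta, se = log_or_and_se(odds_ratio, ci_low, ci_high)
        weight = 1.0 / (se * se)  # inverse of the variance
        weighted_sum += weight * beta
        weight_total += weight

    pooled_beta = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return (math.exp(pooled_beta),
            math.exp(pooled_beta - Z_95 * pooled_se),
            math.exp(pooled_beta + Z_95 * pooled_se),
            p_value)


if __name__ == "__main__":
    # Made-up per-study estimates, for illustration only.
    studies = [(1.20, 1.00, 1.44), (1.10, 0.95, 1.27), (1.25, 1.05, 1.49)]
    pooled_or, low, high, p = fixed_effect_meta(studies)
    print(f"OR = {pooled_or:.2f} ({low:.2f}-{high:.2f}), p = {p:.3g}")
```

Under the fixed-effect model every study is assumed to estimate the same underlying per-allele effect, so larger studies with narrower intervals dominate the pooled estimate.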

Data analysis

Data analysis and management were performed with GLU (Genotyping Library and Utilities version 1.0), PLINK, and SAS® version 9.2 (Cary, NC, USA).

URLs

CGEMS portal: http://cgems.cancer.gov/

CGF: http://cgf.nci.nih.gov/

GLU: http://code.google.com/p/glu-genetics/

EIGENSTRAT: http://genepath.med.harvard.edu/~reich/EIGENSTRAT.htm

STRUCTURE: http://pritch.bsd.uchicago.edu/structure.html

PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/

SAS: http://www.sas.com/