Introduction

With an estimated 70,130 incident cases of non-Hodgkin lymphoma (NHL) and 18,940 NHL deaths, plus an additional 16,060 incident cases of chronic lymphocytic leukemia (CLL) and 4,580 CLL deaths, in the US in 2012 [1], there is a continuing need to identify factors that influence NHL risk and survival. Lymphomas classified as NHL are distinct diseases with different biology, clinical courses, and treatment, but they appear to share some risk factors [2]. Therefore, a better understanding of shared and subtype-specific NHL risk factors should provide insights into the mechanisms of lymphomagenesis.

CXCR5, formerly known as Burkitt lymphoma receptor 1 (BLR1), is the receptor for CXCL13 and plays a role in the migration of B-cells and follicular T helper cells. Gene knock-out studies in mice have also shown that this receptor is important for normal development and organization of follicular zones in lymph nodes and Peyer's patches [3, 4]. Expression of this receptor has been observed in a number of malignancies, including several types of lymphoma [5–8]. Recently, Song et al. [9] reported an association of several CXCR5 SNPs with NHL risk in a Chinese population. Additionally, genome-wide association studies (GWAS) have identified polymorphisms in CXCR5 linked to the autoimmune diseases primary biliary cirrhosis and multiple sclerosis [10, 11]. We therefore conducted the first study in a Western population of common germline genetic variation in CXCR5 in relation to risk of NHL and its most common individual diseases (subtypes): follicular lymphoma (FL), diffuse large B-cell lymphoma (DLBCL), marginal zone lymphoma (MZL), mantle cell lymphoma (MCL), peripheral T-cell lymphoma (PTCL), and chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL). We also assessed these variants in relation to prognosis of several of the more common subtypes.

Materials and methods

Study population

The Human Subjects Institutional Review Boards at the Mayo Clinic and the University of Iowa reviewed and approved the current study. As previously described, the Mayo Clinic Case–Control Study of Lymphoma is a clinic-based study of incident cases and frequency-matched controls (based on age, sex, and residence) [12]. Briefly, cases were lymphoma patients seen at Mayo Clinic Rochester and enrolled within 9 months of diagnosis who were aged 18 years or older, resided in Minnesota, Iowa, or Wisconsin, and were HIV negative at the time of diagnosis. Diagnoses were confirmed by study hematopathologists and classified according to the recent World Health Organization (WHO) classification of hematopoietic neoplasms [13, 14]. For composite or discordant histologies, the lower-grade component was used to classify the subtype for the risk analysis (because it was assumed to have existed first), while the higher-grade component was used to classify the subtype for the outcome analysis (because it would determine treatment strategy).

Controls were ascertained from patients visiting the general medicine clinics at Mayo Clinic Rochester for pre-scheduled general medical examinations and were frequency matched to the cases on age, sex, and residence. Control patients were excluded if they had prior diagnoses of lymphoma, leukemia, or HIV infection.

We also included newly diagnosed lymphoma patients from the University of Iowa and Mayo Clinic Lymphoma Specialized Program of Research Excellence (SPORE); these patients had the same eligibility requirements as the other cases, except that they could reside in any US state and there was no matching control group.

All participants completed questionnaires, and a peripheral blood sample was collected. DNA was extracted from peripheral blood samples using a standard procedure (Gentra Inc., Minneapolis, MN, USA). This analysis included 2,694 NHL (including CLL) cases and 1,522 controls enrolled from 9/1/2002 through 12/31/2009.

For all case patients, clinical, laboratory, and treatment data at the time of diagnosis and initial treatment were abstracted. Cell of origin (germinal center B-cell-like (GCB) vs. non-GCB subtype) was defined according to the Hans algorithm [15]. All patients were then actively followed every 6 months for the first 3 years and annually thereafter. We verified all reports of disease progression, retreatment, transformation, and death against medical records [16].

For the transformation analysis, we included only grade I–IIIa FL with no evidence of clinical or pathological transformation at the time of initial diagnosis. Transformation was defined as refractory/recurrent disease with either a clinical or pathological diagnosis of lymphoma, as previously described [17]. Pathology-defined transformation entailed a biopsy-confirmed subtype of FL grade IIIb, DLBCL, unclassifiable B-cell lymphoma with features intermediate between DLBCL and Burkitt lymphoma, or high-grade B-cell lymphoma (including Burkitt), or evidence of transformation per pathologist report [18]. Clinical transformation was defined using previously published clinical indicators of transformation [19] (sudden rise in LDH, rapid discordant localized nodal growth, new involvement of unusual extranodal sites, new B symptoms, or hypercalcemia) or a statement in the medical record that the treating physician was clinically managing the patient as a transformation at the time of recurrence.

Genotyping

A total of ten CXCR5 tag SNPs were successfully genotyped as part of a larger genotyping project using a custom Illumina Infinium iSelect array (Illumina, San Diego, CA). Tag SNPs were selected from HapMap (Phase 2, release 23a, CEU population) using a standard tagging approach (r² ≥ 0.80 and minor allele frequency (MAF) ≥ 0.05) and included SNPs 5 kb upstream and downstream of the gene. Standard genotyping quality control procedures were performed, including duplicate genotyping, dropping samples or SNPs with call rates <95 %, and testing for Hardy–Weinberg equilibrium (HWE). We found >99.99 % genotyping concordance for 406 SNPs previously genotyped on 3,116 participants using an Illumina GoldenGate platform and for 2,477 SNPs on 386 participants using the Illumina 660 platform. All of the CXCR5 variants had HWE p > 0.1 and SNP call rates >99.8 %.
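The call-rate filter described above can be illustrated with a minimal sketch. It assumes genotypes are stored as minor-allele counts with `None` marking a failed call, and that SNPs are filtered before samples; the data layout, the function names (`call_rate`, `qc_filter`), and the filtering order are hypothetical and are not taken from the study's pipeline. Only the 95 % threshold comes from the text.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: SNP ID -> minor-allele count (0, 1, 2) per sample, None = failed call.
Calls = Dict[str, List[Optional[int]]]


def call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of genotype calls that are non-missing."""
    return sum(c is not None for c in calls) / len(calls)


def qc_filter(calls: Calls, threshold: float = 0.95) -> Tuple[Calls, List[int]]:
    """Drop SNPs, then samples, whose call rate falls below `threshold`.

    Assumes every SNP list has one entry per sample, in the same sample order.
    Returns the retained SNPs and the indices of the retained samples.
    """
    kept_snps = {snp: g for snp, g in calls.items() if call_rate(g) >= threshold}
    if not kept_snps:
        return {}, []
    n_samples = len(next(iter(kept_snps.values())))
    kept_samples = [
        i
        for i in range(n_samples)
        if call_rate([g[i] for g in kept_snps.values()]) >= threshold
    ]
    return kept_snps, kept_samples


# Toy data (hypothetical): snpB fails on 2 of 20 samples, a call rate of 0.90.
toy = {"snpA": [0, 1, 2, 0] * 5, "snpB": [None, None] + [1] * 18}
kept, samples = qc_filter(toy)
print(sorted(kept), len(samples))
```

In this toy example snpB is dropped and all 20 samples are retained, because their call rate on the remaining SNP is 1.0.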

Statistical analysis

Main analyses used SAS version 9.2 (SAS Institute Inc., Cary, NC, USA) and R (http://www.r-project.org/). Tests for HWE used either Pearson’s goodness-of-fit test or Fisher’s exact test, as appropriate. All analyses excluded known non-Caucasians to minimize bias due to population stratification; participants who refused to report or had missing race/ethnicity were retained in the dataset (presumed Caucasian, given our source population).
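The Pearson goodness-of-fit test for HWE named above can be sketched as follows, assuming genotype counts for a biallelic SNP are already tabulated. The function name `hwe_pearson` is hypothetical; the 1-degree-of-freedom chi-square tail is computed with `math.erfc`. Fisher's exact test, which the text uses where expected counts are small, is not sketched here.

```python
import math
from typing import Tuple


def hwe_pearson(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Cells with zero expected count (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_pearson(25, 50, 25))  # (0.0, 1.0): counts exactly at equilibrium
```

With 50 AA, 0 AB, and 50 BB genotypes, the same function returns a chi-square of 100 and a p value far below the p > 0.1 reported for the CXCR5 variants.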

For evaluation of risk, we used unconditional logistic regression to estimate odds ratios (OR) and 95 % confidence intervals (CI) for the association of NHL case status (overall NHL and for each common NHL subtype) with each SNP. Analyses were adjusted for age (including its functional form) and gender, and the most common homozygous genotype was treated as the referent category for each of the SNPs. Each SNP was modeled in a log-additive manner in the regression model, and the Wald p value was used to assess significance. To account for multiple testing of 10 independent SNPs (r² < 0.6 for all pairs), we used the Bonferroni-corrected p value of 0.005 (0.05/10) as our significance threshold.
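The log-additive coding, the Wald summary, and the Bonferroni threshold described above can be illustrated with a short sketch. It does not fit the regression (the study used SAS and R). It only shows how a per-allele log-odds coefficient and its standard error map to an OR, a 95 % CI, and a two-sided Wald p value. As an assumption for illustration, the coefficient and standard error are back-calculated from the rounded estimate reported for rs1790192 and overall NHL (OR 1.19, 95 % CI 1.08–1.30), so the result only approximates the published p value. The names `dosage` and `wald_summary` are hypothetical.

```python
import math

N_SNPS = 10
BONFERRONI_ALPHA = 0.05 / N_SNPS  # 0.005, the threshold used in the text


def dosage(genotype: str, minor_allele: str) -> int:
    """Log-additive coding: number of copies of the minor allele (0, 1 or 2).

    A value of 0 corresponds to the common homozygote, i.e. the referent genotype.
    """
    return genotype.count(minor_allele)


def wald_summary(beta: float, se: float, z: float = 1.959964):
    """Odds ratio, 95% CI and two-sided Wald p value for a per-allele log-odds beta."""
    odds_ratio = math.exp(beta)
    ci = (math.exp(beta - z * se), math.exp(beta + z * se))
    p_value = math.erfc(abs(beta / se) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return odds_ratio, ci, p_value


# Approximate beta and SE from the published, rounded OR and CI.
beta = math.log(1.19)
se = (math.log(1.30) - math.log(1.08)) / (2 * 1.96)
odds_ratio, ci, p_value = wald_summary(beta, se)
print(round(odds_ratio, 2), [round(x, 2) for x in ci], p_value < BONFERRONI_ALPHA)
```

The back-calculated p value is on the order of 10⁻⁴. It is close to the reported 0.0003, given rounding of the published interval, and falls below the 0.005 threshold.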

For evaluation of prognosis, we used Cox proportional hazards regression to estimate hazard ratios (HRs) and 95 % CI. We defined event-free survival (EFS) as the time from diagnosis to disease progression, retreatment, or death due to any cause and overall survival (OS) as time from diagnosis to death due to any cause. Patients without an event were censored at time of last known follow-up. All survival analyses included adjustment for subtype-specific clinical risk score (i.e., IPI [20], FLIPI [21], MIPI [22], etc.) as appropriate. For evaluation of FL transformation, we used Cox proportional hazards regression with the inclusion of death without transformation as a competing risk.
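The endpoint definitions above can be sketched as a small data-preparation step that turns a follow-up record into (time, status) pairs for EFS, OS, and the transformation analysis. The record structure (`Patient`) and the function names are hypothetical. Times are assumed to be in months from diagnosis, and a recorded transformation is assumed to precede any recorded death. For transformation, death without transformation is coded as a distinct competing event. The text does not state whether a cause-specific or a subdistribution (Fine–Gray) formulation was used, so this sketch stops at the status coding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    """Hypothetical follow-up record; times in months from diagnosis, None = not observed."""
    last_followup: float
    progression: Optional[float] = None
    retreatment: Optional[float] = None
    death: Optional[float] = None
    transformation: Optional[float] = None


def event_free_survival(p: Patient) -> Tuple[float, int]:
    """EFS: first of progression, retreatment, or death from any cause; else censored (0)."""
    times = [t for t in (p.progression, p.retreatment, p.death) if t is not None]
    return (min(times), 1) if times else (p.last_followup, 0)


def overall_survival(p: Patient) -> Tuple[float, int]:
    """OS: death from any cause; else censored at last known follow-up."""
    return (p.death, 1) if p.death is not None else (p.last_followup, 0)


def transformation_outcome(p: Patient) -> Tuple[float, int]:
    """Status: 1 = transformation, 2 = death without transformation (competing), 0 = censored."""
    if p.transformation is not None:
        return p.transformation, 1
    if p.death is not None:
        return p.death, 2
    return p.last_followup, 0


pt = Patient(last_followup=60.0, progression=18.0, death=40.0)
print(event_free_survival(pt), overall_survival(pt), transformation_outcome(pt))
# (18.0, 1) (40.0, 1) (40.0, 2)
```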

Bioinformatics

All SNPs in LD (at r² ≥ 0.5) with the 10 CXCR5 SNPs in this study were extracted from the 1000 Genomes database. First, all SNPs were annotated using the UCSC tracks downloaded from the UCSC golden path database (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/). Specifically, we looked for SNPs located within a predicted conserved element using the PhastCons program, whose predictions are based on a phylogenetic hidden Markov model (compgen.bscb.cornell.edu/phast). The conserved SNPs were then overlapped with other functional tracks, including CpG islands, transcription-factor-binding sites, enhancers, TargetScan microRNA-binding sites, and ENCODE epigenetic marks. We also assessed SNPs in CXCR5 for potential location in microRNA (miRNA)-binding sites using the PolymiRTS Database 2.0 [23] (http://compbio.uthsc.edu/miRSNP/). Additional information on correlated variants and their potential function was obtained from HaploReg v2 (http://www.broadinstitute.org/mammals/haploreg/haploreg.php) [24].
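Checking whether a SNP falls within a conserved element is an interval-overlap query. The minimal sketch below assumes element coordinates in UCSC BED convention (0-based, half-open) and SNP positions in 1-based coordinates. It merges overlapping intervals and then uses binary search via Python's `bisect` module. The function names and all coordinates are hypothetical and are not actual CXCR5 or PhastCons positions.

```python
import bisect
from typing import List, Tuple


def merge_intervals(intervals: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Sort and merge overlapping BED intervals (0-based, half-open [start, end))."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def in_elements(pos_1based: int, merged: List[Tuple[int, int]]) -> bool:
    """True if a 1-based SNP position lies inside any merged BED interval."""
    x = pos_1based - 1  # convert to a 0-based coordinate
    starts = [s for s, _ in merged]
    i = bisect.bisect_right(starts, x) - 1  # last interval starting at or before x
    return i >= 0 and x < merged[i][1]


# Hypothetical element coordinates and SNP positions, for illustration only.
elements = merge_intervals([(100, 200), (150, 250), (400, 500)])
print(elements, [in_elements(p, elements) for p in (100, 101, 250, 251, 500)])
# [(100, 250), (400, 500)] [False, True, True, False, True]
```

The same overlap check applies to the other functional tracks (CpG islands, enhancers, binding sites) once they are loaded as intervals on the same genome build.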

Results

CXCR5 and risk of NHL and NHL subtypes

The mean age of the 2,694 NHL patients was 61.6 years (SD = 13.3), and that of the 1,521 controls was 61.6 years (SD = 13.7); 41 % of cases were female compared with 49 % of controls. Figure 1 shows the LD structure of the SNPs assessed in the CXCR5 gene region, based on allele frequencies in our control population. Only two pairs of SNPs were in modest LD with each other: rs11217078 with rs523604 (r² = 0.53) and rs497916 with rs3922 (r² = 0.51). All other SNP pairs had r² < 0.40. Total bin coverage for CXCR5 was 91 %.
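Pairwise r² values like those above are often computed from estimated haplotype frequencies. A simpler approximation from unphased genotypes is the squared Pearson correlation between two SNPs' minor-allele dosages. The sketch below implements only that approximation. It is an assumption for illustration, not the method used to produce Fig. 1, and its values can differ from haplotype-based r². The function name and the dosage vectors are hypothetical.

```python
from typing import Sequence


def dosage_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two SNPs' minor-allele dosages (0/1/2)."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return sxy * sxy / (sxx * syy)


# Hypothetical dosages for a handful of individuals.
print(dosage_r2([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]))  # 1.0 (identical SNPs)
print(dosage_r2([0, 0, 2, 2], [0, 2, 0, 2]))              # 0.0 (uncorrelated SNPs)
```

Because r² squares the correlation, perfectly negatively correlated dosages also give r² = 1.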

Fig. 1

Chromosome 11q23.3 gene CXCR5 structure and tag SNP mapping. Top: −log10 p value for trend across tag SNPs for all NHL risk (dark circles) and FL risk (open circles). Bottom: linkage disequilibrium plot of SNPs genotyped in this analysis (darker shading indicates higher r² values between SNPs; numbers are |D′| values)

Of the ten CXCR5 tag SNPs, two were statistically significantly associated with overall NHL risk at the Bonferroni-corrected threshold of p < 0.005, both with small effect sizes (Table 1): rs1790192 (OR 1.19, 95 % CI 1.08–1.30; p = 0.0003) and rs497916 (OR 1.16, 95 % CI 1.05–1.28; p = 0.003).

Table 1 Risk of NHL for selected SNPs in CXCR5 (11q23.3)

FL- and DLBCL-specific risk associations are presented in Table 2; associations were stronger, and more SNPs were related to risk, for FL and the GCB subtype of DLBCL. The most prominent association was observed for rs1790192 with FL risk (OR 1.44, 95 % CI 1.25–1.66; p = 3.1 × 10⁻⁷). This SNP was also associated with DLBCL risk (OR 1.16, 95 % CI 1.01–1.33; p = 0.04), particularly the GCB DLBCL subtype (OR 1.36, 95 % CI 1.05–1.77; p = 0.02), although neither association was statistically significant after accounting for multiple testing. The SNP rs497916 was also associated with both FL (OR 1.32, 95 % CI 1.14–1.53; p = 0.0002) and GCB DLBCL (OR 1.50, 95 % CI 1.15–1.97; p = 0.003) risk. The variant rs630923 was statistically significantly associated only with risk of FL (OR 1.30, 95 % CI 1.09–1.55; p = 0.003) and not with DLBCL, while rs3922 was associated with risk of GCB DLBCL (OR 1.46, 95 % CI 1.12–1.89; p = 0.005) but not with non-GCB DLBCL or FL.

Table 2 Risk of FL and DLBCL for selected SNPs in CXCR5 (11q23.3)

Associations between several other NHL subtypes and SNPs in CXCR5 are reported in Table 3. After accounting for multiple testing, CLL risk was statistically significantly associated only with rs11217078 (OR 1.23, 95 % CI 1.07–1.40; p = 0.003); additional associations were observed for CLL risk with rs630923 (OR 1.22, 95 % CI 1.03–1.45; p = 0.02) and rs523604 (OR 1.20, 95 % CI 1.05–1.36; p = 0.006), but these did not meet the p < 0.005 criterion. For the other subtypes, rs1623316 (OR 0.80, 95 % CI 0.65–0.99; p = 0.04) was associated with MZL risk, and PTCL was associated with rs10892306 (OR 1.57, 95 % CI 1.04–2.39; p = 0.03), rs1790192 (OR 1.29, 95 % CI 1.02–1.64; p = 0.04), and rs12363277 (OR 0.49, 95 % CI 0.27–0.91; p = 0.02), although none of these met the multiple testing threshold.

Table 3 Risk of other NHL subtypes for selected SNPs in CXCR5 (11q23.3)

CXCR5 and NHL prognosis

Prognosis was evaluated for several of the most common NHL subtypes, including DLBCL (N = 581), FL (N = 548), MCL (N = 140), and PTCL (N = 109); numbers differ slightly from the risk analysis because of the subtype classification of composite or discordant histologies (see “Materials and methods” section) or the availability of clinical data. Clinical and treatment characteristics for DLBCL and FL are provided in Supplemental Tables 1 and 2, respectively. We did not assess CLL survival in this paper because there were too few deaths for an analysis of overall survival and there is no analogous event for an EFS endpoint.

Results for DLBCL OS and EFS (overall and by GCB/non-GCB subtype) among patients who received R-CHOP treatment are presented in Supplemental Table 3. For DLBCL, no associations were found between CXCR5 SNPs and EFS, overall or for GCB and non-GCB subtypes. However, in DLBCL patients who received R-CHOP, there were suggestive associations between OS and rs12363277 (HR 0.54, 95 % CI 0.30–0.98; p = 0.04) and rs3922 (HR 0.78, 95 % CI 0.62–0.99; p = 0.04), although neither was statistically significant at the Bonferroni-corrected threshold of p < 0.005. When assessing all FL cases, the CXCR5 SNPs were not associated with EFS or OS (Supplemental Table 4). However, when cases were stratified by treatment group, the number of minor alleles of rs1790192 was associated with improved EFS in patients who were initially observed instead of receiving treatment (HR 0.64, 95 % CI 0.47–0.87; p = 0.004) (Supplemental Table 4 and Fig. 2). Additionally, there was an inverse association between the number of rs3922 minor alleles and transformation of FL to DLBCL (HR 0.62, 95 % CI 0.41–0.93; p = 0.02), although this did not meet the Bonferroni threshold. None of the CXCR5 variants assessed were associated with EFS or OS in MCL (Supplemental Table 5) or PTCL (Supplemental Table 6).
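Because each SNP was modeled log-additively, a reported per-allele hazard ratio implies a genotype-specific contrast. The log HR for carrying k minor alleles is k times the per-allele log HR, so the estimate and both Wald CI bounds are raised to the power k. The sketch below applies this to the reported rs1790192 estimate for EFS in initially observed FL patients. The function name is hypothetical. The resulting homozygote figure is implied by the model's linearity assumption and is not a separately estimated result.

```python
from typing import Tuple


def genotype_hr(per_allele: float, ci: Tuple[float, float], n_minor: int):
    """HR and 95% CI for n_minor copies vs. none under a log-additive model.

    log HR scales linearly with allele count, so the point estimate and both
    Wald CI bounds are raised to the power n_minor.
    """
    lo, hi = ci
    return per_allele ** n_minor, (lo ** n_minor, hi ** n_minor)


# rs1790192 and EFS in initially observed FL patients: HR 0.64 (0.47-0.87) per allele.
for k in (0, 1, 2):
    hr, (lo, hi) = genotype_hr(0.64, (0.47, 0.87), k)
    print(k, round(hr, 2), round(lo, 2), round(hi, 2))
```

For two minor alleles this gives an implied HR of about 0.41 (0.22–0.76) relative to the common homozygote.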

Fig. 2

rs1790192-dependent EFS in patients with FL who were observed as initial disease management. Percent event-free survival in observed FL patients (N = 172) (event defined as disease progression, initiation of treatment, or death due to any cause) according to rs1790192 genotype. Hazard ratio was calculated using Cox proportional hazards regression adjusted for FLIPI

Discussion

Polymorphisms in CXCR5 were associated with risk of several NHL subtypes after accounting for multiple testing, most notably an increased risk of FL associated with the minor allele of rs1790192. The number of minor alleles of this SNP was also modestly associated with increased risk of overall DLBCL, and the association was stronger for GCB DLBCL. The latter finding is consistent with the biology of GCB DLBCL and FL, both of which arise from germinal center B-cells; moreover, a subset of GCB DLBCLs carry the t(14;18) translocation found in most FLs [25]. In contrast, the number of minor alleles of rs1790192 was statistically significantly associated with superior EFS among FL patients initially managed by observation, suggesting that this SNP may act differently in disease etiology than in disease progression. In addition, transformation of FL to DLBCL was modestly inversely associated with the number of minor alleles of rs3922. To our knowledge, this is the first study to assess CXCR5 SNPs in association with survival in DLBCL, FL, MCL, and PTCL.

The role of CXCR5 in lymph node development and B-cell recruitment [26] provides a plausible biological basis for involvement of this chemokine receptor in lymphoma development and progression. This receptor, normally expressed on T follicular helper cells and B-cells, binds the chemokine CXCL13, with functions including inducing B-cell migration [3, 4] and augmenting B-cell activation during BCR signaling [27]. In at least two HIV+ patient populations, serum CXCL13 levels were higher in HIV-infected patients who developed NHL than in those who did not [28, 29]. This finding was also replicated in an immunocompetent population, the Women’s Health Initiative Observational Study cohort, where elevated serum CXCL13 was significantly associated with increased risk of B-NHL [30]. Additionally, Burkle et al. [31] reported that CLL cells expressed high levels of CXCR5, that serum CXCL13 was elevated in CLL patients, and that CLL B-cells exposed to CXCL13 exhibited prolonged p44/42 MAPK activation, which is considered to provide a pro-survival signal [32]. Together, these studies support CXCL13/CXCR5 involvement in NHL development and progression.

While SNPs in CXCR5 have not been reported extensively in association with disease states, one recent GWAS found an association of CXCR5 rs6421571 with risk of primary biliary cirrhosis, an autoimmune disease of the liver [10]. Also of potential interest is rs630923: in our study, each additional copy of the A allele was modestly associated with increased risk of CLL and FL. The A allele of this SNP was also recently reported to be associated with higher serum CXCL13 levels in cases and controls in an HIV+ NHL case–control study [29]. Notably, neither this SNP nor the other CXCR5 tag SNPs in that study were associated with AIDS-NHL; however, the minor allele of a CXCL13 tag SNP, rs355689, was associated with both reduced AIDS-NHL risk and reduced serum CXCL13 levels [29]. The C allele of rs630923 was also found in a recent GWAS to be associated with increased risk of multiple sclerosis (MS) [11]. One study, based on very small numbers, found an association between NHL risk and a personal history of MS or MS in a first-degree relative [33]. An increased risk of NHL, particularly FL and DLBCL, was also found for combined autoimmune conditions, including MS, in a population-based case–control study of women [34]. However, in a large pooled analysis of 12 case–control studies within the InterLymph Consortium, there was no association between MS and NHL risk [35], and in a large cohort of US veterans, an inverse association was found between MS diagnosis and NHL risk [36]. Although the association between MS and NHL remains controversial, given the role of B-cells in MS [37], it is plausible that a shared CXCR5-related mechanism contributes to both diseases.

In a recent case–control study in a Chinese population, including 404 NHL cases and 456 age-matched controls, four CXCR5 polymorphisms were assessed for association with NHL [9]. The authors reported an association of rs6421571 with overall NHL risk, Ann Arbor stage, and B-cell subtype. Additionally, they found that rs80202369 and the TGG haplotype of rs6421571, rs80202369, and rs78440425 were associated with overall NHL risk [9]. These particular SNPs were not among our CXCR5 tag SNPs, nor were they in LD with any of our tag SNPs at r² > 0.2 based on the 1000 Genomes European population. In that reference population, rs80202369 was monomorphic, and rs6421571 and rs78440425 had MAFs of 0.178 and 0.007, respectively. Although we did not assess these specific SNPs, the association we found between overall NHL and several SNPs in the CXCR5 gene provides additional support for a potential role of CXCR5 in lymphomagenesis.

The finding that the minor allele of rs1790192 is associated with elevated risk but improved survival suggests that the role of CXCR5 differs between lymphoma initiation and progression. Given the limited available data, we can only speculate that the major allele (associated with reduced risk and poorer prognosis) results in higher expression of CXCR5 than the minor allele. Because CXCR5 is involved in immune cell recruitment, higher expression of this chemokine receptor in the early stages of lymphomagenesis could enhance immune surveillance and thus reduce risk [38]. However, if premalignant cells escape immune surveillance, proliferate, and establish themselves as a tumor, immune cells recruited to the tumor microenvironment may be induced to develop a suppressive phenotype, which could in turn enhance tumor growth and progression and account for reduced survival in these patients [38]. This hypothesis will need to be tested in future studies.

A general search of the UCSC bioinformatics tools and the PolymiRTS Database 2.0 found little evidence of function for any of the NHL risk- or survival-associated CXCR5 SNPs, or for SNPs in moderate-to-high LD (r² ≥ 0.50) with them in the 1000 Genomes population, with a few exceptions. The rs3922 polymorphism lies in a binding site for three known miRNAs. However, this site is conserved in only three other species, which lowers the likelihood of a functional consequence. This SNP is also in moderate LD with rs598207 (r² = 0.52) and in high LD with rs1704819 (r² = 0.99), both synonymous SNPs in the coding region of CXCR5. Although synonymous SNPs were originally thought to be neutral, they can alter a number of functions, such as mRNA secondary structure (related to stability), splicing, and normal co-translational folding [39]. We further examined rs1790192 using HaploReg (http://www.broadinstitute.org/mammals/haploreg/haploreg.php). According to HaploReg, rs1790192 lies in a region containing 33 regulatory motifs across a number of different cell types. Based on histone marks, this SNP falls in a poised promoter region in the GM12878 lymphoblastoid cell line and in a weak enhancer region in the K562 leukemia cell line. This does not mean that the altered base pair at this location alters the function or expression of the gene product, only that its location in regulatory regions makes this a possibility. This SNP is also not located in an evolutionarily conserved domain. Moreover, these tools assess predicted function, and laboratory studies are needed to determine the actual consequences of these polymorphisms for gene products. Finally, CXCR5 overlaps the BCL9L gene on chromosome 11, which encodes a protein with high sequence homology to BCL9 [40], a gene discovered in lymphoblastic leukemia cells carrying a t(1;14)(q21;q32) translocation [41]. BCL9L was recently shown to be related to progression of intestinal tumors [42], and we cannot rule out a potential effect of these SNPs on BCL9L expression.

Strengths of our study include central pathology review of case phenotype; a representative control group; the large sample size and ability to evaluate common NHL subtypes, as well as GCB and non-GCB DLBCL; assessment of risk and prognosis in the same population; and relatively detailed clinical and outcome data. There were also limitations, including the lack of racial and ethnic diversity, which limits generalizability. Additionally, while the bin coverage for CXCR5 in this study was quite high, we cannot rule out additional associations with SNPs that were not covered by our tagging approach.

In this study, we found that several CXCR5 polymorphisms were related to risk of NHL subtypes and to prognosis in FL. The most noteworthy SNP was rs1790192, which was strongly associated with increased risk of FL while also being associated with superior EFS in FL patients who were observed as initial disease management. Future functional assessment of these SNPs could provide useful insights into how CXCR5 might regulate tumor promotion and progression.