Frequency and risk factors of urinary bladder cancer

Urinary bladder cancer (UBC) is the 9th most common cancer worldwide and the 13th most common cause of death from cancer (Parkin 2008). After removal of primary carcinomas, UBC frequently recurs, leading to repeated surgery. The strongest known risk factors are cigarette smoking, occupational exposure to bladder carcinogens, particularly to aromatic amines and polycyclic aromatic hydrocarbons, and male gender. Substances that have recently drawn interest include azo colourants (Golka et al. 2004) and hair dyes (Bolt and Golka 2007).

Single nucleotide polymorphisms and SNP chip analysis

It is well known that single nucleotide polymorphisms contribute to interindividual differences in cancer susceptibility (Kiemeney et al. 2008, 2010; Hengstler et al. 1998; Arand et al. 1996; Saravana Devi et al. 2008; Hewitt et al. 2007; Gehrmann et al. 2008; Carmo et al. 2006; Cadenas et al. 2010; Hellwig et al. 2010). DNA from different individuals is identical for most base positions; however, variants are observed approximately every 500 bases. If a variant occurs in more than 1% of the population, it is defined as a single nucleotide polymorphism (SNP; Fig. 1a). Today, up to 900,000 SNPs and 900,000 copy number variations can be determined in a single analysis on SNP chips. This technique is based on hybridization of DNA from patients to oligonucleotides immobilized on a chip, which contains oligonucleotides with the two alleles on different spots (Fig. 1b). The patients’ DNA (after digestion with endonucleases) is labelled with fluorochromes; therefore, cluster plots can be obtained, which differentiate between the homozygous major allele (green dots in Fig. 1c), the homozygous minor allele (red triangles) and the heterozygous (blue squares) individuals that carry allele A on one and allele B on the other chromosome (Fig. 1c). The automatic clustering of the fluorescence intensities to separate between the three possible genotypes may lead to misclassification of patients in cluster plots—a problem that is frequently underestimated when performing genome-wide association studies (GWAS) with SNP chips. An example is provided in Fig. 1c, where the heterozygous patients have been misclassified, due to the use of a sub-optimal classification algorithm for a specific data set. Although misclassifications can easily be identified by manual inspection, this is not feasible for 900,000 SNPs and large numbers of patients. 
However, to avoid validating false positive SNPs caused by misclassification in cluster plots, the cluster plots of the candidates identified in a discovery group (see below) should be inspected manually.

Fig. 1

a Exchange of a single base pair present in at least 1% of the population, T:A to C:G in the shown example, is defined as a single nucleotide polymorphism (SNP). The human genome contains approximately 3 million SNPs (picture from: “Dna-SNP.svg”, Wikipedia, author: David Hall, Gringer, licence: CC-by 2.5, http://creativecommons.org/licenses/by/2.5/deed.de). b Principle of a DNA microarray chip. Single-stranded DNA oligonucleotides function as DNA probes by hybridizing DNA fragments from the analysed sample whose nucleotide sequences are homologous. The technique can be applied to differentiate whether an A or a G is present in a certain sequence (from: Carr et al. 2008). c Cluster plot obtained from a DNA microarray. The x-axis (contrast) and the y-axis (strength) indicate transformed values of the two allele intensities S_A and S_B for the [A] and the [B] allele, respectively. They are defined as follows: Contrast = (S_A − S_B)/(S_A + S_B) and Strength = log(S_A + S_B) (BRLMM whitepaper: BRLMM: an Improved Genotype Calling Method for the GeneChip® Human Mapping 500K Array Set, Revision Date: 2006-04-14; http://media.affymetrix.com/support/technical/whitepapers/brlmm_whitepaper.pdf). Homozygous major alleles are plotted as green dots (right cluster), heterozygous genotypes as blue squares (middle cluster), homozygous minor alleles as red triangles (left cluster), and undeterminable genotypes as grey crosses. The example illustrates the problem of misclassification of patients in cluster plots using automated systems. Some heterozygous patients have been misclassified as homozygous (red triangles instead of blue squares).
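The contrast and strength transformation given in the caption of Fig. 1c can be sketched in a few lines of Python. This is only an illustration: the function name and the three intensity pairs are hypothetical, and the natural logarithm is assumed because the caption does not state the base.

```python
import math


def contrast_strength(s_a, s_b):
    """Transform two allele intensities into (contrast, strength).

    contrast = (S_A - S_B) / (S_A + S_B)
    strength = log(S_A + S_B)   # natural log is an assumption
    """
    total = s_a + s_b
    if total <= 0:
        raise ValueError("summed allele intensity must be positive")
    contrast = (s_a - s_b) / total  # -1 (only B signal) to +1 (only A signal)
    strength = math.log(total)
    return contrast, strength


# Hypothetical intensity pairs, chosen only to show the three cluster regions.
examples = [
    ("mostly A signal", 1800.0, 200.0),
    ("balanced signal", 1000.0, 1000.0),
    ("mostly B signal", 200.0, 1800.0),
]
for label, s_a, s_b in examples:
    c, s = contrast_strength(s_a, s_b)
    print(f"{label}: contrast={c:+.2f}, strength={s:.2f}")
```

Samples with balanced signal fall near a contrast of zero, which is where heterozygous genotypes are expected; a calling algorithm that places its cluster boundaries badly on this axis produces the kind of misclassification shown in the figure.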

A second major problem is that false positive results are generally caused by multiple testing. When testing with P < 0.05 without adjustment for multiple testing, approximately five false positive SNPs will occur in 100 tested SNPs, with a potential outcome of approximately 45,000 false positives in 900,000 SNPs. One possibility to avoid false positives is performing an adjustment for multiple testing, for example by the Bonferroni technique. However, this technique is very conservative and rejects a large number of real positives. Therefore, a good strategy is to establish a “discovery group” where the most promising SNPs are identified. Only a small number of hypotheses will then be validated in an independent group, the so-called “follow-up group”. This study design has the additional advantage that analysis in the follow-up groups can be done using much cheaper conventional PCR, whereas the expensive SNP chips are limited to the discovery group.
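The arithmetic behind the multiple-testing problem can be reproduced with a short Python sketch. The function names are illustrative only; the calculation assumes that every tested SNP is truly unassociated, which gives the expected number of false positives at a fixed significance level, and it shows how strict the corresponding Bonferroni threshold becomes.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives if all n_tests null hypotheses are true."""
    return alpha * n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


for n in (100, 900_000):
    print(
        f"{n} SNPs: about {expected_false_positives(n):.0f} false positives "
        f"expected at P < 0.05; Bonferroni threshold = {bonferroni_threshold(n):.2e}"
    )
```

For 900,000 SNPs the Bonferroni threshold is on the order of 10⁻⁸, which illustrates why truly associated SNPs with small effects can fail to reach significance in a single-stage analysis.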

When data from different GWAS are combined to increase the power of the discovery group, it has to be noted that there is little overlap between arrays from different distributors (see Fig. 2a), and that handling data generated with larger arrays of the same distributor also results in gaps of thousands of SNPs. To gain information about untyped SNPs, e.g. to investigate candidate regions in more detail or to combine data from different SNP assays, a number of imputation techniques are available (BEAGLE: Browning and Browning 2007, 2009; BIMBAM: Servin and Stephens 2007; fastPHASE: Scheet and Stephens 2006; GenABEL: Aulchenko et al. 2007; IMPUTE/SNPTEST: Marchini et al. 2007; Marchini and Howie 2008; MaCH: Li Y et al. 2010b; Plink: Purcell et al. 2007; TUNA: Nicolae 2006). These approaches infer unknown genotypes from measured loci and linkage disequilibrium information, e.g. from reference panel data, usually HapMap (Halperin and Stephan 2009a, b). Comparisons of imputation algorithms can be found in Pei et al. (2008, 2010), Browning (2008), Hao et al. (2009), and Yu and Schaid (2007). Potential difficulties due to ethnic differences between the study group and the reference panel were addressed by Huang et al. (2009).

Fig. 2

a The Venn diagram illustrates relatively little overlap between two frequently applied SNP chips, the Affymetrix SNP Array 5.0 and the Illumina Human 610 Quad, and the SNP500Cancer database. b An overview of how many of the previously analysed SNPs (the so-called “old SNPs” in bladder cancer case–control series; Table 2, Supplemental Table 1) are present on currently used SNP chips. Considering that the recent large GWAS (Table 1) have been performed with Illumina chips only 60 of the 163 “old SNPs” could have been discovered by these SNP chip studies. Presence or absence of SNPs if identified by rs numbers was determined using the software R, version 2.12.1 and annotation from the meta-data packages pd.genomewidesnp.5 version 1.1.0 (Affymetrix SNP Chip 5.0) and human610quadv1bCrlmm version 1.0.2 (Illumina Human 610 Quad) and from the SNP500Cancer database (from: ftp://ftp-snp500cancer.nci.nih.gov/snp500Cancer/Genotypes/allgenes.tab, access date 05 Jan 2011)

Strategy for discovery and validation of SNPs

The study design with a discovery group and follow-up groups has recently been applied in order to identify new SNPs that are associated with urinary bladder cancer risk (Kiemeney et al. 2008, 2010; Wu et al. 2009; Rafnar et al. 2009; Rothman et al. 2010). For example, Kiemeney et al. (2010) studied 4,580 bladder cancer cases and 45,269 controls, where the discovery group consisted of 1,889 cases and 39,310 controls. The follow-up groups included 2,691 cases and 5,959 controls. The twenty most significant SNPs from the discovery group, all with P ≤ 2.5 × 10⁻⁵, were validated in the follow-up groups. This resulted in the identification of a SNP on chromosome 4p16.3 (rs798766) that was associated with bladder cancer risk in the discovery group (P = 2.4 × 10⁻⁵) and in the follow-up group (P = 8.5 × 10⁻⁸) (Kiemeney et al. 2010). Odds ratios (OR) for the T allele of rs798766 were 1.22 (95% confidence interval: 1.11–1.34) in the discovery group and 1.26 (1.16–1.37) in the follow-up group. In the combined group, rs798766[T] was associated with an OR of 1.24 (P = 9.9 × 10⁻¹²). This association remained significant after adjustment for cigarette smoking, age, and gender. No association between rs798766[T] and cigarette smoking or smoking quantity was observed, suggesting that rs798766 does not represent a genetic variation that confers susceptibility to addiction.
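How a discovery and a follow-up estimate can be combined is illustrated by the following Python sketch. It is not necessarily the method used by Kiemeney et al. (2010): it assumes a simple fixed-effect, inverse-variance pooling of log odds ratios, recovers standard errors from the rounded 95% confidence intervals quoted above, and uses hypothetical function names. Because of this rounding, the result can only approximate the published combined estimate.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Return (log OR, standard error) recovered from an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return log_or, se


def pool_fixed_effect(estimates):
    """Inverse-variance weighted mean of (log OR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Odds ratios and confidence intervals for rs798766[T] as quoted in the text.
discovery = log_or_and_se(1.22, 1.11, 1.34)
follow_up = log_or_and_se(1.26, 1.16, 1.37)

pooled_log_or, pooled_se = pool_fixed_effect([discovery, follow_up])
pooled_or = math.exp(pooled_log_or)
z = pooled_log_or / pooled_se
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value

print(f"pooled OR = {pooled_or:.2f}, z = {z:.2f}, P = {p_two_sided:.1e}")
```

Under these assumptions, the pooled odds ratio comes out close to the reported combined value of 1.24, and the P value is several orders of magnitude smaller than in either group alone, which is the practical benefit of the two-stage design.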

The SNP rs798766 is located in intron 5 of TACC3 (transforming acidic coiled-coil containing protein 3), which is involved in the regulation of microtubule dynamics. TACC3’s relevance for bladder carcinogenesis is currently unknown; however, FGFR3, a neighbouring gene approximately 70 kb away from rs798766, contains activating mutations in about one-third of all bladder carcinomas. The critical role of FGFR3 in urinary bladder carcinogenesis led to the question whether the rs798766 polymorphism correlates with RNA levels of both FGFR3 and TACC3 (Kiemeney et al. 2010). Ideally, such a study would use the cells from which urinary bladder cancer originates, namely the epithelial cells of the bladder. Unfortunately, availability of bladder epithelial cells from tissue banks is limited; therefore, the analysis was performed in adipose tissue from 604 individuals. Interestingly, there was a significant correlation between rs798766 and RNA levels of both FGFR3 and TACC3 (Fig. 3). The highest RNA levels were obtained for the homozygous T/T genotype, intermediate expression for the heterozygous C/T genotype, and lowest levels for the homozygous C/C genotype. Currently, it is not known how the polymorphism of rs798766 influences expression of FGFR3, which is 70 kb away. A potential explanation is that rs798766[T] leads to a conformational alteration of this region of the chromosome that improves accessibility for the transcriptional machinery. However, this remains speculative.

Fig. 3

Correlation between rs798766 and RNA levels of FGFR3 and TACC3 in adipose tissue (from: Kiemeney et al. 2010)

A possible mechanism by which rs798766[T] may contribute to bladder cancer risk is increased production of the FGFR3 protein as a consequence of enhanced gene expression, leading to an increased rate of proliferation and an increased probability of accumulating mutations. The example of rs798766 illustrates that GWAS with a discovery plus follow-up group design and sufficient case numbers can successfully be applied to identify novel disease-relevant SNPs.

State of the art: overview of recently discovered, validated SNPs

Using the above-described strategy for GWAS, nine SNPs have been identified since 2008 that are associated with bladder cancer risk (Table 1). Importantly, none of these SNPs had previously been described in relation to bladder cancer. In contrast to the newly discovered polymorphisms, the relevance of GSTM1 0/0 (a deletion in the glutathione S-transferase M1 gene leading to loss of enzyme activity) and of the NAT2 polymorphism was described by our group 15 years ago (Golka et al. 1996; Kempkes et al. 1996; Hengstler et al. 1998) and has been confirmed in recent studies (Golka et al. 2009; Rothman et al. 2010).

Table 1 Overview of confirmed genetic variants in GWAS that are associated with urinary bladder cancer risk

An overview of the closest genes to the newly discovered and validated SNPs is given in Table 1. The well-known proto-oncogene c-Myc encodes a DNA-binding factor that activates or suppresses transcription, thus explaining its regulation of many target genes involved in proliferation and cell cycle progression (Dominguez-Sola et al. 2007). TP63 shows strong homology to the tumour suppressor P53 and, similar to P53, is also involved in cell cycle and apoptosis control (Sayan et al. 2007; Lefkimmiatis et al. 2009). Prostate stem cell antigen (PSCA) is a cell surface protein associated with prostate and other types of cancer (Watabe et al. 2002). The TERT gene encodes the reverse transcriptase component of telomerase, which is essential for the maintenance of telomere length and cellular immortality (Cheung and Deng 2008; Florl and Schulz 2008). Little is known about the function of the cleft lip and palate transmembrane 1-like gene (CLPTM1L). CLPTM1L is upregulated in cisplatin-resistant cell lines and might be involved in apoptosis control (Liu et al. 2010). Fibroblast growth factor receptor 3 (FGFR3) belongs to a family of polypeptide growth factor receptors containing a cytoplasmic tyrosine kinase domain and extracellular immunoglobulin-like domains and is involved in mitogenesis, angiogenesis, and wound healing (Keegan et al. 1991). TACC3 plays a role in the maintenance of nuclear envelope structure and in cell division control (Gómez-Baldó et al. 2010). NAT2 is a phase II metabolizing enzyme involved in detoxification, but also in bioactivation of xenobiotics (Hengstler et al. 1998). Low NAT2 activity is associated with increased bladder cancer risk in Caucasians. Chromobox homolog 7 (CBX7) positively regulates E-cadherin expression by interacting with histone deacetylase 2 (Federico et al. 2009), possibly explaining why loss of CBX7 expression is associated with a highly malignant phenotype of carcinomas. 
APOBEC3 deaminases cause G to A hypermutation in nascent DNA of hepatitis B viruses which seems to play a role in antiviral defence (Abe et al. 2009). Overexpression of APOBEC3 genes may lead to mutations in the genome and influence the development of tumours (Vartanian et al. 2008). Cyclin E (CCNE1) controls cell cycle progression at the G1/S transition (Koff et al. 1991). UDP-glucuronosyltransferase 1A (UGT1A) is a phase II metabolizing enzyme that catalyses glucuronidation and elimination of numerous lipophilic xenobiotics, thereby acting as a detoxifying enzyme (Hengstler et al. 2000; Strassburg et al. 2008). Glutathione S-transferase M1 is also a phase II metabolizing enzyme (Bolt and Thier 2006) that upon conjugation with glutathione detoxifies numerous xenobiotics including polycyclic aromatic hydrocarbons that are known bladder carcinogens (Golka et al. 2009). In conclusion, the functions of the discussed genes focus on carcinogen detoxification, control of the cell cycle as well as apoptosis, and maintenance of DNA integrity.

More than 75 further studies of SNPs and bladder cancer risk

A literature search on SNPs and bladder cancer identified more than 75 further studies reporting on genetic variants in bladder cancer (Tables 2, 3, Supplemental Table 1). Many of these SNPs (n = 34) affect genes encoding xenobiotic metabolizing enzymes (Table 2). Moreover, variants have been reported in genes that play a role in DNA repair and damage signalling (n = 58), cell cycle control, DNA replication, translesion synthesis and transcription (n = 14), inflammation (n = 13), apoptosis (n = 8), methylation (n = 4), growth factors (n = 5), matrix metalloproteinases (n = 5), mTOR and associated factors (n = 4), and others (n = 18) (Table 3, Supplemental Table 1). However, in contrast to the SNPs summarized in the previous paragraph and in Table 1, the latter variants have not been consistently validated in independent case–control series. Therefore, we analysed which fraction of the 163 SNPs from Table 3, Supplemental Table 1 (“old SNPs”) is represented on the SNP chips used in the previously published GWAS (Table 1), because SNPs that are absent from these chips are unlikely to have been verified in genome-wide studies. Considering that the recent GWAS (Table 1) were performed with Illumina chips, it can be concluded that 60 of the “old SNPs” were re-analysed in the Illumina SNP chip studies but not confirmed. However, 103 of the “old SNPs” are currently not present on the Illumina SNP chip and therefore represent candidates that could be validated (or disproven) in future. Of course, most of the mentioned SNPs are likely to be in linkage disequilibrium with markers on the SNP chips, and some of their neighbouring SNPs might indeed show small P values. However, as long as these do not reach genome-wide significance, the association with “old SNPs” is unlikely to be recognized. Nevertheless, final answers will most probably be provided when the next generation sequencing studies (see paragraph below) are finished.
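The partition of previously reported SNPs into those covered and those not covered by a chip amounts to a set comparison of rs numbers. The sketch below illustrates this in Python; the analysis described in the caption of Fig. 2 was done in R with chip annotation packages, so this is a restatement of the principle, and all identifiers, including the placeholder rs numbers, are hypothetical.

```python
def split_by_chip_coverage(candidate_snps, chip_snps):
    """Split candidate rs numbers into those present on a chip and those absent."""
    candidates = set(candidate_snps)
    on_chip = candidates & set(chip_snps)
    off_chip = candidates - on_chip
    return on_chip, off_chip


# Placeholder identifiers for illustration only, not real annotation data.
old_snps = ["rs_old_1", "rs_old_2", "rs_old_3", "rs_old_4", "rs_old_5"]
chip_annotation = ["rs_old_2", "rs_old_5", "rs_chip_only_1", "rs_chip_only_2"]

on_chip, off_chip = split_by_chip_coverage(old_snps, chip_annotation)
print(f"{len(on_chip)} of {len(old_snps)} candidates are on the chip")
print("not covered:", sorted(off_chip))
```

Applied to real chip annotation, the first set corresponds to SNPs that a GWAS could have re-tested, and the second to candidates that remain untested unless they are imputed or genotyped separately.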

Table 2 Polymorphic xenobiotic metabolizing enzymes (“old SNPs”), reported to influence bladder cancer risk and the coverage by the five SNP chips Affymetrix 5.0, 6.0, Illumina Hap300Duo, Human610quad, Omni1Mduo, and the SNP500Cancer database
Table 3 Polymorphic genes (and loci) (“old SNPs”) not involved in xenobiotic metabolism but reported to influence bladder cancer risk, and their rs numbers

Implications for prevention?

A common feature of the novel bladder cancer-associated SNPs is the relatively low odds ratio (<1.5), when compared to heavy cigarette smoking, which is associated with an odds ratio of approximately four (e.g. OR = 3.69; 95% CI = 2.97–4.67; P = 8.1 × 10⁻³³ in Lehmann et al. 2010). Generally, an odds ratio of 1.5 is of minor relevance for an individual and would not justify additional medical precautionary measures. However, a relevant question is whether the high-risk alleles of several of the SNPs in Table 1 interact, leading to odds ratios that are similar to or even higher than that of cigarette smoking. If so, one should consider that individuals carrying combinations of several high-risk alleles are rare. For example, only approximately eight in 1,000 individuals may carry a combination of four high-risk alleles if the frequency of the individual high-risk alleles is 0.3. Nevertheless, identifying these individuals would be advantageous, especially if adequate measures of precaution, such as cystoscopy or specific imaging techniques in the case of bladder cancer, are too expensive or too invasive to be applied to the general population. The above example illustrates that studies are required to test whether influential SNPs interact and add together to create odds ratios that would justify preventive measures.
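The figure of about eight in 1,000 follows from multiplying the individual frequencies, as the following Python sketch shows. It assumes that the four variants are inherited independently and that 0.3 is the carrier frequency for each of them. The second function adds a purely hypothetical multiplicative model of combined risk, which the studies discussed here have not established; it only indicates the order of magnitude that such an interaction would imply.

```python
def joint_carrier_frequency(frequency, n_variants):
    """Fraction of individuals carrying all n_variants, assuming independence."""
    return frequency ** n_variants


def combined_odds_ratio(per_variant_or, n_variants):
    """Combined OR under a hypothetical multiplicative model (an assumption)."""
    return per_variant_or ** n_variants


joint = joint_carrier_frequency(0.3, 4)
print(f"carriers of all four high-risk variants: about {joint * 1000:.0f} in 1,000")

for per_variant_or in (1.2, 1.5):
    print(
        f"per-variant OR {per_variant_or}: "
        f"combined OR for four variants = {combined_odds_ratio(per_variant_or, 4):.2f}"
    )
```

Under the multiplicative assumption, four variants with odds ratios near 1.5 would together exceed the odds ratio reported for heavy smoking, whereas four variants near 1.2 would not, which is why empirical interaction studies are needed.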

Strong impact of “wimp SNPs”

To date, GWAS have identified more than 300 validated associations between genetic variants and approximately 70 common diseases (Hindorff et al. 2009; http://www.genome.gov/gwastudies). Only a small number of rare variants with a frequency of usually much less than 1% is associated with a strongly enhanced risk, such as genetic variants of TP53, RB1, BRCA1, and BRCA2. A very small number of SNPs have effects of a factor of two or higher, for example APOE4 in Alzheimer’s disease, LOXL1 in exfoliative glaucoma, and CFH in age-related macular degeneration (Altshuler et al. 2008). Nevertheless, the majority of all identified SNPs have odds ratios between 1.1 and 1.5. In the case of urinary bladder cancer, all known SNPs increase risk by a factor smaller than 1.5. It is likely that these “wimp SNPs” interact and that complex combinations of several SNPs have a strong influence on whether an individual will develop cancer or not.

However, no comprehensive studies on “wimp SNPs” and “wimp-SNP”-environment interactions are currently available. It should also be noted that variants identified so far, including the novel SNPs identified by GWAS, explain only approximately 5–10% of the overall inherited risk (Altshuler et al. 2008). The remaining variance may be due to an even higher number of SNPs with odds ratios smaller than 1.1; however, the case numbers of most completed GWAS were too small to identify variants associated with such low risk. Nevertheless, it is likely that the “wimpiest-wimp SNPs” with odds ratios smaller than 1.1 are collectively even stronger (Varghese and Easton 2010). Additionally, the locus-attributable risk may have been underestimated, because the marker SNPs identified in GWAS were suboptimal proxies for the causal mutations (Altshuler et al. 2008). The genetic variance explained by the variants identified so far may also have been underestimated, because gene–gene and gene–environment interactions have not yet been adequately considered. Finally, many rare variants may remain undiscovered, because they cannot be identified by SNP chip analysis but require systematic sequencing.

SNPs of distant-acting enhancers

Many of the recently discovered SNPs associated with bladder cancer risk are located in non-coding regions. Examples are the sequence variant rs9642880 on chromosome 8q24, which is 30 kb upstream of Myc (Kiemeney et al. 2008), and rs798766 on 4p16.3, 70 kb from FGFR3 (Kiemeney et al. 2010). Both SNPs are located so far from the exons whose expression they influence that the effect cannot be explained by linkage disequilibrium. In principle, it is not surprising that non-coding sequences can play an important role. It is well known that approximately 5% of the human genome is evolutionarily conserved, and less than one-third of this 5% consists of coding genes (Mouse Genome Sequencing Consortium et al. 2002). One possibility that might therefore explain these hits from GWAS is that the respective non-coding regions contain distant-acting enhancers (Visel et al. 2009). Distant-acting transcriptional enhancers represent sequences that can be located either downstream or upstream of the target gene or even within other genes. They consist of aggregations of transcription factor binding sites. Occupancy of these transcription factor binding sites leads to recruitment of transcriptional co-activators and chromatin remodelling. The protein aggregates at the distant-acting enhancer facilitate DNA looping, whereby the enhancer relocates to physical proximity of the target gene promoter and finally activates transcription by RNA polymerase II (Visel et al. 2009). This mechanism would also explain why some of the novel SNPs act in a tissue-specific manner, enhancing, for example, the risk of urinary bladder cancer but not that of breast cancer (Kiemeney et al. 2008). In any tissue, only a subset of enhancers is active, because only a specific set of transcription factors is expressed. Therefore, it is plausible that rs9642880 and rs798766 are located within urothelium-specific enhancers. 
In future, it will be interesting to study whether the relatively high number of cancer-associated SNPs in non-coding regions identify distant-acting transcriptional enhancers.

Do SNPs differentiate between bladder cancer with and without exposure to bladder carcinogens?

In most countries, the eligibility criteria for occupational disability compensation are restrictive. Additional criteria that help identify cases where past occupational exposure has contributed to carcinogenesis are welcome. Therefore, it would be relevant to analyse whether specific SNP patterns can differentiate between urinary bladder carcinomas with and without occupational exposure to carcinogens. Recent evidence suggests that such differentiation is possible (Golka et al. 2009). Occupational exposure to aromatic amines and polycyclic aromatic hydrocarbons (PAHs) has been documented in several case–control series (Table 4A). The Wittenberg case–control series is a hospital-based study comprising only a relatively small fraction of individuals with occupational exposure to aromatic amines or PAHs (Table 4A). In contrast, the “Occupational case–control series” comprises individuals who have been evaluated for bladder cancer as an occupational disease, showing a high fraction of individuals exposed to aromatic amines (61%) and PAHs (27%). The Dortmund case–control series comprises former workers from the coal, iron, and steel industries and contains the highest fraction of individuals exposed to PAHs (52%) but none exposed to aromatic amines (0%). Interestingly, GSTM1 0/0 was significantly associated with bladder cancer in the two case–control series with relatively high exposure to PAHs (Table 4B). In contrast, no significant association of GSTM1 0/0 was obtained in the Wittenberg case–control series with a relatively low number of individuals exposed to PAHs. On the other hand, rs9642880[T] was significantly associated with bladder cancer risk in the Wittenberg series but not in the occupational or the Dortmund case–control series (Table 4B). This result suggests that the quality and quantity of exposure to bladder carcinogens determine which SNPs are relevant. 
In the case of exposure to certain carcinogens, the influence of SNPs on relevant detoxifying enzymes may increase. However, this concept must be discussed with caution for different reasons: firstly, because of the relatively small case numbers in the current study and secondly because of the lack of a consistent interaction of GSTM1 0/0 and cigarette smoking. In a recently published meta-analysis, GSTM1 0/0 was reported to be associated with a similarly increased risk in smokers and non-smokers (García-Closas et al. 2005), which speaks against an enhanced role of GSTM1 0/0 in the presence of cigarette smoke-associated carcinogens. On the other hand, NAT2 slow acetylators have been reported to be especially susceptible to the adverse effects of cigarette smoking on bladder cancer risk (García-Closas et al. 2005). Further studies are needed to analyse whether bladder carcinomas with and without occupational exposure to carcinogens can be differentiated by SNP patterns. Such gene–environment interactions may be particularly interesting for the recently discovered SNP of the detoxifying enzyme UGT1A (Table 1).

Table 4 Association of bladder cancer risk with rs9642880 and GSTM1 in three case–control series with different exposure to bladder carcinogens

Future perspective: next generation sequencing

Sequencing of the first human genome took more than a decade and an investment of more than two billion Euros. With the advent of deep sequencing, the required time has been reduced to weeks. Within the next 10 years, the time required for sequencing of a human genome may be reduced to less than 1 day. Therefore, it can be expected that GWAS with SNP chips will soon be replaced by whole genome sequencing, thus allowing access to critical further information, particularly rare mutations. As a consequence, the problem of multiple testing will increase further and even larger case–control series will be needed. Nevertheless, deep sequencing will, for the first time, give the opportunity to analyse comprehensively and quantitatively the degree to which interindividual differences within our genome contribute to our overall cancer risk.