Introduction

Evidence from epidemiological and genetic studies increasingly supports the existence of inherited susceptibility to prostate cancer (PC) (Schaid 2004). Despite extensive efforts over the last two decades, however, no major PC susceptibility genes have been identified. Instead, genetic epidemiological studies suggest several low-penetrance genes, each conferring a small increase in risk.

The E-cadherin gene (CDH1, OMIM 192090), located on chromosome 16q22.1, encodes a transmembrane glycoprotein that mediates cell–cell adhesion and, in conjunction with cytoplasmic catenin proteins, cell signaling. The cadherin/catenin complex plays a crucial part in cell polarity, preservation of normal tissue morphology and cellular differentiation (Grunwald 1993). CDH1 also acts as a suppressor of invasion, and aberrant CDH1 expression is associated with PC progression, metastasis and poor prognosis among patients with PC (Umbas et al. 1994; Dunsmuir et al. 2000; Chunthapong et al. 2004). Frequent loss of heterozygosity (Suzuki et al. 1996; Latil et al. 1997) and suggestive linkage (Suarez et al. 2000; Goddard et al. 2001) in close proximity to CDH1 on chromosome arm 16q provide further support for CDH1 as a PC susceptibility gene. A promoter SNP of CDH1 (rs16260) has been shown to decrease gene transcription (Li et al. 2000; Nakamura et al. 2002; Lei et al. 2002), and an association between this SNP and PC susceptibility has been reported (Verhage et al. 2002). Recently, we reported a significant association between the promoter SNP and hereditary PC risk in a large Swedish population-based study (Jonsson et al. 2004).

The aim of the present study was to confirm the association between the CDH1 promoter SNP and PC risk in an independent study population. In addition, we performed a systematic evaluation of common CDH1 sequence variants in relation to PC risk using haplotype-tagging SNPs (htSNPs). Selected SNPs were genotyped both in a large Swedish population-based case-control study and in Swedish PC families.

Subjects and methods

Cases and controls

CAncer Prostate in Sweden (CAPS) is a population-based case-control study described in detail elsewhere (Lindmark et al. 2004). The study base comprised all men under 80 years of age living in the northern and central parts of Sweden and all men under 65 years of age living in the Stockholm region and south-eastern Sweden at any time from March 2001 through September 2002. All men in this population with a newly diagnosed, cytologically or pathologically verified adenocarcinoma of the prostate were eligible to participate. Because reporting of newly diagnosed cancer cases to the regional cancer registries is compulsory, these registries allowed complete case ascertainment. Detailed clinical information, such as Gleason score, PSA level at diagnosis and tumor-node-metastasis (TNM) stage, was obtained from the Swedish National Prostate Cancer Registry (http://www.roc.se).

In total, 1,961 PC patients were identified and invited to the study. Of these, 1,444 (74%) agreed to participate and provided a blood sample and a questionnaire on risk factors and family history. For cases who reported at least one close relative diagnosed with PC, a detailed family history was obtained through a second questionnaire administered by telephone interview. Whenever possible, blood samples were also collected from the affected relatives in these families.

Control subjects were randomly selected from the Swedish Population Registry and frequency matched to the cases by gender, the expected age distribution of cases (5-year age groups) and geographical region (two regions, representing the north and the south of Sweden, including Stockholm). Controls were recruited concurrently with cases. A total of 1,697 controls were invited to the study and 866 (51%) agreed to participate. At the time of this study, DNA samples were available for 1,247 sporadic cases (SPC), 232 family-positive cases [at least two relatives with a PC diagnosis within a nuclear family (FH+)] and 801 controls.

PC families

Families with multiple PC cases have been recruited at the Department of Oncology, Umeå University, since 1995. Ascertainment was mainly based on referrals from urologists and oncologists throughout Sweden and has been described in detail elsewhere (Grönberg et al. 1999). The families were initially collected for linkage analysis, and almost all of them fulfilled the Carter criteria for hereditary PC (Carter et al. 1993). A family meets these criteria if it has a cluster of three or more relatives affected with PC in any nuclear family, PC in three successive generations in either the proband's paternal or maternal lineage, or two close relatives affected with PC at 55 years of age or younger. Blood samples were collected from as many family members as possible. At the time of this study, a total of 81 families, including 157 FH+ cases sampled for DNA, were available for analysis.

Total study population

The total study population comprised 1,636 PC patients and 801 unaffected controls. Of the 1,636 cases, 1,247 (76%) were classified as sporadic (SPC) and 389 (24%) as FH+. The mean age at PC diagnosis was 66.7 years (range 47.3–80.4) for SPC cases and 65.5 years (range 43.0–81.0) for FH+ cases. The mean age at inclusion for controls was 67.8 years (range 45.5–80.0). Written informed consent was obtained from each subject. The study was approved by the ethics committees at the Karolinska Institutet and Umeå University.

Selection of SNPs

The CDH1 gene spans 98 kb and has 16 exons. The target region for SNP selection included 10 kb of the promoter region, all exons and introns, and 5 kb of the 3′UTR. Using the public database SNPper (http://SNPper.chip.org/), we identified a total of 82 SNPs. A subset of 23 SNPs was selected using the following criteria: one SNP per 3 kb, a reported minor allele frequency (MAF) of at least 5%, and as low a proportion of repetitive sequence as possible. We prioritized validated SNPs and paid particular attention to SNPs in the promoter region. Of the selected SNPs, two were located in the promoter region, 14 in intron 2, one each in introns 4, 10, 12 and 14, one in exon 13, and two in exon 16 (Fig. 1a).

Fig. 1

a The original selection of SNPs. b SNPs successfully genotyped. c Selected htSNPs genotyped in the entire study population

We genotyped these 23 SNPs in a randomly selected subset of 94 controls from the CAPS study. Of these, four SNPs were excluded because of repetitive sequences, three because of assay failure, three because of unreliable results and two because they were monomorphic. From the remaining 11 SNPs (Fig. 1b), haplotypes were inferred using the software PHASE (http://www.stat.washington.edu/stephens/software.html). By haplotype deviation analysis with the htSNP2 software (http://www-gene.cimr.cam.ac.uk/clayton/software/stata/htSNP2/), we selected as htSNPs six SNPs that together explained over 96% of the haplotype diversity (Fig. 1c). Together with the promoter SNP −160C/A (rs16260), a total of seven SNPs were genotyped in the whole study population.
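The idea behind tag-SNP selection can be illustrated with a small sketch. It is not the htSNP2 algorithm: as a simplified stand-in, it greedily adds the SNP that best preserves haplotype diversity (1 − Σp², computed after collapsing haplotypes onto the chosen SNPs) until a target fraction is reached, here 96% to mirror the text. The function names and the haplotype frequencies in the test are hypothetical.

```python
from collections import defaultdict


def diversity(freqs):
    """Haplotype diversity 1 - sum(p^2) for an iterable of frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def projected_diversity(haplotypes, snps):
    """Diversity after collapsing haplotypes onto the SNP indices in `snps`."""
    collapsed = defaultdict(float)
    for hap, freq in haplotypes.items():
        collapsed[tuple(hap[k] for k in snps)] += freq
    return diversity(collapsed.values())


def greedy_tag_snps(haplotypes, threshold=0.96):
    """Hypothetical greedy tagger: add SNPs until `threshold` of diversity is kept.

    haplotypes: dict mapping a 0/1 allele tuple (one entry per SNP) to its frequency.
    Returns the selected SNP indices and the fraction of diversity they preserve.
    """
    n_snps = len(next(iter(haplotypes)))
    full = diversity(haplotypes.values())
    if full == 0:
        return [], 1.0
    selected = []
    while projected_diversity(haplotypes, selected) / full < threshold:
        remaining = [k for k in range(n_snps) if k not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda k: projected_diversity(haplotypes, selected + [k]))
        selected.append(best)
    return selected, projected_diversity(haplotypes, selected) / full
```

In the test data SNPs 0 and 1 are perfectly redundant, so the greedy search keeps only one of them plus SNP 2.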

Genotyping

DNA was extracted from leukocytes using standard methods. Each DNA plate contained two CEPH controls, a water blank and blinded internal replicates.

We used dynamic allele-specific hybridization (DASH) as previously described (Howell et al. 1999; Prince et al. 2001; Jobs et al. 2003). Two PCR primers and one DASH probe per target mutation/SNP were designed with custom software (Fredman et al. 2004) provided by DynaMetrix Ltd (UK). Oligonucleotides were supplied and HPLC-purified by Biomers GmbH (Germany). The DASH PCRs amplified short genomic fragments spanning the variant of interest, with one of the primers carrying a 5′-biotin label. Amplifications were performed in a 5 μl volume containing 1–2 ng genomic DNA, 0.38 μM biotinylated primer, 0.75 μM non-biotinylated primer, 0.03 U AmpliTaq Gold (PE Biosystems, CA, USA), 10% dimethylsulphoxide, 1× AmpliTaq Gold buffer including 1.5 mM MgCl2 (PE Biosystems) and 0.2 mM of each dNTP. Thermal cycling was conducted on an MBS 384 device (Thermo-Electron, USA) as follows: 1× (10 min at 94°C), 35× (15 s at 94°C, 30 s at the annealing temperature).

To verify successful amplification, 0.5 μl of several randomly chosen samples was examined on a 3.0% low-melt agarose gel. DASH analysis of the PCR products was conducted on membrane macro-arrays using the DASH-2 protocol (Jobs et al. 2003). Briefly, samples were transferred to the membrane array by centrifugation (Jobs et al. 2002) or robotic gridding. The resulting arrays, each with up to 9,600 distinct samples/features, were rinsed in 0.1 M NaOH to denature the PCR products and then exposed to 2 ml HE buffer (0.1 M HEPES, 10 mM EDTA, pH 7.9) containing 4 nmol of the appropriate probe, end-labelled with ROX. After heating to 85°C and air-cooling to room temperature, the membrane was briefly rinsed in HE buffer. The array was then soaked for up to 3 h in 40 ml HE buffer containing SYBR Green I dye at 1:20,000 dilution. Using a DASH-2 device (DynaMetrix Ltd, UK), the membrane was taken through a DASH heating ramp (3°C/min from room temperature to 75°C) while fluorescence from the ROX acceptor dye on the probe was monitored. Data were collected at 0.5°C intervals. Changes in fluorescence with temperature (DNA melting profiles) were used to distinguish alleles by means of generic melt-curve analysis software (DynaScore; DynaMetrix Ltd, UK). This software uses the negative derivative of fluorescence with respect to temperature to reveal peaks in the denaturation rate (target–probe melting temperatures, Tm) and thereby automatically assigns DNA samples to genotype groups.
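The derivative step can be sketched as follows. This is not DynaScore: the logistic melt-curve model, the peak-height ratio used to call genotypes and its 0.3/0.7 cut-offs are all hypothetical choices for illustration. Fluorescence is assumed to fall as the probe–target duplex melts, so −dF/dT peaks at Tm; a perfectly matched duplex melts at a higher Tm than a mismatched one.

```python
import numpy as np


def simulated_melt(temps, tm, width=1.5):
    """Hypothetical melt curve: fluorescence decays logistically around Tm."""
    return 1.0 / (1.0 + np.exp((temps - tm) / width))


def melt_peaks(temps, fluorescence):
    """Return -dF/dT and the temperature at its maximum (apparent Tm)."""
    neg_deriv = -np.gradient(fluorescence, temps)
    return neg_deriv, temps[int(np.argmax(neg_deriv))]


def call_genotype(temps, fluorescence, tm_match, tm_mismatch, lo=0.3, hi=0.7):
    """Hypothetical caller comparing -dF/dT at the matched and mismatched Tm."""
    neg_deriv, _ = melt_peaks(temps, fluorescence)
    h_match = np.interp(tm_match, temps, neg_deriv)
    h_mismatch = np.interp(tm_mismatch, temps, neg_deriv)
    ratio = h_match / (h_match + h_mismatch)
    if ratio >= hi:
        return "match/match"
    if ratio <= lo:
        return "mismatch/mismatch"
    return "match/mismatch"


# Readings every 0.5 degC from room temperature (25 degC assumed) up to 75 degC.
temps = np.arange(25.0, 75.5, 0.5)
```

A heterozygote is modelled as an equal mixture of both melt curves, which gives two peaks of similar height.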

Statistical methods

Hardy–Weinberg equilibrium (HWE) tests for each SNP and pairwise linkage disequilibrium (LD) estimates were computed with a Monte Carlo (permutation) procedure as implemented in the GENETICS package (http://lib.stat.cmu.edu/R/CRAN/index.html) for the R programming language. Each test used 10,000 permutations.
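A permutation HWE test can be sketched as below. It is written in the spirit of the Monte Carlo approach described, not transcribed from the GENETICS package. The observed alleles are shuffled and re-paired into genotypes, so allele counts stay fixed, and the empirical P value is the share of permuted chi-square statistics at least as large as the observed one. The function names are hypothetical.

```python
import numpy as np


def hwe_chisq(n_aa, n_ab, n_bb):
    """Pearson chi-square for deviation from HWE given genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    if p in (0.0, 1.0):
        raise ValueError("monomorphic SNP: HWE test undefined")
    expected = np.array([p * p, 2 * p * (1 - p), (1 - p) ** 2]) * n
    observed = np.array([n_aa, n_ab, n_bb], dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


def hwe_permutation_p(n_aa, n_ab, n_bb, n_perm=10_000, seed=0):
    """Empirical P value from randomly re-pairing the observed alleles."""
    rng = np.random.default_rng(seed)
    n = n_aa + n_ab + n_bb
    # Allele 0 = A, allele 1 = B; genotype = sum of the two paired alleles.
    alleles = np.array([0] * (2 * n_aa + n_ab) + [1] * (n_ab + 2 * n_bb))
    observed = hwe_chisq(n_aa, n_ab, n_bb)
    hits = 0
    for _ in range(n_perm):
        genotypes = rng.permutation(alleles).reshape(n, 2).sum(axis=1)
        counts = np.bincount(genotypes, minlength=3)  # [AA, AB, BB]
        if hwe_chisq(*counts) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```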

The association between each SNP and PC risk was evaluated with a likelihood ratio test based on an unconditional logistic regression model adjusted for age (5-year intervals) and geographical region (two regions, representing the north and the south of Sweden, including Stockholm), as implemented in the STATA software. Dependence among genotypes within families was taken into account by adjusting the confidence intervals of the odds ratios (ORs) with the Huber/White/sandwich robust variance estimator (Wooldridge 2002).
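The two steps described above, a 1-df likelihood ratio test for an additive (per-allele) genotype effect and cluster-robust standard errors, can be sketched in NumPy. This is an illustration, not the STATA implementation: the model is fitted by plain Newton–Raphson, and no small-sample correction is applied to the sandwich estimator (Stata applies one). Function names and the covariate coding are hypothetical.

```python
import math
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return coefficients and log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik


def additive_lrt(genotype, y, covariates):
    """1-df likelihood ratio test for a per-allele (additive) genotype effect."""
    n = len(y)
    X0 = np.column_stack([np.ones(n), covariates])  # null: intercept + age/region terms
    X1 = np.column_stack([X0, genotype])             # alternative: adds allele count 0/1/2
    _, ll0 = fit_logistic(X0, y)
    beta1, ll1 = fit_logistic(X1, y)
    stat = max(2.0 * (ll1 - ll0), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))       # upper tail of chi-square, 1 df
    return beta1, stat, p_value, X1


def cluster_robust_se(X, y, beta, clusters):
    """Huber/White sandwich SEs with score contributions summed within clusters (families)."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    bread = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    scores = X * (y - p)[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(clusters):
        s = scores[clusters == c].sum(axis=0)
        meat += np.outer(s, s)
    return np.sqrt(np.diag(bread @ meat @ bread))
```

The genotype OR is `exp(beta[-1])`, and a robust 95% interval follows from `exp(beta[-1] ± 1.96 * se[-1])`.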

The HAPLO.STATS package (Schaid et al. 2002) for R was used for haplotype analysis. This method provides both global and haplotype-specific tests. Only unrelated individuals were included in the haplotype analyses, and all computations were adjusted for age and geographical region as described above. Haplotypes with a frequency below 0.005 were pooled. Empirical P values were computed by randomly permuting the trait and covariates (Besag and Clifford 1991). Permutation continued until the standard error of each empirical P value was at most one-fourth of its estimate, with a minimum of 1,000 permutations per simulation.
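The stopping rule can be written out explicitly. The sketch below is a generic sequential permutation loop, not HAPLO.STATS code. Trait and covariates are permuted jointly against fixed genotypes, and sampling stops once SE(p̂) = √(p̂(1 − p̂)/n) is at most one-fourth of p̂ after at least 1,000 permutations. The `max_perm` cap, the +1 convention in the returned estimate and the example statistic are assumptions.

```python
import math
import numpy as np


def adaptive_permutation_p(statistic, y, covariates, genotypes,
                           min_perm=1000, max_perm=100_000, rel_se=0.25, seed=0):
    """Sequential permutation P value with a relative-precision stopping rule."""
    rng = np.random.default_rng(seed)
    observed = statistic(y, covariates, genotypes)
    hits = 0
    n_perm = 0
    while n_perm < max_perm:
        idx = rng.permutation(len(y))
        # Permute trait and covariates together; genotypes stay in place.
        if statistic(y[idx], covariates[idx], genotypes) >= observed:
            hits += 1
        n_perm += 1
        if n_perm >= min_perm and hits > 0:
            p_hat = hits / n_perm
            if math.sqrt(p_hat * (1.0 - p_hat) / n_perm) <= rel_se * p_hat:
                break
    # The +1 convention keeps the estimate away from zero.
    return (hits + 1) / (n_perm + 1), n_perm


def mean_genotype_difference(y, covariates, genotypes):
    """Hypothetical example statistic (ignores covariates for brevity)."""
    return abs(genotypes[y == 1].mean() - genotypes[y == 0].mean())
```

With a strong signal no permutation reaches the observed statistic, so the loop runs to `max_perm`; in practice a covariate-adjusted score statistic would replace the toy statistic.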

The association between the CDH1 markers and PC risk was also evaluated with a family-based transmission test as described by Clayton (1999), using the software TRANSMIT (http://www-gene.cimr.cam.ac.uk/clayton/software/). This method can evaluate transmission of multilocus haplotypes even when phase is unknown and when parental genotypes are missing. All reported P values are two-sided.
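TRANSMIT's score test handles unknown phase and missing parents, which is beyond a short example. The underlying idea is easier to see in the classical transmission/disequilibrium test (TDT), a simpler related test for fully genotyped parent–offspring trios. The sketch below shows that test, not TRANSMIT's method. Among heterozygous parents, b counts transmissions of the allele of interest to affected offspring and c counts non-transmissions, and (b − c)²/(b + c) is referred to a chi-square distribution with 1 df.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT (McNemar-type) statistic and two-sided P value, 1 df."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value
```

For example, 30 transmissions versus 10 non-transmissions give a statistic of 10.0 (P ≈ 0.0016).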

Results

Based on 115 duplicate samples, the estimated genotyping error rate was 0.3%. Pairwise LD between the SNPs ranged from 0.15 to 0.99, and all SNPs were in HWE (P=0.12–0.90).

To obtain an independent test of the association between the CDH1 promoter SNP (rs16260) and PC risk, we excluded all individuals included in our initial analysis (Jonsson et al. 2004). The resulting independent replication population comprised 540 controls, 612 SPC cases and 211 FH+ cases. Comparing FH+ cases with controls in this replication population, we found strong evidence of an association between the promoter SNP and PC risk (P=0.003, assuming an additive genetic model; Table 1). Genotype-specific risk estimates were essentially the same as those observed among individuals included in the initial analysis. In the total study population, the OR was 1.5 (1.1–2.0) for CA heterozygotes and 2.6 (1.6–4.3) for AA homozygotes, compared with CC homozygotes. Genotype frequencies of the promoter SNP did not differ significantly between SPC cases and controls (P=0.37; Table 2).
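Genotype-specific ORs relative to the CC reference can be computed from a 2 × 3 table, as the sketch below shows. It gives unadjusted ORs with Woolf (log-scale) 95% confidence intervals. The estimates reported above come from logistic regression adjusted for age and region, so this sketch will not reproduce them. The counts in the test are hypothetical, not data from this study.

```python
import math


def genotype_ors(case_counts, control_counts, ref="CC", z=1.96):
    """Unadjusted OR and Woolf CI for each genotype versus the reference genotype."""
    a_ref, c_ref = case_counts[ref], control_counts[ref]
    results = {}
    for genotype, a in case_counts.items():
        if genotype == ref:
            continue
        c = control_counts[genotype]
        odds_ratio = (a * c_ref) / (c * a_ref)
        se_log = math.sqrt(1 / a + 1 / c + 1 / a_ref + 1 / c_ref)
        results[genotype] = (odds_ratio,
                             math.exp(math.log(odds_ratio) - z * se_log),
                             math.exp(math.log(odds_ratio) + z * se_log))
    return results
```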

Table 1 Genotype frequencies and ORs for the promoter SNP rs16260 in unaffected controls and FH+ cases
Table 2 Allele frequencies of the E-cadherin gene (CDH1) SNPs and corresponding P values among control subjects, patients with sporadic PC (SPC) and patients with a positive family history of PC (FH+)

To comprehensively evaluate common sequence variants in CDH1 with respect to PC risk, we genotyped six additional SNPs in the gene, thereby capturing the majority of the haplotype diversity. Among FH+ cases, rs4783681 was significantly associated with PC risk (P=0.005; Table 2). Among SPC cases, the frequencies of two SNPs differed significantly from those in controls: rs2010724 (P=0.02) and rs1801026 (P=0.04).

We inferred nine CDH1 haplotypes with a frequency above 1% among controls (Table 3). The haplotype distribution differed significantly between FH+ cases and controls (global test P=0.05). The second most common haplotype (HapB, control frequency 0.25) was associated with a positive family history of PC (P=0.004; Table 3). Interestingly, HapB was the only haplotype with a frequency above 1% that carried the variant A allele of the promoter SNP rs16260. In addition, the frequency of HapD was significantly higher among SPC cases than among controls (P=0.02; Table 3).
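Haplotype frequencies must be inferred because phase is unknown in unrelated individuals. The usual approach is expectation–maximization (EM), as used in haplotype software such as HAPLO.STATS. The sketch below is a minimal two-SNP version, not the package's implementation. Each person's genotypes (allele counts 0/1/2) are expanded into all compatible ordered haplotype pairs, and the pairs are weighted by the current frequency estimates. Only double heterozygotes are truly ambiguous.

```python
import itertools
import numpy as np

# Haplotypes as (allele at SNP 1, allele at SNP 2); 1 = variant allele.
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(g1, g2):
    """All ordered haplotype-index pairs whose allele sums reproduce genotypes (g1, g2)."""
    pairs = []
    for i, j in itertools.product(range(4), repeat=2):
        h, k = HAPLOTYPES[i], HAPLOTYPES[j]
        if h[0] + k[0] == g1 and h[1] + k[1] == g2:
            pairs.append((i, j))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=500, tol=1e-10):
    """EM estimate of the four two-SNP haplotype frequencies from unphased genotypes."""
    freqs = np.full(4, 0.25)
    pair_lists = [compatible_pairs(g1, g2) for g1, g2 in genotypes]
    for _ in range(n_iter):
        counts = np.zeros(4)
        for pairs in pair_lists:
            # E-step: posterior weight of each phase resolution for this person.
            w = np.array([freqs[i] * freqs[j] for i, j in pairs])
            w /= w.sum()
            for (i, j), wij in zip(pairs, w):
                counts[i] += wij
                counts[j] += wij
        new = counts / counts.sum()  # M-step: expected haplotype counts -> frequencies
        converged = np.max(np.abs(new - freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs
```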

Table 3 Haplotype frequencies in CDH1 among controls, SPC and FH+ with corresponding P values

For the family-based transmission tests, 123 informative PC families, drawn from both CAPS and the collection of PC families, were available for analysis. In total, these families included 340 PC cases and 464 unaffected relatives. DNA was available for an average of 2.1 affected cases per family. The promoter SNP rs16260 was transmitted to affected offspring more often than expected, relative to unaffected offspring (P=0.02; Table 4). In addition, three further SNPs (rs4783681, rs1125557 and rs2276329) were significantly over-transmitted to affected offspring (Table 4). A global test revealed a significant difference in haplotype transmission between affected and unaffected offspring (P=0.01). Haplotype-specific tests showed that HapB was transmitted more often than expected (P=0.02). No other haplotype was significantly over-transmitted to affected offspring.

Table 4 Family-based transmission results for SNPs in Swedish PC families

Discussion

We report strong confirmation of the association between the functional CDH1 promoter SNP rs16260 and PC risk in FH+ cases. Previously, we identified an elevated risk for carriers of the A allele among cases with a positive family history of PC and suggested an additive inheritance model. Our findings in an independent study population strongly support this hypothesis (P=0.003).

This successful replication is notable given how rarely positive replications are reported in genetic epidemiology. The high false-positive rate of reported associations between complex diseases and genetic variants, estimated to be as high as 95% (Colhoun et al. 2003), underscores the importance of replication studies in independent populations.

Population stratification is one potential source of false-positive association results. However, because a family-based association test revealed significant over-transmission of rs16260 to affected offspring (P=0.02), we are confident that stratification does not explain our findings. Besides the significant replication result, the role of rs16260 in PC etiology is further supported by the haplotype analyses. The only haplotype carrying the variant allele of rs16260 was also the only haplotype significantly associated with PC risk among FH+ cases. This haplotype has a high prevalence (25%) in the Swedish population and includes four SNPs with variant alleles (rs16260, rs4783681, rs1125557, rs2010724) that were all positively correlated with PC (data not shown).

In addition to replicating our previous finding for the promoter SNP, we performed a comprehensive evaluation of the CDH1 gene. Using htSNPs, we captured over 96% of the haplotype variation and found associations between several CDH1 sequence variants and PC risk. Apart from rs16260, rs4783681 was strongly associated with family-positive PC; however, these two SNPs are in strong LD with each other (D′=0.95). We observed no association between the functional promoter SNP and SPC risk. This suggests that disease development requires the joint effect of several genetic variants, rs16260 being one of them. Contributing variants that are not in LD with rs16260 will be inherited independently of it, so only a small proportion of the general population will carry all risk alleles, whereas a higher proportion of carriers of multiple risk alleles is expected within PC families.
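For reference, Lewontin's D′ (and r²) follow directly from two-SNP haplotype frequencies, as the sketch below shows. With allele frequencies p₁ and q₁ of the variant alleles, D = p₁₁ − p₁q₁, and D′ = D/D_max, where D_max depends on the sign of D. The function is a generic illustration with hypothetical inputs, not the GENETICS package code.

```python
def ld_stats(p00, p01, p10, p11):
    """Signed Lewontin's D' and r^2 from two-SNP haplotype frequencies (1 = variant)."""
    p1 = p10 + p11  # variant-allele frequency at SNP 1
    q1 = p01 + p11  # variant-allele frequency at SNP 2
    d = p11 - p1 * q1
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if d_max == 0 or denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d / d_max, d * d / denom
```

|D′| is what is usually reported. A D′ close to 1 with a modest r² means that two SNPs rarely recombine even though their allele frequencies differ.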

We also identified marginally significant associations between two SNPs (rs2010724 and rs1801026) and sporadic PC risk. To investigate whether these could be false-positive findings due to multiple testing, we performed a simulation in which case-control status was randomly permuted and association was re-evaluated for each SNP. P values adjusted for multiple testing were then computed from the empirical distribution of the maximum of the seven test statistics. In addition, we estimated the probability of observing at least two significant associations (at the 5% level) among seven tests under the null hypothesis of no association. Based on 10,000 permutations, none of the adjusted P values was significant; however, the estimated probability of observing at least two significant associations was only 7%.
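The two permutation summaries described above can be sketched together. The per-SNP statistic here is the Armitage trend chi-square (computed as N·r² between allele count and case status), chosen for illustration and not taken from the paper. Case-control status is permuted; the adjusted P value for SNP j is the share of permutations whose maximum statistic reaches SNP j's observed statistic (single-step maxT); the second output is the share of permutations with at least k nominally significant SNPs. Function and argument names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def maxt_permutation(G, y, n_perm=10_000, alpha=0.05, k=2, seed=0):
    """maxT-adjusted P values and P(at least k nominal hits) under permutation.

    G: (n_subjects, n_snps) allele counts 0/1/2; y: 0/1 case-control status.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    Gc = G - G.mean(axis=0)
    g_norm = np.linalg.norm(Gc, axis=0)
    yc = y - y.mean()
    y_norm = np.linalg.norm(yc)

    def trend_stats(y_centered):
        r = (y_centered @ Gc) / (y_norm * g_norm)  # correlation per SNP
        return n * r ** 2                           # Armitage trend chi-square

    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2  # chi-square(1) critical value
    observed = trend_stats(yc)
    max_null = np.empty(n_perm)
    hits_null = np.empty(n_perm, dtype=int)
    for b in range(n_perm):
        s = trend_stats(rng.permutation(yc))  # permute case-control status
        max_null[b] = s.max()
        hits_null[b] = int(np.sum(s > crit))
    adjusted = (1 + (max_null[None, :] >= observed[:, None]).sum(axis=1)) / (n_perm + 1)
    p_at_least_k = float(np.mean(hits_null >= k))
    return observed, adjusted, p_at_least_k
```

Because the maximum is taken across all SNPs in each permutation, the correlation between SNPs in LD is accounted for automatically, which a Bonferroni correction would ignore.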

E-cadherin is a major cell adhesion molecule and determinant of cell polarity. It is a component of the cadherin/catenin complex, which is important for cellular polarity, normal tissue morphology and cellular differentiation (Grunwald 1993). E-cadherin is part of the adherens junction, where it provides cell–cell adhesion through Ca²⁺-dependent homophilic binding between molecules on adjacent epithelial cells. In most malignancies arising from epithelial tissue, CDH1-mediated cell–cell adhesion is lost as the tumor becomes more malignant. Somatic loss of E-cadherin is considered a defining feature of invasive lobular breast cancer and diffuse gastric cancer (Berx et al. 1998), and since 1998 germline CDH1 mutations have been reported to predispose to hereditary diffuse gastric cancer (Oliveira et al. 2003). Aberrant CDH1 expression has been associated with PC progression, metastasis and poor prognosis (Umbas et al. 1994; Dunsmuir et al. 2000; Chunthapong et al. 2004). The variant A allele of the promoter SNP rs16260 has been associated with a 68% reduction in transcriptional activity (Li et al. 2000). Genetic epidemiological studies of PC have found suggestive linkage as well as LOH on chromosome 16q close to the gene (Suzuki et al. 1996; Latil et al. 1997; Suarez et al. 2000; Goddard et al. 2001).

Three other studies of rs16260 and PC risk have been published. Verhage et al. (2002) analyzed 82 cases and 188 controls and found a significantly increased risk for carriers of the A allele; their study population included both sporadic and hereditary PC cases, and all subjects were Caucasian. Hajdinjak and Toplak (2004) observed a non-significant excess risk in a comparison of 183 sporadic PC patients and 198 control subjects, all of whom were Caucasian. In contrast to these studies, Tsukino et al. (2004) found no association among 219 cases and 219 controls in a Japanese population.

This study has several strengths. CAPS is a large population-based study with a well-characterized phenotype, and all PC diagnoses were histologically or cytologically confirmed. Using the method developed by Wacholder et al. (2004), the false-positive report probability (FPRP) of the positive findings for rs16260 ranges from 0.25 to 0.06 for prior probabilities of 0.01 to 0.05, assuming a true OR of 1.5. The Swedish population is genetically homogeneous, which, together with our matching on geographical region, minimizes possible confounding by population stratification. Furthermore, several of the associations observed in the case-control population were confirmed by family-based association tests.
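The FPRP calculation can be sketched directly from its definition. With α taken as the observed P value, power (1 − β) to reach α under an assumed true OR, and prior probability π of a true association, FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. In this sketch power comes from a normal approximation on the log-OR scale, with the SE taken from a 95% CI. The inputs used by the authors are not given in the text, so no attempt is made to reproduce the 0.25–0.06 range, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def power_two_sided(alpha, true_or, se_log_or):
    """Probability of a two-sided result at level alpha if log(OR) ~ N(log(true_or), se^2)."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = math.log(true_or) / se_log_or
    return _STD_NORMAL.cdf(shift - z) + _STD_NORMAL.cdf(-z - shift)


def fprp(alpha, power, prior):
    """False-positive report probability (Wacholder et al. 2004)."""
    num = alpha * (1.0 - prior)
    return num / (num + power * prior)
```

Raising the prior from 0.01 to 0.05 always lowers the FPRP for fixed α and power. This is the direction of the range quoted above.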

It should be stressed that even though we captured the majority of common genetic variation in the gene, rare causal variants that were not identified in this study may still exist.

To date, no major PC gene has been identified, suggesting that genetic susceptibility to PC is complex and depends on both individual genes and their interplay. The components of the cadherin/catenin complex, including CDH1, α-catenin, β-catenin and p120, are strong candidate genes that interact with each other. In the human genome, the region around CDH1 shows a strong LD pattern extending into the neighboring P-cadherin gene (CDH3) (Fig. 2a). Using the HapMap project (http://www.hapmap.org) and the Haploview software (http://www.broad.mit.edu/personal/jcbarret/haplo), we inferred haplotype block structure with the LD confidence interval method (Gabriel et al. 2002). This revealed an LD block that includes the promoter region of CDH1 and extends upstream into the CDH3 gene (Fig. 2b). Like CDH1, CDH3 is a cell adhesion molecule, and this block extension makes CDH3 another interesting candidate gene for PC susceptibility.
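The block definition can be sketched as a classification rule applied to precomputed confidence bounds on D′. The thresholds below are our reading of Haploview's default implementation of the Gabriel et al. (2002) method and should be checked against the Haploview documentation. A SNP pair counts as "strong LD" if the lower bound is at least 0.7 and the upper bound at least 0.98, and as "strong recombination" if the upper bound is below 0.9. A region is a block if at least 95% of informative pairs show strong LD. Estimating the bounds themselves is omitted, and the SNP names and bounds in the test are hypothetical.

```python
def classify_pair(lower, upper):
    """Classify a SNP pair from the confidence bounds on D' (assumed thresholds)."""
    if lower >= 0.7 and upper >= 0.98:
        return "strong_ld"
    if upper < 0.9:
        return "recombination"
    return "uninformative"


def is_block(snps, ci_bounds, min_fraction=0.95):
    """True if at least `min_fraction` of informative pairs within `snps` show strong LD.

    ci_bounds: dict mapping (snp_i, snp_j), with i before j in `snps`, to (lower, upper).
    """
    strong = recombination = 0
    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            kind = classify_pair(*ci_bounds[(snps[i], snps[j])])
            if kind == "strong_ld":
                strong += 1
            elif kind == "recombination":
                recombination += 1
    informative = strong + recombination
    return informative > 0 and strong / informative >= min_fraction
```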

Fig. 2

a Pairwise LD plot for the region of chromosome 16q22.1 spanning over 197 kb. Red indicates strong pairwise LD. b Haplotype blocks defined by the LD confidence interval method (Gabriel et al. 2002). The figure was constructed in Haploview using data from the HapMap project. For each SNP, blue represents the normal allele and red the variant allele. For a detailed description of the figure, see http://www.broad.mit.edu/personal/jcbarret/haplo/documentation.php

In conclusion, we report strong confirmation, in an independent population, of the association between family-positive PC risk and a functional CDH1 promoter SNP. This association was confirmed by both haplotype-based and family-based association analyses. Given the intrinsic difficulty of replicating genetic association studies, particularly in PC epidemiology, our findings are important and novel. These results, together with the biological importance of CDH1, encourage further evaluation of genetic variation in cell adhesion molecules in relation to PC risk.