Introduction

Genetic predisposition to prostate cancer has been well established through genetic epidemiological studies (Isaacs et al. 2001). Several complex segregation analyses have provided evidence for major prostate cancer susceptibility genes segregating in families. Most support a dominant mode of inheritance (Carter et al. 1992; Schaid et al. 1998; Gronberg et al. 2000; Verhage et al. 2001), while others support a recessive or X-linked mode of inheritance (Cui et al. 2001; Gong et al. 2002). The hypothesis that multiple major genes interact to increase prostate cancer susceptibility is supported by our understanding of tumorigenesis, by parallel evidence that multiple major genes contribute to cancer at other organ sites such as the breast (Miki et al. 1994; Wooster et al. 1995) and colon (Groden et al. 1991; Kinzler et al. 1991; Fishel et al. 1993; Leach et al. 1993; Bronner et al. 1994; Nicolaides et al. 1994; Miyaki et al. 1997), and by observational data from molecular and genetic epidemiological studies. A segregation study of 263 prostate cancer families found that the disease is more likely due to the contributions of two to four prostate cancer susceptibility genes than of a single gene (Conlon et al. 2003). Gene interactions that lead to disease may follow either a heterogeneity model, in which an alteration in any one of several genes is sufficient, or an epistatic model, in which several simultaneous genetic alterations are required.

Linkage analysis methods that model interactions may increase the statistical power to detect linkage when interactions among genes exist. For example, under a two-locus epistatic model, the power to detect linkage at one susceptibility locus should increase if linkage is assessed among families that are linked at the other susceptibility locus, and vice versa. Several published studies have empirically demonstrated this gain in power from modeling gene–gene interactions. For example, among 188 prostate cancer families ascertained at Johns Hopkins Hospital, there was no evidence for linkage at either the PTEN or the CDKN1B region when each region was studied individually, yet modeling the interaction of these two regions in a linkage analysis provided significant evidence for linkage (Xu et al. 2004).

Although multiple major genes likely contribute to prostate cancer susceptibility, and modeling interactions in linkage analyses has clear advantages, current studies to identify these genes continue to rely primarily on single-gene approaches. This is particularly true of genome-wide scans for novel regions of linkage: none of the dozen or so genome-wide scans for prostate cancer susceptibility genes published to date has modeled gene–gene interactions (Smith et al. 1996; Witte et al. 2000; Hsieh et al. 2001; Cunningham et al. 2003; Edwards et al. 2003; Janer et al. 2003; Lange et al. 2003; Schleutker et al. 2003; Wiklund et al. 2003; Xu et al. 2003; Maier et al. 2005). This gap is likely due to a combination of factors, including a lack of standard analytical methods for modeling interactions and sample sizes too small to investigate them.

To help close this gap, we designed and implemented a study to identify prostate cancer susceptibility loci by modeling interactions in a linkage analysis of 426 prostate cancer families. We limited the interaction analyses to two loci at a time and systematically evaluated the evidence for prostate cancer linkage for all possible pairs of loci across the genome, using ordered subset analysis (OSA) to model epistatic interactions (Hauser et al. 2004).

Methods

Study populations

The 426 prostate cancer families were ascertained from four independent studies: 188 families from Johns Hopkins Hospital (Xu et al. 2003), 175 families from the University of Michigan (Lange et al. 2003), 50 families from the University of Umeå, Sweden (Wiklund et al. 2003), and 13 families from the University of Tampere, Finland (Schleutker et al. 2003). A previous genome-wide linkage scan of these 426 combined families yielded a nonparametric multipoint LOD score of 3.16 at chromosome 17q22 and LOD scores greater than 2.0 at chromosomes 2q32, 15q11, and Xq27 (Gillanders et al. 2004). Family ascertainment has been described in detail elsewhere (Gillanders et al. 2004). The clinical characteristics of these families are summarized in Table 1. Informed consent was obtained from all participants, and study protocols were reviewed and approved by the Institutional Review Board at each institution.

Table 1 Characteristics of families in the combined analysis

Marker genotyping

A detailed description of our marker genotyping methods has been presented elsewhere (Gillanders et al. 2004). Briefly, genomic DNA was prepared from blood samples using standard techniques. All DNA samples were genotyped in a single laboratory using 406 short tandem repeat markers with an average inter-marker spacing of ~10 cM and an average heterozygosity of 80%. PCR products were separated on ABI 377 or 3100 DNA sequencers, allowing multiple fluorescently labeled markers to be run in a single lane. Allele sizes were calculated using the Local Southern algorithm in the GENESCAN software program (Applied Biosystems, Foster City, CA). Allele calling and binning were performed with GENOTYPER software (Applied Biosystems, Foster City, CA). A CEPH control individual (1347-02) was included in all genotyping for quality control. Additionally, 1% of samples were included as blind duplicates to estimate the genotyping error rate. Of the 12,992 duplicated genotypes, 11 genotyping errors were detected (an error rate of 0.085%).
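The error rate above is the ratio of discordant to total duplicated genotypes. A minimal Python sketch of that calculation, assuming each duplicated genotype is stored as a pair of unordered allele-size calls (a hypothetical representation, not the study's file format) and that each discordant pair counts as one error; the toy data only reproduce the reported counts:

```python
from typing import List, Tuple

# Hypothetical representation: one genotype call = two allele sizes (bp).
Call = Tuple[int, int]


def normalize(call: Call) -> Tuple[int, ...]:
    """Genotypes are unordered, so (154, 150) and (150, 154) are the same call."""
    return tuple(sorted(call))


def duplicate_error_rate(pairs: List[Tuple[Call, Call]]) -> float:
    """Fraction of duplicated genotypes whose original and duplicate calls disagree.

    Assumes each discordant pair counts as exactly one genotyping error.
    """
    if not pairs:
        raise ValueError("no duplicated genotypes supplied")
    discordant = sum(1 for a, b in pairs if normalize(a) != normalize(b))
    return discordant / len(pairs)


# Toy data matching the reported counts: 11 discordant out of 12,992.
pairs = [((150, 154), (154, 150))] * 12_981 + [((150, 154), (150, 156))] * 11
rate = duplicate_error_rate(pairs)
print(f"{100 * rate:.3f}%")  # 0.085%
```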

Statistical analyses

Nonparametric multipoint linkage analysis was first performed using the computer program Genehunter-Plus (Kong and Cox 1997). Marker allele frequencies were estimated by maximum likelihood from pedigree founders. Nonparametric LOD scores were calculated using the ‘Z-all’ allele-sharing statistic (Whittemore and Halpern 1994), with equal weight assigned to each family. LOD scores were calculated at each marker and at four evenly spaced locations between each pair of adjacent markers.
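The evaluation grid described above (each marker plus four evenly spaced points in every inter-marker gap) can be sketched as follows. This assumes sorted marker positions in cM along a single chromosome; the function name `target_positions` is hypothetical. With m markers, the grid has m + 4(m − 1) = 5m − 4 points.

```python
from typing import List


def target_positions(marker_cm: List[float], n_between: int = 4) -> List[float]:
    """Markers plus n_between evenly spaced points in each inter-marker gap.

    marker_cm: sorted marker positions (cM) on a single chromosome.
    """
    if not marker_cm:
        return []
    positions: List[float] = []
    for left, right in zip(marker_cm, marker_cm[1:]):
        step = (right - left) / (n_between + 1)
        positions.append(left)
        positions.extend(left + k * step for k in range(1, n_between + 1))
    positions.append(marker_cm[-1])
    return positions


grid = target_positions([0.0, 10.0, 20.0])
print(grid)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```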

OSA conditional linkage analyses were performed to model epistatic interactions (Hauser et al. 2003). Briefly, for a pair of unlinked loci, a series of linkage analyses at the “target” region (locus 1) was evaluated conditional on the linkage results at the “reference” region (locus 2). Families were ranked by their family-specific LOD scores at locus 2, from largest to smallest. The family with the largest LOD score at locus 2 was entered first, and its LOD score at locus 1 was computed and recorded. Next, linkage at locus 1 was computed and recorded for the two families with the two largest LOD scores at locus 2. In general, the ith OSA analysis computed linkage at locus 1 using the subset of families with the i largest LOD scores at locus 2, and this process was repeated until all families had been added to the linkage analysis at the target region. For each target/reference marker pairing, we defined LODdelta = LODconditional − LODunconditional, where LODconditional is the maximum LOD at locus 1 over all of these nested subsets and LODunconditional is the LOD at locus 1 using all families.
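The OSA procedure above can be sketched in Python with NumPy. This is an illustrative sketch, not the OSA software of Hauser et al.: it assumes that the LOD score for a subset of families is the sum of their family-specific LOD scores, and it takes LODunconditional at the same target position as the all-family LOD. The function and argument names are hypothetical.

```python
import numpy as np


def osa_lod_delta(ref_lod, target_lod):
    """Ordered subset analysis (epistasis ordering) for one reference marker.

    ref_lod:    (n_families,) family-specific LOD at the reference locus.
    target_lod: (n_families, n_positions) family-specific LODs at each target
                position on one chromosome.
    Returns (max LODdelta, subset size achieving it, target position index).
    Assumes a subset's LOD is the sum of its families' LODs.
    """
    ref_lod = np.asarray(ref_lod, dtype=float)
    target_lod = np.asarray(target_lod, dtype=float)
    # Rank families from largest to smallest reference-locus LOD.
    order = np.argsort(-ref_lod, kind="stable")
    # Row i holds the summed target LOD of the i + 1 top-ranked families,
    # i.e. one nested OSA subset, at every target position.
    conditional = np.cumsum(target_lod[order], axis=0)
    # Unconditional LOD at each position uses all families.
    unconditional = target_lod.sum(axis=0)
    # LODdelta at each target position: best nested subset minus all families.
    delta = conditional.max(axis=0) - unconditional
    best_pos = int(np.argmax(delta))
    best_size = int(np.argmax(conditional[:, best_pos])) + 1
    return float(delta[best_pos]), best_size, best_pos


ref = [0.25, 2.0, -0.5, 1.0]
target = [[0.0, -0.25], [1.5, 0.25], [-1.0, 0.5], [0.75, 0.0]]
print(osa_lod_delta(ref, target))  # (1.0, 2, 0)
```

In this toy example the two families with the highest reference LODs together give a target LOD of 2.25 at position 0, versus 1.25 for all four families. Sorting in ascending order instead (smallest reference LOD first) would give the heterogeneity ordering discussed in the Discussion.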

To limit the number of correlated tests, reference loci were restricted to the 406 markers and did not include locations between markers. Target loci comprised the 406 markers plus four evenly spaced locations between each pair of adjacent markers. Reference and target loci were never on the same chromosome. The statistical significance of the change in LOD score (LODdelta) was evaluated by a permutation test under the null hypothesis that linkage at the target region is independent of linkage at the reference region. To further limit multiple testing, 22 chromosome-wide P-values were estimated for each reference marker. Specifically, for each reference marker and for each of the 22 chromosomes not containing that marker, LODdelta was calculated at every target locus on the chromosome, and the largest of these values was used as the test statistic in the permutation procedure. Family-specific LOD scores at all target loci on the chromosome were randomly and jointly permuted with respect to the family ordering defined by the reference locus, thus preserving the correlation structure of LOD scores between linked markers. The maximum LODdelta over all target loci on the chromosome was determined for each permuted data set and compared with the observed maximum LODdelta for that chromosome. The empirical P-value for each target chromosome-wide/reference marker-specific pairing was the proportion of permuted data sets whose chromosome-wide maximum LODdelta was greater than or equal to the observed chromosome-wide maximum LODdelta. This target-chromosome-by-reference-marker design resulted in 22 × 406 = 8,932 hypotheses being tested.
To account for these multiple tests, we considered target chromosome-wide/reference marker-specific P-values below 5.6×10^−6 (≈0.05/8,932) to be globally statistically significant at the 0.05 level. This Bonferroni threshold is conservative because it does not account for the correlation of test results between linked reference markers. For computational efficiency, the number of permutations varied with the magnitude of the target chromosome-wide/reference marker-specific P-value, from a minimum of 100 to a maximum of 1,000,000.
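The chromosome-wide permutation test can be sketched as below. Shuffling whole rows of the family-by-position LOD matrix breaks the pairing with the reference-locus ordering while keeping each family's LOD profile along the chromosome intact, which is what preserves the correlation between linked target loci. The text does not say how the number of permutations was adapted between 100 and 1,000,000; the staged escalation used here (run 100, then 1,000, and so on, stopping once at least `stop_after` permuted maxima reach the observed value) is an assumption, as are all names.

```python
import numpy as np


def chromosome_max_lod_delta(ref_lod, target_lod):
    """Chromosome-wide maximum over target positions of the OSA LODdelta."""
    order = np.argsort(-ref_lod, kind="stable")
    conditional = np.cumsum(target_lod[order], axis=0)
    return float(np.max(conditional.max(axis=0) - target_lod.sum(axis=0)))


def osa_permutation_p(ref_lod, target_lod, min_perm=100, max_perm=1_000_000,
                      stop_after=10, seed=0):
    """Empirical P-value for one reference marker / target chromosome pair.

    Whole rows (families) of target_lod are shuffled together, so each
    family's LOD profile along the chromosome is preserved; only the
    pairing with the reference-locus ordering is broken.
    """
    rng = np.random.default_rng(seed)
    ref_lod = np.asarray(ref_lod, dtype=float)
    target_lod = np.asarray(target_lod, dtype=float)
    observed = chromosome_max_lod_delta(ref_lod, target_lod)
    n_families = len(ref_lod)
    exceed, n_perm, stage = 0, 0, min_perm
    while True:
        while n_perm < stage:
            shuffled = target_lod[rng.permutation(n_families)]
            n_perm += 1
            if chromosome_max_lod_delta(ref_lod, shuffled) >= observed:
                exceed += 1
        # Assumed staged rule: stop once the P-value is clearly not tiny,
        # otherwise run ten times as many permutations (up to max_perm).
        if exceed >= stop_after or stage >= max_perm:
            break
        stage = min(stage * 10, max_perm)
    # P-value = proportion of permuted maxima >= the observed maximum.
    return exceed / n_perm, n_perm
```

Under this assumed rule, a result such as 22 exceedances in 1,000,000 permutations (P = 0.000022) would be reached only after escalating through every stage, since fewer than 10 exceedances would be expected at each earlier stage. A pure-Python loop is slow at 10^6 permutations; a production version would vectorize or batch the permutations.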

In addition to the genome-wide interaction analyses, we focused on interactions involving the region of prior “main effects” linkage at chromosome 17q22 (Gillanders et al. 2004). Restricting these analyses to a single a priori region reduces the number of tests and therefore relaxes the threshold required for statistical significance. Specifically, we tested the hypothesis that conditioning on the reference marker D17S787, the marker directly under the 17q22 linkage peak, would reveal increased evidence for linkage, due to epistasis, at other chromosomal locations. Given this a priori hypothesis, we used a threshold of 0.05/22 = 0.0023 to assess whether interactions with the reference marker D17S787 were statistically significant at the 0.05 level. In addition, we used the 1-LOD support interval around the 17q22 linkage peak (Gillanders et al. 2004) as the target region and searched for interactions by conditioning on each of the 391 reference markers not on chromosome 17, using a significance threshold of 0.05/391 = 0.00013.
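Each significance threshold in this section is a Bonferroni bound: 0.05 divided by the number of hypotheses tested in that analysis. A small Python check of the three thresholds (the helper name `bonferroni` and the constant names are ours):

```python
def bonferroni(alpha: float, n_tests: int) -> float:
    """Per-test threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


N_REFERENCE_MARKERS = 406
N_TARGET_CHROMOSOMES = 22   # every chromosome except the reference marker's own
N_NON_CHR17_MARKERS = 391   # reference markers used for the 17q22 target region

tests = {
    "genome-wide scan": N_REFERENCE_MARKERS * N_TARGET_CHROMOSOMES,
    "17q22 reference marker D17S787": N_TARGET_CHROMOSOMES,
    "17q22 support interval as target": N_NON_CHR17_MARKERS,
}
for label, n in tests.items():
    print(f"{label}: {n} tests, threshold {bonferroni(0.05, n):.2g}")
# genome-wide scan: 8932 tests, threshold 5.6e-06
# 17q22 reference marker D17S787: 22 tests, threshold 0.0023
# 17q22 support interval as target: 391 tests, threshold 0.00013
```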

Results

Selected demographic and clinical characteristics (mean age at diagnosis, number of affected individuals, and ethnicity) are presented in Table 1. Collectively, these families represent one of the largest collections of prostate cancer families ascertained and analyzed to date. In particular, based on their family characteristics, many of these pedigrees are likely to segregate highly penetrant prostate cancer genes: 285 families have at least four men affected with prostate cancer, and 201 families have a mean age at diagnosis under 65 years.

We performed two-locus conditional linkage analyses for all possible pairs of loci across the genome in the 426 prostate cancer families using the OSA method. Conditioning markedly increased the evidence for linkage at multiple loci compared with the unconditional single-locus linkage analyses (Fig. 1). However, none of the target chromosome/reference marker combinations reached our strict genome-wide threshold for global statistical significance (P < 5.6×10^−6). As detailed in Table 2, the LODdelta values for six target chromosome/reference marker combinations reached target chromosome-wide/reference marker-specific significance levels of P ≤ 0.0001. For example, when linkage at chromosome 12 was evaluated conditional on the linkage results at reference markers across the genome, the evidence for linkage was maximized at 12q24 (LOD = 5.69) when the 78 families with the highest LOD scores at 16p13 were included. This LOD score was significantly higher than the LOD of 0.29 in the unconditional analysis, a LODdelta of 5.41. Among 1,000,000 data sets in which the relationship between target and reference regions was randomly permuted, only 22 yielded a LODdelta of 5.41 or greater (P = 22/1,000,000 = 0.000022). Thus, increased allele sharing at 12q24 among affected men was observed most strongly in families that also had increased allele sharing at 16p13, a pattern consistent with an epistatic interaction between two prostate cancer susceptibility genes in these two regions. Evidence for interaction at a target chromosome-wide/reference marker-specific significance level of P ≤ 0.0001 was also observed for five other pairs of loci: 11q13, 22q13, 8q24, 20p13, and 5p13, when conditioning on reference markers at 13q12, 21q22, 7q21, 16q21, and 16p12, respectively.

Fig. 1

Results of two-locus conditional linkage analyses for all possible pairs of loci across the genome among 426 prostate cancer families using the OSA method, presented as a contour plot. LODdelta (LODdelta = LODconditional − LODunconditional) is plotted using the color scheme shown in the legend on the left of the figure. The six strongest interactions (P < 0.0001) are labeled.

Table 2 Summary results from two-locus epistatic interaction linkage analysis in the genome

The evidence for epistatic interactions at these six sets of loci was consistently supported by two predefined subsets of families: the 188 families from Johns Hopkins and the 238 families from the three other groups (Michigan, Umeå, and Tampere), as shown in Table 3. For example, when linkage at chromosome 5 was evaluated conditional on the linkage results at reference markers across the genome among the 188 Johns Hopkins families, the evidence for linkage was maximized at 5p13 (LOD = 3.15) among families with the highest LOD scores at 16p12; the LODdelta between conditional and unconditional analyses was 2.88, with a target chromosome-wide/reference marker-specific significance level of P = 0.009. Similarly, among the 238 families from the three other groups, the evidence for linkage at chromosome 5 was also maximized at 5p13 (LOD = 3.57) among families with the highest LOD scores at 16p12; the LODdelta was again 2.88, with a target chromosome-wide/reference marker-specific significance level of P = 0.02. The two subsets of families also consistently supported the other five interactions (Table 3).

Table 3 Comparison of the six strongest two-locus epistatic interaction regions between two independent sets of families

Because the 17q22 region was implicated by a single-locus approach (nonparametric LOD = 3.16; Gillanders et al. 2004), we examined the interaction models involving this region, using the relaxed significance thresholds defined for this a priori plausible region. When conditioning on the reference marker D17S787, the target region at chromosome 4q35 (target chromosome-wide/reference marker-specific P = 0.0018) was marginally statistically significant at the 0.0023 level (deemed equivalent to a global significance level of 0.05), and the target region at chromosome 11q14 (target chromosome-wide/reference marker-specific P = 0.0045) was suggestive. When linkage at chromosome 17q22 [with the target region defined by the 1-LOD support interval of our linkage peak in Gillanders et al. (2004)] was evaluated conditional on each of the 391 reference markers not on chromosome 17, the conditional LOD scores were maximized when conditioning on markers at 1q44, 2p21, 4p15, 8q21, and 11q14 (Table 4). None of these target 1-LOD-support-interval/reference marker-specific P-values reached the threshold of 0.00013 for global statistical significance. Notably, 11q14 emerged as suggestive in both sets of analyses.

Table 4 Results of two-locus interaction involving the 17q22 region

Discussion

Because multiple genetic alterations are likely required for tumor development (Land et al. 1983), inheriting germline mutations in several genes is expected to confer stronger susceptibility to cancer. Modeling gene–gene interactions in linkage analysis may therefore improve the power to detect chromosomal regions that harbor these susceptibility genes, as demonstrated in our previous linkage study of PTEN and CDKN1B (Xu et al. 2004). Systematic modeling of interactions in linkage analyses has been hindered largely by a lack of appropriate analytical methods, inadequate computing power, and small sample sizes. In this study, we applied the OSA method to identify prostate cancer susceptibility genes by systematically modeling gene–gene interactions for all possible pairs of loci across the genome in 426 prostate cancer families. We found evidence (defined as a target chromosome-wide/reference marker-specific P ≤ 0.0001) of an epistatic interaction for six sets of loci. These results did not meet our strict, conservative criterion for global statistical significance (P < 5.6×10^−6); only very strong interaction effects could reach that level after accounting for the large number of tests. Nevertheless, our results should be useful for future prostate cancer interaction studies because they identify specific locus combinations that can be tested as a priori hypotheses. Using the relaxed significance threshold for the a priori 17q22 region, we did find evidence for a significant interaction between 17q22 and 4q35 and a suggestive interaction between 17q22 and 11q14. These results require confirmation in independent studies and, ultimately, identification of the specific genes underlying this linkage evidence.
Our strategy of systematically assessing gene–gene interactions across the genome is a potentially powerful alternative for identifying genes for complex diseases in linkage studies.

The increased ability to identify disease susceptibility genes by modeling gene–gene interactions with the OSA method was demonstrated in our linkage analysis of the PTEN and CDKN1B regions in the subset of 188 families from Johns Hopkins Hospital (Xu et al. 2004). Notably, this interaction between the PTEN and CDKN1B regions was also observed in our genome-wide search for novel interactions among the full set of 426 prostate cancer families: when conditioning on the linkage results at CDKN1B, the LOD score at the PTEN region was 3.50, significantly higher than the unconditional LOD score of 0.15 (P = 0.007). Although this result did not meet our stringent criterion for genome-wide significance, it can be considered significant when PTEN and CDKN1B are evaluated as candidate genes with strong prior biological evidence.

The increased evidence for linkage in the conditional versus unconditional analyses at the six sets of loci reported here is consistent with the hypothesis of an epistatic interaction between two prostate cancer susceptibility genes, i.e., that mutations in both genes are needed to increase prostate cancer risk. The significant LODdelta values, however, may also represent type I errors (false positives). The correlation between results at neighboring loci (due to linkage between adjacent reference markers) and the large number of tests make the probability of a type I error in this type of two-locus genome-wide scan difficult to assess. The OSA P-values we report are adjusted for all possible subsets of families and for multiple target points along each chromosome, but they do not account for the multiple tests across the different target chromosome/reference marker combinations. Thus, our type I error rate for the hypothesis of an interaction is protected against multiplicity arising from the many nested subsets of families and the many target loci per chromosome, but not against multiplicity arising from the many target chromosome/reference marker combinations. Support for interaction effects at all six sets of suggestive loci was observed in two independent subsets of families; however, this consistency is not independent of the overall results, because the two subsets together make up the full sample. Finally, this study includes the largest number of prostate cancer families analyzed to date, so the results are less likely to be driven by small-sample variability. Furthermore, these families are enriched for characteristics consistent with inherited prostate cancer susceptibility: more than 86% of them have either ≥4 affected members or a mean age at diagnosis of <65 years.

In each of the six interactions identified in this study, evidence for interaction was supported in both directions (i.e., with either region used as the reference), although one direction was stronger than the other. For example, the LODdelta was 5.41 (P = 0.000022) at 12q24 when conditioning on the reference marker at 16p13, whereas it was 3.18 (P = 0.002) at 16p13 when conditioning on the reference marker at 12q24. There are at least two possible interpretations of this asymmetry. First, it remains consistent with a two-locus epistatic interaction model, under which linkage at locus 1 increases among families that are strongly linked to locus 2, and vice versa; the difference in strength between the two directions may simply reflect fluctuations of LOD scores at the two regions. LOD scores calculated in regions of linkage for multifactorial susceptibility genes are strongly influenced by sampling variation and by confounding factors such as phenocopies and incomplete penetrance. Second, stronger evidence in one direction may suggest that one of the two loci is a major gene and the other a modifying gene. Under this model, linkage at the modifying gene (locus 1) would increase among families that are strongly linked to the major gene (locus 2), but evidence for an interaction in the reverse direction may be weaker.

We did not report results of searching for gene–gene interactions under a heterogeneity model with OSA, because we reasoned that the OSA method has less power to detect heterogeneity interactions. To model a heterogeneity interaction with OSA, families would be ranked by linkage evidence at the reference region from smallest (most negative) to largest, on the notion that a family that is not linked at the reference region is more likely to be linked at the target region. This is likely not an optimal approach for assessing heterogeneity interactions in complex diseases. In linkage analysis of complex diseases, a positive family-specific LOD score suggests that the disease is likely linked to the region, but a negative LOD score can arise for a variety of reasons, including incomplete penetrance, phenocopies, and other genes (besides those at the target region).

The purpose of this study was to mine data generated as part of the largest genome-wide screen for prostate cancer susceptibility genes reported to date. Exploring gene–gene interactions is justified because interactions among multiple genes are widely hypothesized to influence prostate cancer risk, and newly developed analytical methods make it feasible to explore such interactions systematically across the genome. Although the true statistical significance of these findings is difficult to determine, our study has generated specific hypotheses for follow-up: six sets of interacting loci, and two regions, 4q35 and 11q14, that appear to interact with 17q22. These hypotheses can be readily tested in the set of more than 1,200 prostate cancer families currently being assembled by the International Consortium for Prostate Cancer Genetics.