Introduction

Relatively few breast cancers can be attributed to familial cancer predisposition syndromes that involve mutations in single, highly penetrant genes, such as BRCA1/2 or TP53 (Li-Fraumeni syndrome) (Evans et al. 2002; Miki et al. 1994; Sidransky et al. 1992; Wooster et al. 1995). These highly penetrant mutations are rare and together account only for about 5% of all breast cancers (Serova et al. 1997). They appear to have little role in sporadic breast cancers where moderate or no family history is present (Futreal et al. 1994). Consequently, risk assessment based on testing for these familial predisposition genes is uninformative for the majority of women (Malone et al. 1998; Newman et al. 1998). Although a number of twin studies clearly point to the significant role of genetics in the etiology of sporadic breast cancer (Lichtenstein et al. 2000; Peto and Mack 2000), the genetic contribution to the majority of breast cancers remains to be elucidated.

Many common polymorphisms have been examined for their association with breast cancer (reviewed in de Jong et al. 2002; Dunning et al. 1999). Generally, candidate polymorphisms are chosen for study based on probable roles in the biochemical or physiological pathways involved in breast carcinogenesis. Typically, the breast cancer risk conferred by these single polymorphisms is small to moderate, with odds ratios (ORs) that reach significance ranging from 1.3 to 2.5. Perhaps these results are not surprising because breast cancer is certainly a complex disease whose genetic risk is likely to be determined by multiple interactions among weakly penetrant polymorphisms (Antoniou et al. 2001; Pharoah et al. 2002). A few studies of limited numbers of candidate genes and often small sample sets have suggested that additive effects among common polymorphisms have a significant role in breast cancer risk (Comings et al. 2003; Feigelson and Henderson 2000; Feigelson et al. 2001; Fu et al. 2003; Huang et al. 1999). Before developing estimates of cancer risk based on multiple genes, it is important to evaluate empirically potential interactions between genes.

Here, we report the examination of ten common genetic polymorphisms in a single associative study of 631 breast cancer cases diagnosed before the age of 53 years and 1,504 cancer-free controls enrolled before age 53. The focus on women under 53 years of age arises from the generalization that genetic determinants are more obvious in young compared with older women; furthermore, earlier identification of women at high risk may change their age of first surveillance. We have studied candidate single nucleotide polymorphisms (SNPs) in ten genes likely to be involved in the physiological pathways that influence development of breast carcinoma. These include genes involved in cell cycle control (PHB, HER2, CCND1), carcinogen metabolism (SULT1A1, NQO1, GSTP1), DNA repair (ERCC2), and steroid hormone metabolism (VDR, CYP17, COMT). Whereas many more SNPs in the literature may meet these criteria, the ten SNPs chosen for this study were selected as being the most likely to influence breast cancer risk.

Most of the selected SNPs (HER2, SULT1A1, NQO1, GSTP1, ERCC2, COMT) lead to amino acid substitutions in proteins and either are known or are predicted to alter enzymatic or metabolic activity (see references in Table 1). The other SNPs are in promoter regions that regulate transcriptional activity (CYP17) or in transcribed non-coding regions that influence biological function, such as RNA splicing (CCND1), or regulatory RNA activity (PHB). For example, the promoter polymorphism in CYP17 has been associated with altered transcriptional efficiency leading to changes in circulating hormone levels (Feigelson et al. 1998; Haiman et al. 1999). The CCND1 variant A allele produces a truncated transcript, through alternative splicing, that has a longer than normal half life leading to the accumulation of cellular cyclin D1 (Betticher et al. 1995; Sawa et al. 1998). The PHB SNP is located in the 3′ untranslated region (UTR) and inactivates a novel regulatory RNA that functions as a tumor suppressor (Manjeshwar et al. 2003). The intronic variant in the VDR gene is the only polymorphism examined whose direct influence on physiology is not apparent. It is part of a VDR haplotype associated with alternative VDR signaling and activity as related to osteoporosis and breast cancer susceptibility (Bell et al. 2001; Curran et al. 1999; Ferrari et al. 1995; Friedrich et al. 2003; Hou et al. 2002). Thus, this ApaI polymorphism is used as a marker for the more extensive VDR haplotype.

Table 1 Gene polymorphisms examined for association with breast cancer risk

The association of breast cancer risk with these SNPs singly and in combinations of two and three genes was evaluated. We also investigated whether there was evidence for epistasis, i.e., whether the breast cancer risk associated with the combination of SNPs differed from that predicted by the risks associated with the individual SNPs. The demonstration of epistasis is important for building an improved predictive model for breast cancer risk.

Materials and methods

Study participants

The current subjects were drawn from a study of 6,151 women, 1,716 with breast cancer and 4,435 controls. The participants included here were 2,135 Caucasian women representing 631 cases of breast cancer diagnosed before 53 years of age and 1,504 cancer-free controls interviewed before 53 years of age. All were enrolled in the Oklahoma area from 1996 to 2003. The threshold at age 53 years was chosen for this analysis because it was the average age of menopause in our sample, and because it was the approximate age at which allele frequencies were observed to differ between young and older women for several SNPs. Cases were defined as women with a self-reported diagnosis of breast cancer and were identified primarily from mammography centers and the offices of oncologists in the Oklahoma area. Controls were women who had never been diagnosed with any cancer and were identified primarily from mammography facilities in the Oklahoma area, from Komen Race for the Cure fund-raising events, and from several local medical clinics. To our knowledge, none were related. Informed consent was obtained from all study participants. Participants were assigned anonymous ID codes that were the sole identification used for questionnaires and biological samples. They completed a questionnaire concerning personal medical/health history and family history of cancer. The Institutional Review Boards of the Oklahoma Medical Research Foundation and the University of Oklahoma Health Sciences Center approved all grant-funded study protocols; InterGenetics study protocols were approved by the Research Consultants Review Committee (Austin, TX, USA).

Candidate polymorphisms

SNPs known or predicted to alter functional activity in ten candidate genes with a role in major pathways (cell cycle, carcinogen metabolism, DNA repair, steroid hormone metabolism, and signaling) that influence cancer development have been examined (Table 1). The exception is the polymorphism of unknown function located in intron 8 of VDR that is a haplotype marker previously associated with altered activity. The rest of these SNPs have been directly associated with enzymatic and/or physiological alterations and, thus, are not likely to be simply markers in linkage disequilibrium with the causative polymorphisms. All of these SNPs have been associated with breast cancer risk in at least one published study. For this reason, in this initial investigation of candidate genes with significant functional consequences, we have examined only one SNP in each gene as opposed to performing more extensive haplotype analyses of many SNPs in these candidate genes.

Genotyping assays

Genomic DNA was isolated from either buccal cells collected in Scope mouthwash or from venous blood by using the Gentra PureGene DNA extraction kit (Gentra, Minneapolis, MN, USA). The polymerase chain reactions (PCRs) were performed in either an Eppendorf Mastercycler or Perkin Elmer 9600 thermal cycler using either HotStarTaq DNA polymerase (QIAGEN, Valencia, CA, USA) or Ex-Taq DNA polymerase (PanVera, Madison, WI, USA). Annealing and extension temperatures were optimized for each primer set. The genotypes were determined by Luminex-based microbead methods (Iannone et al. 2000; Taylor et al. 2001) and/or PCR-restriction fragment length polymorphism (PCR-RFLP) assays. The primer sequences and specific genotyping conditions are available from the authors upon request. Genotypes were determined blinded to the case-control status of the participant.

When this project was begun, all genotyping was performed by RFLP analysis. Two investigators independently determined all genotype calls. Discrepancies were arbitrated by repeating the assays. More recently, genotyping was performed by ASPE by using the Luminex technology platform and scored on strict information criteria (available from the authors upon request). For quality assurance, 5%–10% of individuals were genotyped twice or more by one or both methods. Both genotyping methods were highly reproducible and concordant. Excluding the individuals who were only genotyped for PHB, 97% of controls and 95% of cases were successfully genotyped for at least nine of the ten polymorphisms examined.

Prior to the association analyses, compliance with Hardy–Weinberg (HW) frequency expectations was determined in the controls. Deviation from HW expectations could indicate technological errors. Chi-squared goodness-of-fit tests were used to determine the significance of deviations from HW expectation frequencies, calculated from allele frequencies estimated by gene-counting methods.

Statistical analyses

In this report, an “oligogenic combination” refers to a combination of two or more genes, such as PHB:CYP17:COMT. An “oligogenotype” refers to the set of specific SNP genotypes in the oligogenic combination, e.g., PHB C/C:CYP17 C/T:COMT G/A.

The principal statistic used in the analysis was the OR calculated for those carrying the oligogenotype (or genotype if only one gene is involved), namely the exposed at-risk group, versus those not carrying the oligogenotype, that is, the unexposed baseline group. This ratio approximated the OR for the risk of the oligogenotype compared with the population average risk. Dominance effects were also considered by examining combined genotypes such as PHB T/* that includes both T/T and T/C genotypes.

The association analyses proceeded in three steps. First, the calculated OR for an oligogenotype was compared with a null distribution of ORs. This null distribution was generated by randomizing the case-control status of the individuals in the study and by calculating the OR for this oligogenotype in each of the 10,000 randomized samples. An empirical estimate for the P-value for the observed OR was given by the proportion of ORs in the null distribution that were equal to or more extreme (further from 1.0) than the observed OR. For most of the oligogenotypes, this empirical P-value did not differ appreciably from the theoretical P-value calculated assuming large number theory. However, differences did occur for some of the less common oligogenotypes.

Second, resampling was performed to give an empirical estimate of the 95% confidence interval (95% CI) and a more reliable estimate of the OR for the oligogenotype, particularly for the less common oligogenotypes. The study sample was resampled 10,000 times with each resampling composed of 80% of the controls and 80% of the cases selected at random. The 95% CI for the OR of the oligogenotype was then defined as the 5th and 95th centile points of the distribution of ORs calculated from these resamplings. The mean of this distribution of ORs provided a more stable estimate of the true OR for an oligogenotype, especially for the rarer oligogenotypes for which a single estimate could be influenced markedly by a small random fluctuation. All ORs and 95% CIs reported in the text and in the tables are these resampled values.

Third, the frequencies of the oligogenotypes in the cases were tested for significant deviations from HW expectations. Deviation from HW expectations was further evidence of association with breast cancer risk. Oligogenotypes were selected as having significant influence on variation in breast cancer risk on the basis of: (criterion 1) an empirical P-value <0.0001; or (criterion 2) a P-value <0.05 for the test of deviation from HW equilibrium in the case population and an empirical P-value <0.002 (combined P=0.05×0.002=0.0001). For those oligogenic combinations for which at least one oligogenotype met the significance criteria, an overall significance for association of the oligogenic combination with breast cancer was calculated by using logistic regression and GLIM statistical software (Numerical Algorithms Group, Downers Grove, IL, USA). Estimates of minimum detectable ORs with 80% statistical power were calculated by using NCSS-PASS 2000 software (NCSS Statistical Software, Kaysville, UT, USA).

Analysis of epistasis between SNPs also used a resampling approach. The study sample was resampled 10,000 times with each resampling composed of 80% of the controls and 80% of the cases selected at random. For each resampling, the expected OR under the null hypothesis of no epistasis was calculated based on the product of the genotype frequencies for the component SNPs in the oligogenic combination. If there were no epistasis, one would expect that the risks estimated from the product of the frequencies of the individual SNP genotypes would approximate the observed risk of an oligogenotype composed of those SNP genotypes. An empirical estimate for the P-value for the observed interaction is given by the proportion of ORs in this null (no epistasis) distribution that are equal to or more extreme (further from 1.0) than the observed OR.

Results

Characteristics of study population

Selected demographic and risk factor data for the 631 cases and 1,504 controls are presented in Table 2. Other than family history of breast cancer in the controls, the proportions with established breast cancer risk factors were similar to those reported in the literature for both cancer cases and controls (Bernstein et al. 2002; Davis et al. 2002; Wrensch et al. 2003). Our population-based controls had a higher prevalence of history of breast cancer in a first-degree relative (28%) than the 3%–20% range in published reports (Bernstein et al. 2002; Colditz et al. 1993; Davis et al. 2002; Hall et al. 1993; Johnson et al. 1995; Slattery and Kerber 1993; Wrensch et al. 2003). This might be expected given that almost 18% of the controls were recruited from the Komen Race for the Cure and were probably motivated to participate in the present study because they had an affected relative and were concerned about their own breast cancer risk; indeed, 34% of the controls recruited from the Komen Race for the Cure had a history of breast cancer in a first-degree relative. However, even for those controls recruited from women visiting mammography clinics for routine checkups, the prevalence was higher (27%) than that reported in other studies. This high percentage of controls with a family history might be expected to attenuate the magnitude of the associations with breast cancer risk, leading to a more conservative analysis. On the other hand, there were no significant differences (P-values ranging from 0.09 to 0.92) in SNP allele frequencies between controls with a history of breast cancer in a first-degree relative and those without.

Table 2 Selected demographic and risk factor data for the 631 cases and 1,504 controls. Family history was defined as an occurrence in any 1st, 2nd, or 3rd degree relatives (BBD  benign breast disease—self report of a previous biopsy and diagnosis of BBD, Menarche before age 13 years early menarche is a risk factor, Age at first birth age at birth of the first child, First birth after age 30 years late childbirth is a risk factor, Nulliparous no children; non-full-term pregnancies were not indicated; this includes young women who have not yet started to bear children, Nulliparous past age 40 years having no children is a risk factor, 40 years of age is past most likely period of childbirth, NS not significant). Totals vary because of incomplete data.

There were no significant differences between cases and controls with respect to family history of breast or other cancers. Earlier age at menarche and later age at the first birth showed the expected trends but were only marginally associated with increased risk. A personal history of benign breast disease (BBD) showed a strong association with increased risk (OR=1.9; 31% of breast cancer cases having a history of BBD compared with 19% of controls). Remaining nulliparous by 40 years of age was not associated with increased risk, contrary to expectations. Age at interview for the controls ranged from 18 to 52 years; age of diagnosis for the cases ranged from 15 to 52 years.

DNA samples were available for all 2,135 participants. However, not all women were successfully genotyped for all ten loci. Thirty-nine cases and 257 controls were typed for PHB only. For the remaining 592 cases and 1,247 controls, on average, 566 (96%) cases (range 553–580) and 1,211 (97%) controls (range 1,181–1,236) were genotyped for nine of the SNPs examined in this study. For the SNP in the 3′ UTR of PHB, 621 cases and 1,499 controls were genotyped. For the oligogenotypes involving two or three SNPs, the average number of complete oligogenotypes available for cases was 539 (range 522–574) and for controls was 1,168 (range 1,094–1,233). For all ten SNPs, 479 cases and 1,057 controls were genotyped. For the controls, none of the frequencies of individual SNPs showed significant deviation from HW expectations.

Oligogenic combinations associated with breast cancer risk

Altogether, 16,126 oligogenotypes from the 165 possible two- and three-gene oligogenic combinations were evaluated. After elimination of the 1,499 oligogenotypes with four or fewer individuals in either the cases or controls, 14,627 oligogenotypes were tested for significant association with breast cancer. There were 69 oligogenotypes, representing 37 distinct oligogenic combinations, that met our significance criteria (Table 3 Footnote 1). Thirty-four of these had at least one oligogenotype that met criteria 1, the more stringent of the criteria. The remaining three only had oligogenotypes that met criteria 2, which included consideration of deviation from HW equilibrium. Using a significance level of P≤0.0001, only 1.5 significant results (16,126−1,499=14,627×0.0001≈1.5) were expected by chance alone. Thus, despite testing multiple hypotheses, we observed far more oligogenotypes significantly associated with breast cancer risk than were expected by mere chance.

Table 3 Twenty-six of 69 oligogenotypes significantly associated with variation in breast cancer risk; the complete table with all 69 significant oligogenotypes is available at http://www.intergenetics.com/intergenetics/publications.html. Except for GSTP1:SULT1A1:ERCC2 and PHB:CYP17:COMT that are referenced in the text, only three-gene oligogenotypes that have OR>2.5 or OR<0.7 are shown; oligogenic combinations are ordered by decreasing maximum OR. OR Mean odds ratio determined by reampling, CI confidence interval determined by resampling, P (assoc.) empirical P-value for association with breast cancer risk, P (H–W) P-value for test for Hardy–Weinberg equilibrium in cases, Criteria met criterion 1 is P (assoc.) ≤0.0001, criterion 2 is P (H–W) ≤0.05 and P (assoc.) ≤0.002, OR (no epistasis) OR calculated under assumption of no epistasis, P (epistasis) empirical P-value for evidence of epistasis, Ratio OR/OR (no epistasis). Asterisk in gene combinations denotes both alleles, e.g., COMT */G is the combination of COMT G/G and COMT C/G.

Approximately 3/4 of the oligogenic combinations in Table 3 (and in the larger dataset) showed an OR>1. This might encourage one to conclude that the majority of combinations lead to increased rather than reduced breast cancer risk. However, consideration of the significant oligogenotypes comprising each combination showed ORs that stratified risk over a broad range. For example, the oligogenic combination GSTP1:SULT1A1:ERCC2 had six oligogenotypes that met our significance criteria, with ORs ranging from 0.7 to 2.6, indicating reduced to increased risk. The full range of ORs across all possible oligogenotypes for this oligogenic combination, including those that did not reach our criteria for statistical significance, was 0.6–3.6 (Table 5). Clearly, focusing only upon the oligogenotypes in Table 3 ignores much of the power of an oligogenic combination to stratify risk. To illustrate this point further, Table 3 shows that the oligogenic combination PHB:CYP17:COMT carries only a moderate (but significant) risk (OR=1.6). In contrast, Table 4, which shows all oligogenotypes for this oligogenic combination, demonstrates a range of ORs from 0.2 to 8.9. To allow comparison with the ORs calculated in traditional single-gene analyses, Table 4 also shows the ORs (OR B ) calculated by using a designated no-risk (OR=1.0) baseline genotype, typically selected as the most common genotype in the control population (in this case PHB C/C:CYP17 C/T:COMT G/A). For BRCA1/2, the range of OR B is 9.9–33.0 (Antoniou et al. 2003), and so the maximum OR B of 10.8 in Table 4 suggests that oligogenic combinations involving as few as three genes can generate risks in the range reported for mutations in BRCA1/2. The full range of ORs observed for each of the oligogenic combinations listed in Table 3 is shown in Table 5. In addition, P-values for association of the oligogenic combination, across all oligogenotypes, with breast cancer are presented in Table 5.

Table 4 Odds-ratios (OR) for the association between oligo-genotypes at the oligogenic combination PHB:CYP17:COMT and breast cancer risk, sorted by decreasing OR. OR Odds ratio calculated as risk vs. non-risk (all other genotypes), OR B odds ratio calculated as risk vs. baseline (most common) genotype. The OR is if no. controls and/or no. cases is zero
Table 5 Range of ORs for oligogenic combinations significantly associated with variation in breast cancer risk, ordered by decreasing range. Range of ORs Minimum to maximum OR across all oligogenotypes, where ORs are mean ORs determined by resampling, No. P (assoc.) number of oligogenotypes that show significant association with breast cancer at the specified thresholds, P (total) the significance of the association with breast cancer across the whole oligogenic combination, No. P (epistasis) ≤.0001 (%) number of oligogenotypes at the oligogenic combination that show significant epistasis (P≤0.0001)

Epistasis between SNPs

Table 3 compares the observed OR with the OR expected if each of the individual SNP genotypes confers breast cancer risk independently. The ratio between the observed and expected OR is given in the last column of Table 3 and demonstrates the level of epistasis. The empirical P-values calculated to measure the significance are also shown in Table 3. Of the 26 combinations presented in Table 3, 16 show evidence of epistasis at a P≤0.0001 level of significance. Overall, 27 (39%) of the 69 significant oligogenotypes exhibit evidence of epistasis at a P≤0.0001 level of significance, and 39 (57%) reveal evidence of epistasis at a P≤0.01 level of significance. For each of the significant oligogenic combinations, the number of oligogenotypes with empirical P-values ≤0.0001 is shown in Table 5. On average, 17% of oligogenotypes encompassed by the three-gene combinations shown in Table 5 exhibit significant evidence of epistasis. For comparison, 14% of all 14,627 oligogenotypes show significant evidence for epistasis (P≤0.0001). In total, these results indicate that risks associated with oligogenotypes cannot always be predicted from the risks associated with their component SNP genotypes.

Risk associated with single genes

To provide a comparison with other published studies, the ORs and associated empirical P-values for each gene considered alone are shown in Table 6 Footnote 2. None met our stringent significance criteria (P≤0.0001). However, PHB (T/*, OR=1.4, P=0.0003), ERCC2 (C/C, OR=1.6, P=0.0009), and COMT (G/*, OR=1.4, P=0.0041) meet the significance criteria used in most published studies. No others meet even the standard (P<0.05) level of significance. Table 6 includes the OR (OR B ) calculated when the most common homozygous genotype is selected as the baseline no-risk genotype (OR=1.0). Some of the results reported here cannot be directly compared with those in the literature because of differences in the characteristics of the study populations. Thus, our earlier published result for PHB (T/*, OR=4.8, P=0.003; Jupe et al. 2001) is not comparable because that result was based on subjects aged 50 years or younger and reporting a first-degree relative with breast cancer, unlike the OR reported here.

Table 6 Association of single gene polymorphisms with breast cancer risk in women under age 53 years. Only results for genes with significant association with variation in breast cancer risk are shown. A complete table with the results for all ten genes (PHB, HER2, CCND1, COMT, SULT1A1, NQO1, GSTP1, ERCC2, VDR, CYP17, COMT) is available at http://www.intergenetics.com/intergenetics/publications.html. ORs and CI were determined by resampling.

Discussion

The three main conclusions of this study are that oligogenic combinations are strongly and significantly associated with wide variation in breast cancer risk, that there is epistasis between genes in many of these oligogenotypes, and that these oligogenic combinations can stratify risk over a broader range than their component single genes considered independently.

Many oligogenotypes had significant and strong associations with variation in breast cancer risk, with ORs comparable with those seen for inherited mutations in BRCA1/2, even at the stringent level of significance (P≤0.0001) necessitated by our multiple hypothesis testing. We considered all possible combinations drawn two and three at a time from a set of ten SNPs in genes involved in cell cycle control, carcinogen metabolism, DNA repair, and steroid hormone metabolism and signaling. We also tested all possible oligogenotypes at these combinations with no a priori designation of a baseline no-risk oligogenotype. After eliminating from consideration those oligogenotypes with four or fewer individuals in either cases or controls, we evaluated 14,627 ORs. Of these, 69 were significant at a level of P≤0.0001. One would have expected perhaps at most two of the tested genotypes to be significant at this level of stringency just by chance, if no real associations existed. Since more than 30 times as many significant associations were observed than would be expected by random chance, we concluded that the majority of these significant oligogenotypes were true positives.

Testing multiple hypotheses can lead to high rates of false-positive findings. To avoid this problem, we chose a P-value ≤0.0001 as our criterion for accepting an association. This stringency risks the rejection of associations that might be true. The minimum ORs that can be detected with 80% power with our sample of 631 cases and 1,504 controls are shown in Fig. 1 for a range of oligogenotype frequencies. As indicated in Fig. 1, approximately 50% of the oligogenotypes had a frequency in the case sample of 0.08 or more for which an OR as low as 2.0 could be detected as significant at α=0.0001. The caveat is that the power calculations presented in Fig. 1 are based on large sample assumptions. However, Fig. 1 indicates that our study maintained reasonable statistical power despite the stringent level of significance. Thus, we have been able to show that two- and three-gene combinations can stratify breast cancer risk over a broad range.

Fig. 1
figure 1

Minimum detectable odds ratios (OR) with 80% power for 631 cases and 1,504 controls, and an α = 0.0001 level of significance for oligogenotype frequencies ranging from 0.01 to 0.50. Superimposed on this graph is a cumulative percent curve showing the percentage of oligogenotypes that have a frequency as large as or larger than indicated on the x-axis. This curve is cumulative from frequency 1.0 to 0.0. To show the way in which these curves may be interpreted, a line has been drawn upward from an oligogenotype frequency of 0.08. This intersects the minimum detectable OR curve at 2.0 and the cumulative percent curve at 50% indicating that, at a frequency of 0.08, an OR of 2.0 or larger can be detected as significant, and that 50% of the 16,126 oligogenotypes in this study have a frequency of 0.08 or larger.

Throughout, we have used empirical P-values to test oligogenotypes for significance to avoid making large-sample assumptions. However, examination of the theoretical P-values calculated based on large-sample assumptions for each oligogenotype shows that 60 of the 69 oligogenotypes listed in Table 3 are also significant at P≤0.0001, with none of the remaining nine combinations having a theoretical P-value greater than 0.0005 (data not shown). If we had chosen to use observed theoretical P-values as our measure of significance instead of empirically determined P-values, 71 combinations would have met our significance criteria (data not shown). All of the 11 (=71–60) oligogenotypes not meeting criteria by using empirical P-values identify an oligogenic combination listed in Table 5. Therefore, although theoretical and empirical P-values do not return exactly the same significant oligogenotypes, they identify the same oligogenic combinations.

The principal statistic used was the OR calculated for those carrying a particular oligogenotype, viz., the exposed at-risk group, versus those not carrying the oligogenotype, viz., the unexposed baseline group. This ratio approximated the OR for the risk of the oligogenotype compared with the population average risk. We chose these groups rather than designating a baseline no-risk oligogenotype to provide a risk estimate more applicable to genetic counseling. ORs based on a designated no-risk oligogenotype are perhaps most useful for an understanding of the biological basis for the altered risk, giving a more direct comparison between oligogenotypes. However, a difficulty arises in that the a priori choice of a baseline oligogenotype is not obvious.

The ORs reported here have not been adjusted for other risk factors such as age, age at onset of menarche, or age at birth of the first child. These factors have been omitted from the current analyses for two reasons. First, to some degree, limiting the study population to women under the age of 53 years, a largely pre-menopausal group, reduces the variation in risk attributable to age. We also note that, in Table 2, there is little difference between cases and controls in most of these other factors. Second, these factors have been omitted for computational and inferential simplicity. The specific consequences of this choice, although thought to be slight, are unpredictable. However, further analyses of larger data samples are needed to examine this.

Some of the associations in Table 3 may reflect survivorship rather than breast cancer risk. To examine this issue, we stratified the cases by those enrolled ≤24 months after diagnosis and those enrolled >24 months after diagnosis. Of the cases, 38% fell into the ≤24 months category; the lag time between diagnosis and enrollment ranged from 0 to 37 years (median 3.1 years). Analyses were repeated for each category. None of the 69 oligogenotypes showed appreciable differences in ORs between those interviewed ≤24 months and those interviewed >24 months after diagnosis (data not shown). These results suggest that the significant associations are not an artifact of survivorship.

For cases, the reference age was their age at diagnosis with breast cancer; for controls, the reference age was their age at recruitment. The present analyses included only women under 53 years of age. This stratification was chosen for its relationship to age of menopause, and because of the age-related changes in allele frequencies that we saw in preliminary tabulations by age. In the controls, the allele frequencies for several of these SNPs (VDR, CCND1, and CYP17) differed significantly between controls less than 45 years of age and those over 55 years, i.e., approximately the upper and lower age tertiles of the controls (data not shown). The reason for this is not known, although it may be attributable to competing causes of mortality, such as other cancers or heart disease, that are related to age and particularly to age of menopause. When we determined the age threshold that maximized the difference in SNP allele frequencies between young and older women, the average of these age thresholds across the SNPs was 52.3 years. Menopause is not an instantaneous event, and interventions such as surgical menopause and hormone replacement therapy confuse the issue further. The mean age of natural menopause, reported to be around 51–52 years (Bromberger et al. 1997; Gold et al. 2001; Kato et al. 1998), was 52.5 years of age in our sample. Thus, based on these values of 52.3 and 52.5 years, we chose an age stratification of under 53 years for our analysis.

To investigate the effect of our choice of age threshold, we reanalyzed our data using an age threshold of 50 years (under 51 years), the age most commonly used in the literature (Akhmedkhanov et al. 2000; Brandt et al. 2004; Breslow et al. 2001; Decensi et al. 2001; Rutter et al. 2003; Sidransky et al. 1992; Ursin et al. 2003; Wang-Gohrke et al. 2000; Weston et al. 1997). This analysis included 558 cases and 1,353 controls. The results were essentially unchanged by moving the age threshold. Of the oligogenotypes listed in Table 3, 46% also met our significance criteria in the under 51 years analysis (data not shown). Of the remainder, 64% and 81% had P≤0.0005 and 0.001, respectively, in the under 51 years analysis, and all had P≤0.01. This rise in P-value is not entirely unexpected given the decreasing sample sizes with the younger age threshold. Two oligogenotypes that did not appear in Table 3 had P≤0.0001 in the under 51 years analysis; both of these had P≤0.0002 in the under 53 years analysis. These two additional oligogenotypes did not add further oligogenic combinations to those previously seen in Table 5. We also considered an older threshold of 54 years, with similarly small differences in the results. Thus, the choice of age threshold, at least from under 51 years to under 54 years, did not affect the results appreciably.

The second observation from this study is that ORs associated with significant oligogenotypes frequently deviate significantly from the ORs that would be expected if their component SNP genotypes acted independently to affect risk. This deviation from independent interaction is often termed epistasis. Overall, 27 (39%) of the 69 significant oligogenotypes in Table 3 showed significant interaction at a P≤0.0001 level of significance (16 are shown in Table 3). Among all 14,627 oligogenotypes, 2,506 (17%) showed significant evidence of epistasis (P≤0.0001). These findings suggest that epistasis plays an important role in determining the breast cancer risk associated with these oligogenic combinations. However, these are statistical results, and eventually these oligogenic combinations will have to be examined to ensure that they make sense biologically.

Although the genome-wide prevalence of epistasis is not known, 17% is probably an overestimate. All of these genes and their polymorphisms have been shown to be associated with breast cancer risk in at least one other study. Because of this shared association, they might be more likely to show epistasis. On the other hand, this estimate may be an underestimate, since only multiplicative interactions are modeled. Biological interaction can yield either a multiplicative or an additive (albeit non-epistatic) statistical model (Cordell 2002), and for simplicity, the latter are not considered. However, in either case, epistasis is still clearly shown between these oligogenes.

The third observation is that oligogenic combinations significantly associated with breast cancer risk can be used to stratify risk further when all oligogenotypes within those gene combinations are considered. This is true even if the other oligogenotypes do not reach the threshold of significance. Thus, all oligogenotypes within a significant three-gene combination can be used to stratify risk over a broad range, as shown for PHB:CYP17:COMT in Table 4 in which the ORs range from 0.2 to 8.9. Risk stratification ranges for the 37 significant non-redundant oligogenic combinations are shown in Table 5. For the two-gene combination COMT:ERCC2, the range is fairly narrow varying from an OR of 0.8–1.9. Division of the largest OR by the lowest OR in this two-gene group shows that the oligogenotypes can stratify risk over a 2.5-fold range; the two-gene combination PHB:COMT returns ORs ranging from 0.3 to 1.8 representing a six-fold variation in risk. In comparison, the significant three-gene combinations return even wider variation in risk ranging from a modest four-fold for COMT:ERCC2:NQO1 to a 40-fold variation in risk for the combination PHB:COMT:SULT1A1. Thus, each oligogenic combination with at least one significant oligogenotype can stratify risk over a much broader range than that determined by considering the genes separately.

Confirmation of our results in additional large case-control studies and the addition of more SNPs to the epistatic analysis could lead to the development of a clinical genetic model to improve risk assessment for non-familial breast cancer. These combinations of low penetrance SNPs may also have relevance to breast cancers with a strongly familial basis. Highly penetrant mutations in the genes for breast cancer (BRCA1, BRCA2) do not always cause breast cancer. This lack of complete penetrance may be attributable in part to the same type of oligogenic interactions that we have described, in which breast cancer predisposition from BRCA1 and BRCA2 mutations is strongly modified by combinations of other low penetrance genes. Improved clinical risk assessment to identify those women at highest risk, regardless of family history, would assist in directing the use of mammography, sonography, and breast magnetic resonance imaging, perhaps leading to the earlier identification of breast cancer (Smith et al. 2003). As additional chemopreventatives are developed for breast cancer, a preventive care model can be envisioned that identifies high risk prior to diagnosis, that treats the patient prophylactically, and that avoids the disease directly.

Like breast cancer, most cancers are probably complex diseases with genetic components, and the approach of examining multiple genetic polymorphisms as performed here might also accurately assess predisposition for most other types of cancer. Moreoever, cardiovascular disease and diabetes probably have oligogenic influences predictable by this type of model (Yang et al. 2003). The etiology of breast cancer risk has always been alluded to as being complex, particularly for apparently “non-hereditary” breast cancer. However, our results suggest that, although the genetic etiology is complicated, many significant oligogenic associations can be identified when relatively large sample sets are analyzed.