Introduction

Genome-wide association studies (GWAS) and their follow-up studies have identified numerous single nucleotide polymorphisms (SNPs) with unknown biologic significance to be associated with breast cancer risk [17]. Researchers have begun to create composite genetic risk scores to investigate the polygenic nature of these genetic variants [8–11]. In a study by Reeves et al. [9], women in the highest quintile of a polygenic score that incorporated seven breast cancer susceptibility loci experienced a 40 % (95 % CI = 31–48 %) increased odds of breast cancer when compared to the middle quintile reference group. The association between the polygenic risk score and breast cancer risk was attenuated in participants diagnosed with estrogen receptor-negative tumors as opposed to estrogen receptor-positive tumors, providing evidence that some breast cancer susceptibility loci may be more strongly related to hormonally driven tumors [9, 12].

Researchers hypothesize that certain genetic variants, in conjunction with reproductive and menstrual factors, are involved in hormonal pathways to influence breast cancer risk. Several studies have examined effect modification of reproductive and menstrual factors by GWAS-identified susceptibility loci and have found mainly null associations [6, 13, 14]; however, a number of modest interactions have been reported between reproductive and menstrual risk factors and breast cancer susceptibility loci, most notably with parity, age at menarche, and age at natural menopause [6, 13, 15–17]. For instance, an interaction between age at menarche and an established breast cancer susceptibility SNP, rs13387042 (2q35), was recently described in the literature. Women with older ages at menarche (≥13 years) had an attenuated 12 % per-allele increased breast cancer risk compared to women with younger ages at menarche (22 % per-allele increase; interaction p-value = 0.04) [13]. Another study found the association between rs3817198 (LSP1) and breast cancer risk was stronger in women with more live births. The authors found a 4 % increased risk of breast cancer for each additional child and each additional minor allele [6]. A recent follow-up study that used data from more than 34,000 cases and 41,000 controls further confirmed this finding [17]. A separate study found that established reproductive and menstrual factors and a polygenic score contributed independently to breast cancer risk [11]. The authors noted that the association between their polygenic score and breast cancer risk did not differ by menopausal status of the participants. However, the researchers did not assess whether other reproductive and menstrual factors modified the association between the polygenic score and breast cancer risk [11].
Moreover, most previous studies have not systematically examined whether a composite risk score modifies the associations of established reproductive and menstrual factors with breast cancer risk. We investigated how established breast cancer susceptibility loci interact with one another and with reproductive and menstrual factors in association with breast cancer risk in a population-based case–control study.

Materials and methods

Study sample

Data were collected from the Three State Study, a previously described population-based breast cancer case–control study [18–20]. Participants were selected from English-speaking females residing in Massachusetts (excluding metropolitan Boston), New Hampshire, and Wisconsin. Cases included in this analysis were women aged 20–69 with an incident invasive breast cancer reported to each state’s cancer registry between 1995 and 2000. Community controls were randomly selected in each state from lists of licensed drivers (age <65) and lists of Medicare beneficiaries (age ≥65). Controls were frequency-matched to approximate the age distribution of the cases within 5-year age strata. Participants gave informed consent during study enrollment. This study was conducted under the approval of the University of Wisconsin Health Sciences Institutional Review Board.

Risk factor information

Telephone interviews were used to obtain information on known and suspected risk factors for breast cancer including demographics, first degree family history of breast cancer, and hormonal exposures. Participant interviews were conducted on average 1 year after a specified reference date, which was defined as the date of cancer diagnosis for the cases. A comparable reference date for control participants was calculated based on their 5-year age strata and date of interview [20]. Among eligible participants, approximately 80 % of cases and 76 % of controls completed the interview.

DNA extraction and genotyping

Selected participants were asked to donate a buccal cell sample for genetic analyses using an oral rinse protocol. Participants interviewed between the years 2000 and 2001 who provided a buccal cell sample are included in the present analysis. Of these interviewed participants, 70 % of cases and 61 % of controls agreed to donate a buccal sample. Participants who chose not to provide a buccal cell sample were similar to those who provided one in age, percentage with a family history of breast cancer, and other established risk factors for breast cancer. To reduce the possibility of population stratification and maintain a study sample with ancestry similar to the GWAS and their follow-up studies, all analyses were limited to participants self-identified as White/Caucasian in race (95.1 % of participants). Samples were sent to a National Cancer Institute-affiliated laboratory for DNA extraction and storage conducted according to previously described protocols [18]. DNA was quantitated from frozen aliquots and plated for the genotyping assays. Significant results from previous studies were used to identify 13 candidate SNPs for the analysis: rs4973768 (SLC4A7), rs10941679 (5p12), rs2981582 (FGFR2), rs3817198 (LSP1), rs3803662 (16q12/LOC643714/TOX3), rs13281615 (8q24), rs11249433 (1p11), rs889312 (MAP3K1), rs2046210 (6q25), rs17468277 (ALS2CR12/CASP8), rs10483813 (RAD51B), rs13387042 (2q35), and rs6504950 (STXBP4) [16, 21, 22]. Genotyping was conducted using the Taqman nuclease assay (Taqman®) with reagents designed by Applied Biosystems (http://www.appliedbiosystems.com/) as Assays-by-Design™ and genotyping performed using the ABI PRISM 7900HT, 7700 or 7500 Sequence Detection Systems according to the manufacturer’s instructions. Quality control measures were taken to remove poor quality genotype data. SNPs missing >20 % of values and individual participants with a call rate <80 % for genotypic data were excluded from the analysis. All 13 SNPs passed quality control measures.
A total of 358 participants were removed from the genetic analyses because of missing data, leaving 1,484 breast cancer cases and 1,307 community controls.

Covariate definitions

All reproductive and menstrual variables were first coded continuously and secondarily categorized into subgroups based on hypothetical biologic differences in risk. Age at menarche was categorized into tertiles to represent early, average, and late age at menarche (<12, 12–13, ≥14). Age at first birth was coded only among parous women, according to frequently used cutoffs (<20, 20–24, 25–29, ≥30). Parity was categorized as nulliparous and then in tertiles among women who had ever been pregnant (1–2, 3, ≥4 live births). Participants were considered postmenopausal if they reported that their menstrual cycles had stopped for at least the 6 months prior to the reference date. The menopausal participants were categorized into two groups: participants with natural menopause, and participants with menopause due to other causes, which included women whose menstrual periods stopped because they underwent bilateral oophorectomy, stopped using hormonal contraceptives, or had an unknown cause of menopause. Family history of breast cancer was defined as having at least one first degree relative (i.e., mother, sister, or daughter) with a breast cancer diagnosis.

Statistical analyses

Hardy–Weinberg equilibrium was tested among controls by using chi-squared tests to compare the observed to expected genotype frequencies. Odds ratios (OR) and 95 % confidence intervals (CI) were calculated using logistic regression to assess the association between each SNP and breast cancer risk under an additive genetic model with respect to the minor allele. All statistical models included terms for age and state of residence. Associations between established risk factors and breast cancer risk were also calculated. In order to evaluate the comparability of our point estimates to previous studies, we compared Three State Study breast cancer susceptibility loci point estimates to the estimates reported in the published GWAS or GWAS follow-up study by normal standardization. Statistical analysis was conducted in SAS software version 9.1 (SAS Institute, Cary, NC).
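
As an illustration of the Hardy–Weinberg test described above, the following is a minimal Python sketch rather than the SAS procedure used in the study. It assumes a biallelic SNP, a one-degree-of-freedom chi-squared test, and nonzero expected counts; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared test (1 df) of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared variable with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts among controls (not data from the study)
chi2, p_value = hwe_chi_square(n_aa=500, n_ab=400, n_bb=100)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 from such a test would be read as no evidence of departure from equilibrium, which is the criterion reported in the Results.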

Polygenic risk score

A composite risk score was created to assess the polygenic contribution of breast cancer susceptibility loci. All SNPs were coded as a count of the number of risk alleles. A stepwise selection procedure with stay and entry criteria of 0.1 was used to identify SNPs most strongly associated with breast cancer risk (SAS software version 9.1). A weighted risk score was then formed as the sum, across the selected SNPs, of the number of risk allele copies multiplied by the corresponding log odds estimate. Nonlinearity of the score was assessed by the inclusion of a quadratic term, and interactions between the score and established reproductive and menstrual factors were tested by including a cross-product term in statistical models. In order to capture the polygenic risk score’s association with breast cancer risk when established reproductive and menstrual exposures were also considered, an additional model was analyzed by including the following variables selected a priori: age, state of residence, age at menarche, age at first full-term birth, parity, ever breastfeeding, and age at natural menopause. A third model was analyzed which included the aforementioned variables as well as a term for the presence of family history of breast cancer.
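
The weighted score described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, SNP labels, and per-allele odds ratios below are hypothetical placeholders, not the study's estimates.

```python
import math


def weighted_risk_score(risk_allele_counts, per_allele_odds_ratios):
    """Sum over SNPs of (risk allele count) x (log per-allele odds ratio)."""
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk allele count must be 0, 1, or 2")
        score += count * math.log(per_allele_odds_ratios[snp])
    return score


# Hypothetical per-allele odds ratios and one participant's genotype
odds_ratios = {"snp_a": 1.22, "snp_b": 1.11, "snp_c": 1.15}
genotype = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

score = weighted_risk_score(genotype, odds_ratios)
print(f"weighted score = {score:.4f}")
```

Because the weights are log odds ratios, exponentiating the score gives the product of the per-allele odds ratios for the risk alleles carried, which is the multiplicative model implied by independent SNP effects.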

Reproductive and menstrual factor effect modification

Multivariable models were fitted to evaluate effect modification of the associations of reproductive and menstrual exposures with breast cancer risk by including cross-product terms of the exposure of interest and the polygenic score. The reproductive and menstrual risk factors assessed in this study are as follows: age at menarche, age at first full-term birth, parity, ever breastfeeding, and age at natural menopause. To further elucidate how risks differ by combinations of genotypes and reproductive and menstrual patterns, odds ratios for the associations between reproductive or menstrual factors and breast cancer risk were calculated within strata of the polygenic score.
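
To make the idea of stratified odds ratios concrete, the sketch below computes a crude odds ratio with a Woolf (log-based) 95 % confidence interval within each stratum of a score. It is a simplification under stated assumptions: the study's estimates came from adjusted logistic regression models, whereas this example is unadjusted, and the function name, stratum labels, and counts are all hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio and Woolf 95% confidence interval from a 2x2 table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of the log odds ratio
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts per stratum of the polygenic score:
# (exposed cases, unexposed cases, exposed controls, unexposed controls)
strata = {
    "low score": (120, 200, 100, 220),
    "high score": (150, 210, 90, 180),
}

results = {name: odds_ratio_with_ci(*counts) for name, counts in strata.items()}
for name, (or_, lo, hi) in results.items():
    print(f"{name}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Stratum-specific estimates that differ materially from one another would be consistent with effect modification; the cross-product term in the regression model provides the formal test.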

Results

No SNP showed evidence of departure from Hardy–Weinberg equilibrium (p-values > 0.05). We found no statistically significant differences in the magnitude of the association between the calculated Three State Study odds ratios and the odds ratios reported by the GWAS or follow-up studies for the association between the 13 loci and breast cancer risk (Table 1). We confirmed previously reported associations between seven breast cancer susceptibility loci and invasive breast cancer risk: rs13387042 (2q35), rs6504950 (STXBP4), rs4973768 (SLC4A7), rs10941679 (5p12), rs2981582 (FGFR2), rs3817198 (LSP1), and rs3803662 (TOX3). The estimated increase in breast cancer risk per additional risk allele ranged from 11 to 22 %, with SNP rs2981582 (FGFR2) showing the strongest association with breast cancer risk in this study; the minor allele was associated with a 22 % increase in breast cancer risk (95 % CI 8–38 %).
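
One common way to compare two independent odds ratio estimates is a z-test on the difference of their log odds ratios, with standard errors recovered from the reported confidence intervals. The sketch below assumes that this is what comparison "by normal standardization" amounts to, which the text does not spell out. The study estimate is the rs2981582 value reported above; the published comparison estimate is a hypothetical placeholder, and the function names are illustrative.

```python
import math


def log_or_and_se(odds_ratio, ci_lower, ci_upper, z=1.96):
    """Recover the log odds ratio and its standard error from a 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_or, se


def compare_odds_ratios(estimate_a, estimate_b):
    """Two-sided z-test for a difference between two independent log odds ratios."""
    (log_a, se_a), (log_b, se_b) = estimate_a, estimate_b
    z_stat = (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))
    return z_stat, p_value


study = log_or_and_se(1.22, 1.08, 1.38)      # rs2981582, as reported in the text
published = log_or_and_se(1.26, 1.20, 1.32)  # hypothetical comparison estimate

z_stat, p_value = compare_odds_ratios(study, published)
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

A large p-value here indicates that the two estimates are statistically compatible, not that they are identical.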

Table 1 Per-allele odds ratios and 95 % confidence intervals for the association between breast cancer susceptibility loci and breast cancer risk

Polygenic risk score results

A selection procedure identified seven of the 13 SNPs (rs13387042, rs4973768, rs10941679, rs2981582, rs3817198, rs3803662, and rs6504950) to include in the polygenic risk score. The range of the number of risk alleles present in the risk score was similar in cases and controls, although the distribution in cases skewed toward higher numbers of risk alleles; the range of risk alleles was 1–12 (mean = 5.86) in controls and 2–12 (mean = 6.35) in cases. Women in the highest quintile of the score had a 2.2-fold increased breast cancer risk when compared to women in the lowest quintile (95 % CI 1.67–2.88). Women in the third and fourth quintiles also had an increased risk (OR = 1.52, 95 % CI 1.15–2.02; OR = 1.50, 95 % CI 1.13–1.98, respectively) (Table 2). A quadratic polygenic risk score term was added to the statistical model to assess nonlinearity and was not statistically significant (p-value = 0.85). Polygenic risk score models adjusted for reproductive and menstrual exposures did not materially change the composite point estimate. Moreover, results were similar when an additional term for family history was added to the model (Table 2).

Table 2 Odds ratios and 95 % confidence intervals for the association between a polygenic risk score and breast cancer risk

Effect modification results for individual SNPs, the polygenic score, and reproductive and menstrual factors

We conducted 21 pairwise interaction tests among the seven significant SNPs (rs13387042, rs4973768, rs10941679, rs2981582, rs3817198, rs3803662, and rs6504950), and did not observe strong evidence of interactions in their associations with breast cancer risk (19 p-values > 0.05). Potential effect modification of rs13387042 by rs4973768 (interaction p-value = 0.02) and of rs10941679 by rs3803662 (interaction p-value = 0.03) was noted. We also evaluated whether breast cancer susceptibility loci modified the associations of reproductive and menstrual risk factors with breast cancer risk. Sample distributions of these hormonal exposures are shown in Table 3. Cases were more likely than controls to have a first degree family history of breast cancer, earlier age at menarche, fewer children, and later age at menopause. Effect modification of the associations between reproductive or menstrual factors and breast cancer risk by the polygenic score was not observed (all interaction p-values > 0.05), with the exception of age at natural menopause, for which a weak interaction was detected (p-value = 0.09, result not shown). The deleterious association between later age at natural menopause and breast cancer risk was more apparent in women with lower polygenic score values.
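
The count of 21 pairwise tests follows from choosing two SNPs out of seven. The short sketch below enumerates the pairs and, purely as an illustration of the multiple-testing context, computes the number of nominally significant results expected by chance at the 0.05 level and a Bonferroni-adjusted threshold. The Bonferroni adjustment is not something the study reports applying; it is included here only as an assumption-labeled reference point.

```python
from itertools import combinations

# The seven SNPs carried forward into the pairwise interaction tests
snps = ["rs13387042", "rs4973768", "rs10941679", "rs2981582",
        "rs3817198", "rs3803662", "rs6504950"]

pairs = list(combinations(snps, 2))
n_tests = len(pairs)  # 7 choose 2

alpha = 0.05
# Expected number of p-values below alpha if no true interaction exists
expected_false_positives = n_tests * alpha
# Bonferroni-adjusted threshold (illustrative; not applied in the study)
bonferroni_threshold = alpha / n_tests

print(n_tests, round(expected_false_positives, 2), round(bonferroni_threshold, 4))
```

Under these assumptions, about one nominally significant pair would be expected by chance alone among 21 tests, and neither reported interaction p-value (0.02 and 0.03) falls below the illustrative Bonferroni threshold, which is in line with the conclusion that there was no strong evidence of SNP–SNP interaction.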

Table 3 Reproductive and menstrual characteristics by case status

Discussion

We genotyped 13 breast cancer susceptibility loci identified from previous genetic epidemiology studies to examine how these loci interact with each other and with reproductive and menstrual risk factors in association with breast cancer risk. Of the candidate SNPs, seven were confirmed to be associated with breast cancer risk, including rs2981582 in the fibroblast growth factor receptor 2 gene, the SNP most strongly associated with breast cancer risk in this population. SNP rs2981582 and the other six significant SNPs have also been confirmed as breast cancer loci in a number of study populations and ethnic subgroups [1, 6, 13, 23].

We examined the possibility that when multiple risk alleles are found in conjunction with one another their association with breast cancer risk may be non-additive. Our polygenic score indicated a linear association with increased breast cancer risk, and risk was more than doubled for women in the highest risk quintile compared to women in the lowest. We found no statistically significant differences between our odds ratio estimates and those previously reported in the literature. The comparability of the estimates supported using the point estimates from the current study to create our polygenic risk score. Previous studies have found corresponding magnitudes of association between polygenic risk scores and breast cancer risk. Harlid et al. created a polygenic score using ten GWAS-identified SNPs, four of which are included in the risk score from the current study [rs2981582 (10q26), rs3803662 (16q12/TOX3), rs3817198 (11p15/LSP1), and rs13387042 (2q35)]. In their study, the OR was 2.12 (95 % CI 1.80–2.50) for women with the maximum number of risk alleles compared to those with the lowest number of risk alleles [8]. Reeves et al. genotyped the same ten SNPs as the Harlid group and similarly found a twofold increase in risk when comparing the top and bottom quintiles of their polygenic score [9]. In the present study, we conclude that the increased breast cancer risk in women with larger risk scores was attributable to independent associations of the SNPs and not to a synergistic increase in risk. Analogous to our results, Reeves et al. also found that the breast cancer susceptibility loci were independently associated with breast cancer risk [9].

Only recently have researchers explored the possibility that breast cancer susceptibility loci in combination with reproductive and menstrual factors may increase breast cancer risk. We found that the polygenic score’s association with breast cancer risk was separate from established reproductive and menstrual factors, as the association between the polygenic risk score and breast cancer risk did not materially change when reproductive and menstrual factors were simultaneously considered. A study of breast cancer risk within the Women’s Health Initiative Clinical Trial by Mealiffe et al. [10] assessed whether predictions of breast cancer risk could be improved by adding a polygenic score to the Gail risk model, which incorporates clinical and personal risk factor information, including age at menarche and age at first birth. Similar to our findings, Mealiffe et al. [10] found that their polygenic score and the Gail model contributed separately to breast cancer risk. When their polygenic risk score was added to statistical models which included the Gail model, the area under the curve increased from 0.557 to 0.594 (p-value <0.001).

This study has several strengths, including comprehensive information on reproductive and menstrual factors obtained from a population-based sample with high participation rates. Previous studies have shown that women report reproductive and menstrual events with high accuracy [24], suggesting that our hormonal risk factor data should be reliably recorded. Additionally, previous investigators have not systematically evaluated whether a polygenic risk score modifies the associations of reproductive and menstrual factors with breast cancer risk, as we have done in this analysis. There are also certain limitations to be noted for this study’s interpretation. Only a subset of established breast cancer susceptibility loci was evaluated in this study; consequently, loci important to the polygenic portion of breast cancer risk have not been included in the risk score, leaving part of the genetic component of breast cancer risk unidentified. It is possible that a polygenic score which includes a more comprehensive set of loci may have a stronger association with invasive breast cancer risk than the risk score calculated here. We did not have information on tumor receptor status and were unable to stratify breast cancer cases by many of the tumor characteristics known to be influenced by hormones. In summary, women with a higher risk score for seven established breast cancer loci were at an increased breast cancer risk compared to women with a lower polygenic score. Evidence from this study suggests that these loci are independently associated with breast cancer risk. Our polygenic score did not materially affect the associations of reproductive and menstrual risk factors with breast cancer risk.