Introduction

The last two decades have seen a sharp increase in the rate of contralateral prophylactic mastectomy (CPM) in patients diagnosed with ductal carcinoma in situ (DCIS) or invasive breast cancer (BC) [1–6]. Paradoxically, the CPM rate has risen over the same period in which the risk of contralateral BC (CBC) has fallen [7]. The fall in CBC risk is most likely due to increasing use of effective adjuvant therapies such as tamoxifen and aromatase inhibitors for treating primary BC [8–10], which also have a protective effect on the contralateral breast. Historically, the CBC rate has been quoted to be 0.5–0.75% per year, but these estimates are considered outdated, with some recent population-based studies estimating it to be 0.1–0.3% per year for most women [11–17]. In fact, for the majority of patients with a first primary BC, the CBC risk is much smaller than the risk of recurrence from the index cancer [3].

Several risk factors for CBC have been identified in the literature, such as BRCA1/2 mutations, young age at diagnosis of the first primary BC, family history of BC, lobular histology, negative estrogen receptor/progesterone receptor (ER/PR) status, positive lymph node status, and larger tumor size [13, 15, 17–26]. On the other hand, protective factors for CBC include premenopausal oophorectomy, adjuvant hormonal therapy, chemotherapy, and CPM [12, 20, 25, 27, 28].

For typical BC patients though, the individual CBC risk is relatively low because the factors associated with increased risk are found only in small proportions of BC patients. For example, only a small percentage of BC patients are BRCA1/2 mutation carriers, and women without family history of BC are more prevalent than those with family history [15, 20]. Thus, evidently, the actual CBC risk does not drive the decision of whether to undergo CPM. In fact, the CPM rate has primarily increased in the group of women who are at low risk of CBC [17]. This apparently paradoxical relationship has led to a flurry of investigations into factors driving the decision to undergo CPM, including perceptions about CBC risk [14, 16, 17]. Not surprisingly, it has been found that patients tend to perceive their CBC risk to be much higher than the actual risk, sometimes even by a factor of 5–10 [17, 29, 30]. To some women, CPM may seem like an opportunity to escape the anxiety and stress associated with surveillance if they decide to keep their healthy breast [16, 17]. Recent advances in reconstruction techniques and considerations for shape and symmetry in both breasts also play a role in this decision [16, 17]. Nonetheless, these psychological and physical benefits must be weighed against the loss of a healthy breast in the light of the actual risk of CBC and the risks, negative effects, and irreversible nature of CPM [3, 16, 17, 28, 31–39]. Another factor to consider is that there is little to no convincing evidence that CPM prolongs either overall or BC-specific survival or reduces BC mortality in women with sporadic BC [17, 28, 31–34].

Notwithstanding the above arguments, it is quite understandable for women traumatized by the diagnosis of unilateral BC to instinctively lean towards CPM, if only as an anxiety-relieving tool. This is especially true if they perceive their CBC risk to be high. Thus, it is critical for physicians to review with their patients the various aspects of this complex decision. To aid physicians in this task, a readily available tool that can provide an objective, quantitative, and personalized estimate of CBC risk is needed. An educated patient can be expected to make an informed decision, which may or may not be to undergo CPM.

In this study, we developed a statistical model for estimating individualized risk of CBC. In particular, we developed an absolute risk prediction model along the lines of the BC risk assessment tool, popularly known as the Gail model [40], which is widely used for predicting risk of first primary BC in healthy women. Our model, named CBCRisk, is applicable for women diagnosed with a first primary unilateral invasive BC and/or DCIS.

Data sources and methods

Data sources

We used prospectively collected data from two sources: Breast Cancer Surveillance Consortium (BCSC) and Surveillance, Epidemiology, and End Results (SEER) [41, 42]. BCSC data were used to build the relative risk model and to compute the attributable risk, while SEER data were used for calculating age-specific composite CBC hazard rates and mortal hazard rates from non-CBC causes.

BCSC data were collected on women undergoing mammography at seven registries across the USA [43]. We started with a cohort of women who had their first BC diagnosed between ages 18–88 years and years 1995–2009. For each woman, only those BC diagnoses were considered that were of type invasive and/or DCIS. Women satisfying any of the following criteria were excluded: (a) their first BC diagnosis was made solely by mammography (i.e., no histological confirmation), (b) they underwent CPM, (c) their race was unknown or unclear, (d) they had undergone radiation therapy before the first BC diagnosis, (e) their first BC was bilateral or had unknown laterality, and (f) their second diagnosis (or a subsequent one if all diagnoses before it are of the same laterality as the first one) had unknown laterality.

After applying the above inclusion and exclusion criteria, there were 77,746 women in the BCSC cohort. We define a woman to be a case if her CBC diagnosis was made at least 6 months after the first BC diagnosis. Among the 77,746 women, 1921 were CBC cases and the remaining 75,825 were eligible to be controls. The follow-up for each woman in this cohort starts at the time of her first BC diagnosis. For a case, the follow-up ends at the time of her CBC diagnosis, whereas for a control, the follow-up ends at death or censoring in 2009, the BCSC cut-off year. We matched the eligible controls to cases to get a case–control dataset for building the relative risk model. Specifically, for each case, we randomly selected three women from among the eligible controls who matched the case on race (White-non-Hispanic, Black-non-Hispanic, Hispanic, Asian, or other) and year of first BC diagnosis, and whose length of follow-up was at least as long as that of the case. As current age and age at first BC diagnosis are important risk factors, we did not use them as matching criteria, so that they could still be considered for inclusion in the model. This resulted in a total of 5763 controls being selected for the case–control dataset.
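The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`id`, `race`, `dx_year`, `followup`), the function name, and the toy records are hypothetical, and the sketch lets a control be drawn for more than one case because the text does not say whether controls were reused.

```python
import random


def match_controls(cases, controls, k=3, seed=0):
    """Draw up to k controls per case (hypothetical helper).

    A control is eligible for a case if it has the same race and the same
    year of first BC diagnosis, and its follow-up is at least as long as
    the follow-up of the case.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    matched = {}
    for case in cases:
        pool = [
            c for c in controls
            if c["race"] == case["race"]
            and c["dx_year"] == case["dx_year"]
            and c["followup"] >= case["followup"]
        ]
        # Sample without replacement within a case; fewer than k if the pool is small.
        matched[case["id"]] = rng.sample(pool, min(k, len(pool)))
    return matched


# Toy records, invented purely for illustration.
cases = [{"id": 1, "race": "Asian", "dx_year": 2000, "followup": 4.0}]
controls = [
    {"id": 10 + i, "race": "Asian", "dx_year": 2000, "followup": f}
    for i, f in enumerate([3.0, 4.0, 5.5, 6.0, 8.0])
]
controls.append({"id": 99, "race": "Hispanic", "dx_year": 2000, "followup": 9.0})

matched = match_controls(cases, controls)
```

In this toy example, the control with 3.0 years of follow-up and the control of a different race are never selected, since neither satisfies the matching rule.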

SEER data include 18 registries from across the USA. We used data from years 1998 to 2013 and applied the same inclusion and exclusion criteria as for BCSC data except that prior radiation therapy information was not available in the SEER database. Finally, our SEER cohort included 824,768 women with 19,835 cases and 804,933 eligible controls.

The registries and the Statistical Coordinating Center of BCSC have institutional review board approval for either active or passive consenting processes or a waiver of consent to enroll participants, link data, and perform analytic studies. Their procedures are Health Insurance Portability and Accountability Act compliant. Besides, they have received a Federal Certificate of Confidentiality, and have instituted controls for protection for the identities of women, physicians, and facilities who are subjects of this research.

Model building strategy

We built CBCRisk following the steps similar to those used in developing the Gail model [40]. Specifically, there are four steps in the model building procedure: (1) identifying the risk factors for CBC and building the relative risk model, (2) estimating the baseline age-specific CBC hazard rates by combining the age-specific composite CBC hazard rates and attributable risk fractions, (3) computing mortal hazard rates from non-CBC causes, and (4) combining results from steps 1, 2, and 3 to get the projection of absolute risk of CBC.
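To make step 4 concrete, the following Python sketch combines a relative risk, composite CBC hazard rates, attributable risk fractions, and competing mortality hazards into an absolute risk, using the usual competing-risks formula with piecewise-constant hazards found in Gail-type models. It is a sketch under assumptions, not the CBCRisk implementation (whose details are in the Supplementary Materials): the function name, the interval layout, and all numeric inputs are hypothetical.

```python
import math


def absolute_risk(age_start, horizon, rel_risk, intervals):
    """Probability of CBC between age_start and age_start + horizon.

    intervals: sorted, contiguous tuples (lo, hi, h1_composite, attr_risk, h2),
    where h1_composite is the composite CBC hazard and h2 the non-CBC
    mortality hazard, both per person-year and constant on [lo, hi).
    """
    age_end = age_start + horizon
    survival = 1.0  # probability of being alive and CBC-free so far
    risk = 0.0
    for lo, hi, h1_composite, attr_risk, h2 in intervals:
        a, b = max(lo, age_start), min(hi, age_end)
        if b <= a:
            continue  # interval lies outside the projection window
        # Step 2: baseline hazard = composite hazard * (1 - attributable risk);
        # step 1: scale it by the woman's relative risk.
        h1 = h1_composite * (1.0 - attr_risk) * rel_risk
        total = h1 + h2  # step 3 enters as the competing hazard h2
        if total > 0:
            leave = 1.0 - math.exp(-total * (b - a))
            risk += survival * (h1 / total) * leave
            survival *= 1.0 - leave
    return risk


# Hypothetical inputs for illustration only (not estimates from the paper).
intervals = [
    (50, 60, 0.0035, 0.30, 0.015),
    (60, 70, 0.0040, 0.30, 0.020),
]
risk_5yr = absolute_risk(50, 5, 1.0, intervals)
risk_15yr = absolute_risk(50, 15, 2.0, intervals)
```

With no competing mortality and a single interval, the function reduces to 1 − exp(−h₁Δ), which is a convenient sanity check on the bookkeeping.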

In step 1, we investigated a large number of potential predictors. These include current age (defined as the age at last follow-up), age at first BC diagnosis, menopausal status, tumor size, status of ER, PR, and human epidermal growth factor receptor 2 (HER2), number of positive lymph nodes, chemotherapy, biopsy status, first-degree family history of BC, personal or family history of ovarian cancer, high-risk pre-neoplasia status, breast density (obtained using the BI-RADS system), age at first birth, hormone replacement therapy, anti-estrogen therapy, stage of first BC, tumor node status, body mass index, and type of first BC (pure DCIS, pure invasive, or a mix of the two). Each of these variables was ascertained from the records available at the time of the first BC diagnosis. The variable high-risk pre-neoplasia status was defined as “yes” if a patient had lobular carcinoma in situ in their first BC and/or had a history of atypical hyperplasia in any breast prior to their first BC; otherwise, the category was set to “no/unknown.” The “no” and “unknown” categories were combined because the “no” response provided by the pathology report may also include incomplete information about pre-neoplasia. To accommodate missing information, we allowed “unknown” as one of the categories in most variables. We also explored interactions between several risk factors; however, some interactions could not be considered because of the limited number of subjects available at the combinations of the risk factors involved. The final multivariate model was obtained by applying standard variable selection methods [44]. More details about each of the four model building steps are provided in Supplementary Materials.

We used statistical software R [45] for all computations. We have developed an R package implementing CBCRisk and a freely available app, which are available at http://www.utdallas.edu/~swati.biswas/ and http://www.utdallas.edu/~pankaj/. The R package has also been integrated into CancerGene, a widely used cancer risk prediction tool, which will be available soon from the HughesRiskApps site (http://www.hughesriskapps.net/index.php).

Results

The final multivariate relative risk model includes eight risk factors: age at first BC diagnosis, hormone therapy, first-degree family history of BC, high-risk pre-neoplasia status, ER status, breast density, first BC type, and age at first birth. The estimates of relative risks and their 95% confidence intervals (CIs) along with the counts of cases and controls in each category of a risk factor are summarized in Table 1. Note that all variables allow an unknown (missing) category except the age at first diagnosis and first BC type. We find that younger age at the first diagnosis, family history of BC, and negative ER status are associated with higher risk of CBC, while anti-estrogen therapy reduces the risk. These findings are consistent with the literature [12, 20, 25, 27, 28]. A history of high-risk pre-neoplasia corresponds to an increased risk, although only a few women in BCSC data were known to have such a history. Interestingly, breast density has an increasing dose effect on CBC risk, with increased density associated with increased risk. A similar increasing dose effect is observed for the presence of DCIS in the first BC type: invasive BC with associated DCIS (mixed invasive-DCIS) has increased CBC risk compared to pure invasive, and the risk goes up further for pure DCIS. Older age at first birth corresponds to a higher risk; in particular, the risk goes up substantially for the 40+ group. However, we note that the sample sizes are small for this category (as well as for some categories of other variables), leading to wider CIs.

Table 1 Risk factors associated with CBC in the final relative risk model

In defining the categories for the age at first birth variable, we combined nulliparous and <30 age categories as their estimated relative risks were similar. For age at first BC diagnosis, we had also explored finer categories but did not find any improvement in the results. None of the interaction terms that we were able to consider were found to be significant. The final relative risk model does not have current age of a woman as a risk factor as it was highly correlated with the age at first BC diagnosis. Thus, both the factors could not be in the model together. We excluded current age and retained age at first BC diagnosis. As current age is not in the relative risk model, the relative risks are constant over time. Nonetheless, current age does play a role in the final absolute risk model through age-specific hazard rates.

Figure 1 shows the age-specific composite CBC incidence rates per 1000 person-years. The rate ranges between 2.6 and 4.4, with the lowest value over the interval [18, 30) and the highest over the interval [75, 80). It fluctuates little around 3.5 over the intervals [30, 65) and [85, 90), and around 4.1 over the interval [65, 85). Thus, on the whole, we may conclude that the CBC incidence rate increases from 2.6 over the interval [18, 30) to about 3.5 over the interval [30, 65) and further to about 4.1 over the interval [65, 85), and declines thereafter to 3.5.
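For readers unfamiliar with the unit, a rate per 1000 person-years in an age interval is conventionally the number of events in that interval divided by the person-years of follow-up accrued in it, times 1000. The Python sketch below shows this generic calculation; it is not the estimation procedure used for Fig. 1 (described in the Supplementary Materials), and the function name, input layout, and toy data are hypothetical.

```python
def age_specific_rates(subjects, cut_points):
    """Events per 1000 person-years in each age interval [lo, hi).

    subjects: tuples (entry_age, exit_age, had_event); an event, if any,
    is assumed to occur at exit_age.
    cut_points: sorted ages defining consecutive intervals.
    """
    bins = list(zip(cut_points[:-1], cut_points[1:]))
    person_years = [0.0] * len(bins)
    events = [0] * len(bins)
    for entry, exit_age, had_event in subjects:
        for i, (lo, hi) in enumerate(bins):
            # Follow-up time this subject contributes to the interval.
            overlap = min(exit_age, hi) - max(entry, lo)
            if overlap > 0:
                person_years[i] += overlap
            if had_event and lo <= exit_age < hi:
                events[i] += 1
    return [
        1000.0 * e / py if py > 0 else float("nan")
        for e, py in zip(events, person_years)
    ]


# Toy follow-up records, invented for illustration.
subjects = [(40, 50, False), (45, 48, True), (48, 60, False)]
rates = age_specific_rates(subjects, [40, 50, 60])
```

Here the interval [40, 50) accrues 15 person-years and one event, and [50, 60) accrues 10 person-years and no events.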

Fig. 1 Age-specific composite incidence rates of CBC estimated using SEER data. Age on the horizontal axis can be interpreted as age at counseling

Figure 2 shows the age-specific non-CBC mortal hazard rates per 1000 person-years. Starting at 35, the rate decreases steadily to about 15 around age 40, essentially stays there till around age 65, and increases steadily thereafter to 83 by age 90.

Fig. 2 Age-specific mortal hazard rates from non-CBC causes estimated using SEER data. Age on the horizontal axis can be interpreted as age at counseling

In Table 2, we present 5- and 15-year projected absolute risks of CBC for a wide range of combinations of risk factors and current age. The risk clearly varies substantially depending on the specific combination of risk factors a woman has. The second-to-last row represents a typical woman in BCSC data in terms of each risk factor, where by “typical” we mean a woman with the most frequently occurring value for each factor. As the most common category for many factors is “unknown”, we also show in the last row a typical woman after excluding the “unknown” category. We see that 5- and 15-year CBC risks are roughly 1.5 and 4.5%, respectively, for both women. This illustrates that the CBC risk for a typical BC patient is not high, contrary to the perceived notion.

Table 2 Examples of CBC risk prediction using CBCRisk, with 95% confidence intervals in parentheses

Discussion

For a patient diagnosed with unilateral BC, the risk of CBC is a fundamental concern. Yet, there is no prediction model available for assessing the risk in a personalized and quantitative manner for the general population. The need for such a model is especially dire in the light of the paradoxical situation of increasing rates of CPM at a time of falling CBC rates and mounting evidence of little or no survival benefit from CPM. To fulfill this need, we have developed a model called CBCRisk. This model utilizes several relevant risk factors that we found to be significant. Our findings are consistent with the literature in that a family history of BC, a younger age at first BC diagnosis, and an ER negative status are associated with increased CBC risk, while anti-estrogen therapy has a protective effect [12, 20, 25, 27, 28, 46]. The effect of breast density has attracted substantial research interest in recent years. Our study adds to the literature in this regard by finding that breast density has an increasing dose effect on CBC risk. DCIS has also drawn attention lately, and our result indicates that the presence of DCIS in the first BC, either in a mixed or pure form, also increases the chance of CBC.

We found that the CBC risk can vary greatly depending on the risk factors and thus using a one-size-fits-all estimate such as the commonly cited 0.5% CBC risk per year can be misleading for many patients. For typical BC patients though, the CBC risk does not appear to be high. Availability of a quantitative and individualized risk estimate as provided by CBCRisk can aid physicians in educating their patients effectively. This, in turn, will empower patients to make an informed decision, which may or may not be to undergo CPM.

We have also used the BCSC data for calculating the age-specific hazard rates (in step 2 of model building). These rates were similar to those we presented in Figs. 1 and 2 using SEER data except at the two extreme ends of age intervals for which the BCSC data lack sufficient sample sizes and hence are subject to greater variability. As SEER rates are based on a much larger dataset, we decided to use those rates in our risk calculations. However, SEER lacks information on several risk factors for CBC, which precluded its use for building the relative risk component of the model.

Our study has several limitations. One is missing/unknown information in several variables. As a certain amount of missing information is inevitable in most clinical situations, we allowed such a category for most of the variables so that our model can still be used. Nonetheless, the relative risks associated with the “unknown” categories must be interpreted with caution keeping in mind that an “unknown” category is actually a mixture of the other categories, and its definition/composition for a variable may not be stable across populations. Further, due to small sample sizes for some specific combinations of factors, we could not study all two-way interactions. Also, a few variables such as oophorectomy and family history of breast and ovarian cancers are self-reported.

Another limitation is that the BCSC data do not contain information on the type of anti-estrogen therapy. Thus, CBCRisk treats tamoxifen and aromatase inhibitors in the same way, even though the latter are known to be more protective against CBC [8, 10]. In the future, we plan to update our model based on other data sources that have information on the type of anti-estrogen therapy.

The BCSC data lack information on some risk factors for CBC such as BRCA1/2 status and family history of BC beyond first-degree relatives. Even among the first-degree relatives, the BCSC data do not specify which specific relative has BC. Thus, CBCRisk uses limited family history information and is mainly intended for sporadic BC patients. For patients with a strong family history of BC or ovarian cancer and/or who are carrying BRCA1/2 mutations, the Mendelian genetic risk prediction model BRCAPRO [47] is preferable as it uses extensive family history information including ovarian cancer. However, BRCAPRO does not utilize several covariates that are used in CBCRisk. Thus, it will be worthwhile to pool the strengths of the two models in a joint model that uses covariates from CBCRisk as well as somewhat more detailed family history information. For this, it may be preferable to use simplified versions of BRCAPRO [48] to limit the burden of family history collection for the general population of BC patients. Finally, it will be of interest to validate CBCRisk with independent data. Another future work could be to evaluate the effectiveness of CBCRisk in educating patients.