Introduction

The rates of contralateral prophylactic mastectomy (CPM) among patients diagnosed with unilateral breast cancer (BC) have been increasing sharply over the last two decades [1,2,3,4]. However, most of these CPMs are believed to be medically unnecessary because the risk of contralateral breast cancer (CBC) has been, in fact, falling mostly due to increased use of effective adjuvant therapies such as tamoxifen and aromatase inhibitors for treating the first primary breast cancer [5]. While unilateral BC patients carrying BRCA mutation or having family history of BC indeed have a high risk of developing CBC, a majority of women have much lower risk of developing CBC. Even more perplexing is the fact that the rate of CPM has increased the most in a group of BC patients who are at the least risk of getting CBC [6].

These paradoxical observations have baffled the scientific community, leading to a flurry of investigations to uncover the reasons driving the decision to undergo CPM. Although this decision-making process is complex, with several factors playing roles to varying degrees for different women, research shows that many patients tend to overestimate their CBC risk, sometimes even by a factor of 5–10 [6,7,8]. Thus, it is natural for them to instinctively lean towards CPM, if only as an anxiety-relieving tool. Although CPM does reduce the risk of CBC, there is no convincing evidence that it prolongs survival [6]. In fact, for the majority of patients, the CBC risk is much smaller than the risk of recurrence from their index cancer [3]. Moreover, CPM has a substantial number of side effects and can adversely affect a woman’s physical and psychological health and well-being [9]. Thus, the decision-making process for CPM should weigh these facts, along with the loss of a healthy breast, in light of the actual risk of CBC.

These findings point to a pressing need to educate patients effectively on their CBC risk. For this task, physicians need a statistical model for risk prediction of CBC based on a patient’s personal risk factors. We had earlier developed such a model, named CBCRisk, for predicting the absolute risk of CBC for women diagnosed with invasive BC or ductal carcinoma in situ (DCIS) in one breast [10]. The model was built using US population-based datasets from the Breast Cancer Surveillance Consortium (BCSC) [11] and Surveillance, Epidemiology, and End Results (SEER) [12]. It uses eight risk factors from the time of the first BC diagnosis, namely, age at first BC diagnosis, anti-estrogen therapy, family history of BC, high-risk pre-neoplasia status, breast density, estrogen receptor (ER) status, type of first BC, and age at first birth. The directions of association of these factors with the risk of CBC in CBCRisk are as follows. Higher risk is associated with younger age at first BC diagnosis, no anti-estrogen therapy, presence of family history of BC, presence of high-risk pre-neoplasia, higher breast density, negative ER status, presence of DCIS in first BC, and older age at first birth. Except for age at first diagnosis and first BC type, an unknown category is available for all factors. This way the model can be applied even for women with missing values on some factors, with the usual caveats of an unknown category, e.g., it is heterogeneous with possibly different compositions across different cohorts. For a woman with cancer in one breast, CBCRisk takes information on these risk factors and her current age as inputs, and provides the absolute risk of her getting CBC within a specified number of years as output. CBCRisk has been implemented in an R software package as well as a web-based app [13].

Here we conduct a study to validate CBCRisk on independent clinical data from the Johns Hopkins University (JH) and the University of Texas MD Anderson Cancer Center (MDA). We primarily focus on absolute risk of developing CBC within three and five years of first BC diagnosis. We could not consider risk estimates for longer periods, e.g., 10 years, because only a small portion of the validation cohorts have been followed up that long.

Data sources

The validation data were from two hospital-based cohorts: JH and MDA. Women whose first BC diagnosis was histologically confirmed as invasive and/or DCIS and whose age at first diagnosis was between 18 and 88 years were included in the cohorts because CBCRisk was developed specifically for these women [10]. The follow-up for each woman begins at the time of her first BC diagnosis. Women who underwent CPM during the follow-up period or who got CBC within 6 months of their first BC diagnosis were excluded. As with the first primary diagnosis, we also require the CBC diagnosis to be histologically confirmed as invasive and/or DCIS. A woman who develops CBC is called a case, whereas a woman who does not is called a control. For a case, the follow-up ends at the time of her CBC diagnosis, whereas for a control, the follow-up ends at censoring or death.

The initial JH cohort consisted of 6447 women whose first BC was diagnosed between 1990 and 2016. After applying the exclusion criteria, 6035 women remained in the cohort consisting of 117 cases and 5918 controls. Of these, there were only 231 women, including 81 cases, who had all eight variables (used by CBCRisk) recorded. For the remaining women, only the age at first diagnosis, first BC type, high-risk pre-neoplasia status, anti-estrogen therapy, and ER status were available. For an unreported variable, the value was set to be unknown as allowed by CBCRisk for most variables. The initial MDA cohort had 6123 women who got first BC diagnosis between 1983 and 2014. After excluding women based on the exclusion criteria, 5185 women remained in the cohort, including 111 cases and 5074 controls. High-risk pre-neoplasia status and breast density were not available in these data, and thus were set as unknown in CBCRisk. Further, age at first birth was also missing for almost 94% of women in this cohort. Duration of follow-up was available for all women in the two cohorts.

The follow-up profiles of the two cohorts are somewhat different. Specifically, the mean and the median follow-up times in the JH cohort are, respectively, 6.4 and 6 years for cases and 8 and 7.4 years for controls, and about 8% of women are followed up for less than one year, whereas the mean and the median follow-up times in the MDA cohort are, respectively, 5.1 and 4.2 years for cases and 5.8 and 4.8 years for controls, and about 1.7% of women are followed up for less than 1 year.

For validation of risk prediction in a specific period, for example, 3 or 5 years after first BC diagnosis, we need the true disease (CBC) status for each woman at the end of that risk prediction period. Although this can be determined unambiguously for a woman who develops CBC during her follow-up period (a case), the true CBC status remains unknown for a control who died or was censored before the end of the risk prediction period. Therefore, such women need to be dropped for the validation in that specific risk prediction period. This means that for validation of 3-year (or 5-year) risk, we consider only those controls who have been followed up for at least 3 years (or 5 years). This necessarily implies that the validation cohort for 5-year risk is a subset of that for 3-year risk. Specifically, in the JH cohort, 5026 and 4142 controls have been followed up for at least 3 and 5 years, respectively. The corresponding numbers for the MDA cohort are 3547 and 2478 controls. Moreover, for a case, if her length of follow-up, i.e., the time between her first BC and CBC diagnosis, exceeds 3 years, she, in fact, serves as a control for 3-year risk validation, and likewise for 5-year risk validation. Lastly, CBCRisk sets 89 years as the upper age limit for risk calculation (this allows 1-year risk prediction for a woman of 88 years, the oldest age for which CBCRisk can be used). Women whose age at first BC diagnosis is such that they would go over the 89-year limit in 3 years (5 years) are dropped from the cohort for the calculation of 3-year (5-year) risk.
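The inclusion rules in the preceding paragraph can be summarized in a short sketch. It is only an illustration of the stated logic, not the code used in the study; the function name and its arguments are hypothetical, and the boundary conventions (a CBC diagnosed exactly at the horizon counts as a case, a control followed for exactly the horizon is kept) are assumptions.

```python
MAX_AGE = 89  # upper age limit for risk calculation, as stated in the text


def status_at_horizon(age_at_diagnosis, followup_years, developed_cbc, horizon):
    """Return 1 (case), 0 (control), or None (excluded) for a risk horizon in years.

    Hypothetical helper: the argument names and boundary handling are assumptions.
    """
    # Women who would pass the age limit within the horizon are dropped.
    if age_at_diagnosis + horizon > MAX_AGE:
        return None
    if developed_cbc:
        # A CBC diagnosed after the horizon makes her a control for this horizon.
        return 1 if followup_years <= horizon else 0
    # A control must have been followed through the end of the horizon;
    # otherwise her true status at the horizon is unknown.
    return 0 if followup_years >= horizon else None
```

For instance, a woman diagnosed with CBC 4 years after her first BC is treated as a control for the 3-year horizon and as a case for the 5-year horizon.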

Statistical methods

To validate CBCRisk, we evaluate its calibration and discrimination accuracy in predicting absolute risk of developing CBC over a specified period starting from the age at first BC diagnosis. Two risk prediction periods of 3 and 5 years are considered. We could not consider longer risk prediction periods such as 10 years because only a few women have been followed up for that long (as is clear from the previous section). We begin by computing the absolute risks for each woman in the validation cohorts. We set the current age for each woman in CBCRisk as her age at first BC diagnosis, as this is when most women are likely to seek risk assessment for CBC. The calibration is evaluated by comparing the expected number of CBC cases (E) as predicted by the model with the observed number of CBC cases (O) at the end of a risk prediction period through the E / O ratio for the entire cohort as well as for subgroups formed by quintiles of predicted risk and by various levels of the risk factors used in CBCRisk. For a given group of women, E is computed by summing the absolute risks of all women in that group. We also compute a 95% confidence interval (CI) for E / O based on a normal approximation for the natural logarithm of O, given by \((E/O) \times \exp (\pm \, 1.96 \sqrt{1/O})\) [14].
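As a minimal sketch of the calibration summary described above, the following computes E, O, the E / O ratio, and the interval \((E/O) \times \exp (\pm \, 1.96 \sqrt{1/O})\) for one group of women. The function name is hypothetical, risks are assumed to be expressed as probabilities rather than percentages, and the study itself used R rather than Python.

```python
import math


def expected_observed_ratio(predicted_risks, outcomes, z=1.96):
    """Return (E, O, E/O, lower, upper) for one group of women.

    predicted_risks: absolute risks as probabilities (assumption: not percentages).
    outcomes: 1 if CBC was observed by the end of the period, else 0.
    """
    expected = sum(predicted_risks)  # E: sum of the absolute risks in the group
    observed = sum(outcomes)         # O: number of observed CBC cases
    if observed == 0:
        raise ValueError("E/O is undefined when no cases are observed")
    ratio = expected / observed
    half_width = z * math.sqrt(1.0 / observed)  # on the log scale
    lower = ratio * math.exp(-half_width)
    upper = ratio * math.exp(half_width)
    return expected, observed, ratio, lower, upper
```

Because the half-width depends only on O, groups with very few observed cases get wide intervals regardless of how many women they contain.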

To evaluate the discrimination ability, we calculate sensitivity (true positive rate) and specificity (true negative rate) values at various cutoffs and plot sensitivity versus one minus specificity as a receiver-operating characteristic (ROC) curve. The area under the ROC curve (AUC) is then used to summarize the overall discrimination ability of the model. This index ranges between 0 and 1, with 0.5 corresponding to a fair coin toss and 1 corresponding to perfect prediction. The AUC can be interpreted as the probability that the absolute risk of a randomly selected diseased woman is greater than the absolute risk of a randomly selected nondiseased woman. We also compute a 95% CI for the AUC using the bootstrap method [15, 16].
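The probabilistic interpretation of the AUC given above translates directly into code. The sketch below counts, over all case-control pairs, how often the case has the higher predicted risk (ties counted as one half), and attaches a percentile bootstrap interval obtained by resampling cases and controls separately. The function names, the tie handling, and the resampling scheme are assumptions made for illustration; they are not claimed to reproduce the settings of the pROC package used in the study.

```python
import random


def auc(case_risks, control_risks):
    """Share of case-control pairs in which the case has the higher risk (ties = 1/2)."""
    wins = 0.0
    for c in case_risks:
        for k in control_risks:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_risks) * len(control_risks))


def bootstrap_auc_ci(case_risks, control_risks, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        cases = rng.choices(case_risks, k=len(case_risks))
        controls = rng.choices(control_risks, k=len(control_risks))
        stats.append(auc(cases, controls))
    stats.sort()
    # Approximate percentile positions in the sorted bootstrap distribution.
    lo_idx = max(0, round((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]
```

The pairwise loop is quadratic in cohort size, so for cohorts with thousands of controls a rank-based computation would be preferable; the loop is kept here because it mirrors the definition.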

We used the statistical software package R for all the computations [17]. The CBCRisk package [13] has been used to calculate the absolute risks based on the CBCRisk model. The package pROC has been used to compute AUC and its 95% bootstrap CI and to draw the ROC curve [18].

Results

Table 1 presents the case–control distributions for the two cohorts together with those of the BCSC and SEER data on which CBCRisk was built [10]. The proportions of cases in both cohorts are smaller than those in BCSC and SEER. Specifically, the proportions of cases in JH and MDA (as percentages of the respective total cohort sizes) are, respectively, 0.5 and 0.3 percentage points lower than in SEER. Table 2 presents the case–control distributions of the JH and MDA cohorts by stratifying them on the eight risk factors used in CBCRisk. Clearly, the compositions of the two cohorts are quite different. Only the distributions according to age at first diagnosis are comparable. Although the distributions according to high-risk pre-neoplasia status, breast density, and age at first birth may also seem somewhat similar, this is primarily driven by the large number of women in the “unknown” category. A majority of women in both cohorts are ER positive, but their percentages are not similar. A marked difference exists in the distributions according to anti-estrogen therapy, family history, and type of first BC. For example, both anti-estrogen therapy and family history are “unknown” for a majority of JH women, whereas they are, respectively, “yes” and “no” for a majority of MDA women. Likewise, the MDA data have over 98% “pure invasive” as first BC type, in contrast with about 40% “pure invasive,” 20% “pure DCIS,” and 40% “mixed invasive/DCIS” in the JH data. This may be because the associated DCIS information was not captured in the MDA data in addition to the primary BC type. Thus, a substantial proportion of pure invasive patients could, in fact, be of mixed type.

Table 1 Case–control distributions in cohorts
Table 2 Case–control distributions according to risk factors

Now we present the validation results. First, we consider the discrimination performance of CBCRisk. Table 3 shows sensitivity and specificity values at the quintiles of risks predicted by CBCRisk. Also presented are the effective cohort sizes (n) that have been used in this calculation after excluding controls with insufficient follow-up periods (as described in Section “Data Sources”). We note that, in general, the absolute risks are low. For example, 80% of women have predicted 5-year risks of CBC below 2.4% and 1.53% in the JH and MDA cohorts, respectively. Also, the absolute risks are higher for JH than for MDA; we will elaborate on this point in the Discussion section. We can see from the table that thresholds of 1% or less correspond to sensitivity values between 0.8 and 0.9, depending on the cohort and the number of years in the prediction period.
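To make the construction behind a table of this kind concrete, the sketch below evaluates sensitivity and specificity at the 20th, 40th, 60th, and 80th percentiles of predicted risk. The function names are hypothetical, and two conventions are assumptions rather than details taken from the study: a woman is classified as positive when her risk is at or above the threshold, and the cut points come from Python's default quantile method.

```python
from statistics import quantiles


def sensitivity_specificity(risks, outcomes, threshold):
    """Sensitivity and specificity when 'risk >= threshold' is called positive."""
    tp = fn = tn = fp = 0
    for risk, outcome in zip(risks, outcomes):
        predicted_positive = risk >= threshold
        if outcome == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def quintile_table(risks, outcomes):
    """Rows of (threshold, sensitivity, specificity) at the four quintile cut points."""
    thresholds = quantiles(risks, n=5)  # 20th, 40th, 60th, 80th percentiles
    return [(t, *sensitivity_specificity(risks, outcomes, t)) for t in thresholds]
```

Lower thresholds classify more women as positive, so sensitivity falls and specificity rises as the threshold moves up through the quintiles.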

Figures 1 and 2 present the ROC curves for 3- and 5-year predicted risks, respectively, for the two cohorts. Except near the left edge in the MDA cohort, the curves in all cases are above the \(45^\circ\) line, indicating that the model is useful for distinguishing between those who get CBC and those who do not. This is also confirmed by the AUC estimates, which for 3-year risks are 0.65 for JH and 0.61 for MDA. The estimates for 5-year risks are 0.62 and 0.61 for JH and MDA, respectively. The two ROC curves in each figure cross, so the model’s performance in one cohort cannot be said to be uniformly better than in the other. However, based on the AUC, the model’s overall discrimination ability is better in the JH cohort than in the MDA cohort. A likely reason for this is that information on high-risk pre-neoplasia status and breast density, two risk factors associated with higher risk, is available for some women in the JH cohort but is entirely missing in the MDA cohort. Also, in the JH cohort, the AUC is larger for 3-year risks than for 5-year risks, whereas the two are the same in the MDA cohort.

Table 3 Sensitivities and specificities of CBCRisk at thresholds given by the quintiles of predicted 3- and 5-year risks (in %)
Fig. 1 ROC curve and AUC for 3-year risks predicted by CBCRisk
Fig. 2 ROC curve and AUC for 5-year risks predicted by CBCRisk

Next, we consider the calibration performance. Tables 4 and 5 present estimates of E/O and their 95% CIs for 3- and 5-year risks. In addition to the overall E/O, the tables show E/O for the five risk quintile groups that are formed by the thresholds listed in Table 3, and for different levels of each risk factor used in CBCRisk. For the JH cohort, the overall E/O for 3-year risk is 2.02, implying that CBCRisk overpredicts the number of CBC cases by about twofold. The overprediction is persistent across all five risk quintile groups and in categories of age at first diagnosis, ER status, first BC type, and high-risk pre-neoplasia. On the other hand, the risk is underpredicted in most of the categories of family history and age at first birth. Note, however, that only 30 cases of CBC (0.6% of the effective cohort) are observed at the end of the 3-year prediction period, and many subgroups have 5 or fewer cases. The E/O in groups that have a very small number of cases may not be stable. The calibration improves for 5-year risk, in which case the overall E/O is 1.56 and 53 cases (1.3% of the effective cohort) are observed at the end of the 5-year prediction period. Except for the categories with a very small number of cases, the pattern of over- and underprediction is similar to that seen for 3-year risks.

For the MDA cohort, the overall E/O for 3- and 5-year risks are 0.61 and 0.52, respectively. Thus, in contrast to the JH cohort, the model underpredicts the number of CBC cases in this cohort by about 40 and 50% for 3- and 5-year risk predictions, respectively. The underprediction occurs in all risk quintile groups except one as well as in all risk factor groups. The observed numbers of cases at the end of the 3- and 5-year prediction periods are 45 and 61 (1.3 and 2.5% of effective cohorts), respectively.

Table 4 Expected (E) and observed (O) numbers of CBC cases for 3-year predicted risk together with 95% CI for E / O
Table 5 Expected (E) and observed (O) numbers of CBC cases for 5-year predicted risk together with 95% CI for E / O

Discussion

In this article, we evaluated the calibration and discrimination performance of CBCRisk for predicting 3- and 5-year absolute risks starting from age at the first BC diagnosis. From the calibration evaluation, we find that CBCRisk overpredicts the number of CBC cases in the JH cohort by 1.56–2.02-fold, and it underpredicts in the MDA cohort by 40–50%. This difference in performance can be explained to some extent by the difference in the compositions of the two cohorts. The women in the JH cohort tend to be associated with higher risks of CBC compared to those in the MDA cohort, thereby leading to an increased value of E for that cohort. For example, the “pure DCIS” and “mixed invasive/DCIS” categories of first BC type are associated with higher risks of CBC compared to the “pure invasive” category [10]; and the JH cohort is 60% “pure DCIS” or “mixed invasive/DCIS,” whereas the MDA cohort is 98% “pure invasive.” Likewise, unknown family history of BC is associated with a higher risk compared to no family history and the JH cohort has 96% unknown family history, while the MDA cohort has 83% no family history. Moreover, unknown status of anti-estrogen therapy is associated with a higher risk compared to having the therapy; and the JH cohort has 63% with unknown status, whereas in the MDA cohort 70% have anti-estrogen therapy.

Another reason is related to the requirement that the controls be followed up for at least 3 and 5 years for validation of 3- and 5-year risks, respectively. The number of controls satisfying this requirement necessarily drops as the risk prediction period becomes longer. However, the drop is much steeper for the MDA cohort than for the JH cohort because the follow-up times for controls are generally shorter for MDA. For example, in the case of JH, out of 5918 total controls in the cohort, 4996 (84%) have been followed up for at least 3 years and 4089 (69%) for at least 5 years. In contrast, for MDA, out of 5074 total controls in the cohort, only 3437 (68%) and 2417 (48%) have been followed up for at least 3 and 5 years, respectively. A smaller number of controls in the MDA cohort leads to a decreased value of E for that cohort. Also, in the MDA cohort, the missing values on high-risk pre-neoplasia and breast density for all women and on age at first birth for most women might have contributed to underprediction of the number of CBC cases.

Yet another difference between the two cohorts is the prevalence of CBC. It is lower in the JH cohort than in the MDA cohort. The latter is closer to what we see in large population-based databases such as BCSC and SEER. However, the JH cohort spans relatively more recent years (1990–2016) compared to MDA (1983–2014) and BCSC (1995–2009). As the incidence of CBC has declined over the years in general, this might partially explain the lower prevalence, and hence overprediction by CBCRisk, in the JH cohort.

From the discrimination evaluation, we find that the AUC estimates range between 0.61 and 0.65. To put these calibration and discrimination results for CBCRisk into perspective, we may compare them to those reported in the validation studies of the popular BCRAT and related models for predicting a woman’s risk of developing first invasive BC. A review of the literature shows that the overall E/O ratios reported therein range between 0.5 and 2.5 with an average of 1.3, and the AUCs reported therein range between 0.47 and 0.65 with an average of 0.57 [14, 19,20,21,22,23,24,25]. Thus, the performance of CBCRisk is in line with that of BCRAT and related models. Given that these models are widely used in practice, it may be reasonable to conclude that CBCRisk should also be considered acceptable for clinical use.

Recently, a set of guidelines from Manchester, UK has been proposed for counseling women about CPM [26]. It includes a simple formula for estimating the lifetime risk of CBC. For this, a total risk is first calculated using 0.5% risk per year starting from the age at first diagnosis to age 80 (the formula assumes 80 as the average life expectancy), and then the risk is adjusted according to the personal factors of anti-endocrine treatment, BRCA mutation carrier status, oophorectomy, and family history. For example, for a woman receiving anti-endocrine treatment, the total risk is reduced by 50%, while for positive family history, the risk is doubled. This estimation does not involve formal modeling and does not provide 5- or 10-year risk of CBC, which may be more relevant as most women who get CBC typically get it within 15 years of their first BC diagnosis (in SEER data, we found that over 99% of CBC cases occurred within 15 years of first BC diagnosis). Even though CBCRisk can be used to calculate risk up to age 80 or 88 (assuming either one to be lifetime), very few women in the validation cohorts have been followed up for 10 years or more, let alone for their lifetime. Thus, validating a lifetime risk calculation is problematic with these data. If data with long-term follow-up as well as relevant risk factors become available, a comparison of the Manchester formula and CBCRisk will be worth doing.
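The arithmetic of the formula as summarized above can be illustrated with a rough sketch. It covers only the two adjustments whose sizes are stated in this article (anti-endocrine treatment and family history); the adjustments for BRCA mutation carrier status and oophorectomy are deliberately omitted because their sizes are not given here. The function name is hypothetical, and the sketch assumes that the yearly risk accumulates additively and that the adjustments combine multiplicatively, neither of which is confirmed by the description above. It is not an implementation of the published guideline.

```python
LIFE_EXPECTANCY = 80       # age to which risk is accumulated, as stated in the text
ANNUAL_RISK_PERCENT = 0.5  # risk per year, in percent, as stated in the text


def manchester_style_lifetime_risk(age_at_diagnosis,
                                   anti_endocrine=False,
                                   family_history=False):
    """Rough lifetime CBC risk in percent (illustrative assumptions only)."""
    years_at_risk = max(0, LIFE_EXPECTANCY - age_at_diagnosis)
    risk = ANNUAL_RISK_PERCENT * years_at_risk  # assumption: additive accumulation
    if anti_endocrine:
        risk *= 0.5   # "the total risk is reduced by 50%"
    if family_history:
        risk *= 2.0   # "the risk is doubled"
    return risk
```

Under these assumptions, a woman diagnosed at age 50 has a starting figure of 30 years at 0.5% per year, i.e., 15%, which becomes 7.5% with anti-endocrine treatment.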

A limitation of our study is missing data. A substantial number of women in both cohorts do not have information on several risk factors used in CBCRisk, which may adversely affect the model’s ability to predict accurately. Moreover, large proportions of women have not been followed up for more than 5 years. There is, in general, a dearth of prospectively collected data on BC patients that have all relevant risk factors recorded for most women. For example, breast density is typically unavailable in earlier data, and age at first birth may not have been collected for women who had already been diagnosed with breast cancer (as opposed to unaffected women, for whom age at first birth is used for risk prediction of first BC and hence is collected). Despite these limitations, our study shows that CBCRisk is useful for predicting the risk of CBC for women diagnosed with unilateral breast cancer. Thus, we believe that CBCRisk can be used in clinical settings to aid physicians in quantifying the risk of CBC, and thereby help BC patients weigh the pros and cons of various treatment and prophylactic options in a more informed manner. Effective counseling aided by a personalized CBC risk estimate may help reduce the rate of medically unnecessary CPMs.