Background

Next to sex and age, the strongest risk factor for breast cancer is family history. Risk conferred by family history generally exceeds that associated with reproductive factors, use of postmenopausal hormone replacement therapy, and obesity, but is highly variable and therefore difficult to quantify for any given woman. Genetic testing has become a routine part of breast cancer risk assessment for females with a family history [1, 2]. Even testing with multigene panels, however, detects pathogenic variants in only up to a quarter of families with significant history [3]. Current guidelines advocate for earlier screening and inclusion of breast MRI for females with greater than 20% lifetime risk based on family history [4, 5]. However, how far the family history should extend, and the impact of breast cancer diagnoses in distant relatives, are unknown. Breast cancer risk assessment models vary considerably in the extent of family history analyzed and the feasibility of collecting and entering data in clinical practice [6]. Achieving more tailored cancer-screening strategies [7] requires research on the extent of family history needed for clinically meaningful risk assessment.

This study presents estimated risks for breast cancer based on the complete constellation of a woman’s family history of breast cancer, as measured in an extensive genealogy linked to cancer data. Some of the estimated risks are equivalent to that of carrying a rare high-risk variant. These estimates, presented for a large number of possible constellations of affected relatives, provide a general view of the important role of both close and distant family history in breast cancer risk prediction, and may contribute to better informed and individually tailored decisions about both screening (including age to initiate screening and additional screening modalities such as MRI) and chemoprevention for breast cancer.

Materials and methods

Utah population database (UPDB) and Utah cancer registry (UCR)

This study utilized a large and comprehensive genealogical and cancer phenotype resource, the UPDB. The UPDB is a unique resource that has been used to understand familial clustering and genetic predisposition to cancer in Utah for over 45 years [8, 9]. Genealogies of original Utah settlers, created from complete genealogy data computerized in the 1970s, and updated since using Utah Vital Statistics data (e.g., mother, father, and child from a birth certificate), have been linked to the Utah cancer registry (UCR), which established statewide required reporting of primary cancers diagnosed or treated in Utah in 1966, and became one of the original NCI Surveillance, Epidemiology, and End Results (SEER) cancer registries in 1973. The UPDB includes data for over 7 million unique individuals, 2.8 million of whom have at least three and up to 16 generations of genealogy. Of these, 1.3 million individuals have data for at least 12 of their 14 immediate ancestors (both parents, all four grandparents, and at least six of their eight great grandparents), and most have much more genealogy; this subset of individuals with deep ancestral genealogical data was analyzed here. Among these 1.3 million individuals, there were 640,366 females, of whom 45,979 had linked cancer records. Breast cancer cases were identified by registration in the Utah SEER Cancer Registry with primary site 500–509, histology 8000–9589, and behavior 2–9 according to the International Classification of Diseases for Oncology Revision 3; 15,316 women with breast cancer who had data for at least 12 of 14 immediate ancestors in the UPDB were identified and analyzed here; 1,625 of these breast cancer cases were behavior = 2 (in situ). No individuals were excluded based on breast cancer diagnosis or genetic test results. All analysis was performed with custom software created for use with UPDB rather than with commercial software.

Breast cancer family history constellations

Family history constellation is defined as the complete family history for breast cancer, including first- to third-degree relatives, for both paternal and maternal relatives. The relative risk (RR) for breast cancer for females with various constellations was estimated for the 640,366 females in the UPDB who have deep ancestral genealogy data. To estimate RR for a specific constellation, all females in the UPDB with the specific family history constellation (e.g., at least three first-degree relatives (FDRs) with breast cancer) were identified. These females were termed “probands,” whether or not they had been diagnosed with cancer. A variety of constellations of first-, second-, and third-degree relatives were considered; age at earliest diagnosis was integrated; and both maternal and paternal family history and combinations were considered. Constellations where some relationships were ignored or with a lower bound to the number of affected relatives (e.g., ≥ 3 FDRs) were included to extend the utility of the results to females with less precise family history knowledge.
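The constellation definitions above (exact counts, lower bounds such as ≥ 3 FDRs, and ignored relationship degrees) can be sketched as a simple matching rule. This is a minimal illustration only, not the custom UPDB software; the names `Criterion` and `Constellation` and the representation of family history as three counts of affected relatives are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criterion:
    """Hypothetical rule for one degree of relationship."""
    count: int
    at_least: bool = False  # True: ">= count"; False: "exactly count"

    def matches(self, n_affected: int) -> bool:
        if self.at_least:
            return n_affected >= self.count
        return n_affected == self.count


@dataclass(frozen=True)
class Constellation:
    """Hypothetical constellation; None means that degree is ignored."""
    fdr: Optional[Criterion] = None
    sdr: Optional[Criterion] = None
    tdr: Optional[Criterion] = None

    def matches(self, n_fdr: int, n_sdr: int, n_tdr: int) -> bool:
        pairs = ((self.fdr, n_fdr), (self.sdr, n_sdr), (self.tdr, n_tdr))
        # An ignored degree (None) places no constraint on the proband.
        return all(c is None or c.matches(n) for c, n in pairs)


# ">= 3 affected FDRs, SDRs and TDRs ignored"
at_least_three_fdr = Constellation(fdr=Criterion(3, at_least=True))
# "0 affected FDRs, >= 2 affected SDRs, TDRs ignored"
no_fdr_two_sdr = Constellation(fdr=Criterion(0), sdr=Criterion(2, at_least=True))
```

Under this representation, a proband with counts (4, 0, 7) matches the first example constellation regardless of her affected SDRs and TDRs, which mirrors how ignored relationships extend the results to females with less precise family history knowledge.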

First- to third-degree relationships comprise several distinct relationship types. Female FDRs include mothers, daughters, and sisters; second-degree relatives (SDRs) are the FDRs of FDRs and include half-sisters, grandmothers, granddaughters, aunts, and nieces; third-degree relatives (TDRs) are the FDRs of SDRs and include primarily first cousins, but might also include great grandmothers and great granddaughters. Because our data included only cancers diagnosed from 1966 to 2014, we were more likely to observe affected relatives in the same generation as the proband, such as sisters (first degree), half-sisters (second degree), and cousins (third degree). For the same reason, cancers in TDRs from older generations that were diagnosed before 1966 are not captured in this dataset.

Estimated rates of breast cancer

To estimate the rate of breast cancer in the UPDB population, all females were assigned to cohorts based on 5-year birth year groups and birth state (Utah or other). Cohort-specific rates of breast cancer were calculated from the 640,366 females with deep ancestral data in UPDB. Rates were estimated as the number of breast cancer cases observed in each cohort divided by the total number of females in the cohort.
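The cohort-specific rate calculation described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the UPDB implementation: the `Female` record, the function names, and the alignment of the 5-year birth groups to years divisible by five are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Female(NamedTuple):
    """Hypothetical minimal record for one female in the genealogy."""
    birth_year: int
    born_in_utah: bool
    has_breast_cancer: bool


# Cohort key: (first year of the 5-year birth group, born in Utah?)
Cohort = Tuple[int, bool]


def cohort_of(f: Female) -> Cohort:
    # Assumption: 5-year groups start on years divisible by 5 (e.g., 1940-1944).
    return (f.birth_year - f.birth_year % 5, f.born_in_utah)


def cohort_rates(females: Iterable[Female]) -> Dict[Cohort, float]:
    """Rate = breast cancer cases in the cohort / all females in the cohort."""
    cases: Dict[Cohort, int] = defaultdict(int)
    totals: Dict[Cohort, int] = defaultdict(int)
    for f in females:
        c = cohort_of(f)
        totals[c] += 1
        cases[c] += int(f.has_breast_cancer)
    return {c: cases[c] / totals[c] for c in totals}
```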

Individuals whose breast cancer was diagnosed before 1966 or after 2014 are not included; denominators include females in the genealogy who do not live in Utah; and most women born after 1976 are too young to have yet had a breast cancer diagnosis. While breast cancer disease rates for Utah cannot be accurately estimated here, RR estimations are based on the breast cancer rates within the UPDB population and are unbiased.

Estimation of relative risk (RR)

RRs were estimated for multiple family history constellations of breast cancer. For each constellation, all females with the specific family history (probands) were identified, and the observed number of probands with breast cancer, counted by cohort, was compared to the expected number. To determine the expected number of cases, the cohort-specific breast cancer rates (as described above) were multiplied by the number of probands in each cohort, and the products were summed over all cohorts. The constellation RR is calculated as the ratio of the observed to the expected number of probands with breast cancer for the specified constellation. The number of observed breast cancer cases is assumed to follow a Poisson distribution with a mean equal to the expected number of cases; 95% confidence intervals for RR are calculated as presented in Agresti [10].
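The observed-to-expected calculation can be sketched as below. The function names and the cohort keys are hypothetical. The text does not spell out the exact interval formula taken from Agresti [10], so this sketch substitutes Byar's approximation to the exact Poisson limits; it should be read as an illustration of the approach rather than a reproduction of the published intervals.

```python
import math
from typing import Dict, Hashable, Tuple


def expected_cases(proband_counts: Dict[Hashable, int],
                   rates: Dict[Hashable, float]) -> float:
    """Sum over cohorts of (probands in cohort) x (cohort breast cancer rate)."""
    return sum(n * rates[cohort] for cohort, n in proband_counts.items())


def relative_risk(observed: int, expected: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Return (RR, lower, upper) treating `observed` as Poisson.

    Assumption: Byar's approximation is used for the Poisson limits;
    the source cites Agresti [10] without giving the formula.
    """
    rr = observed / expected
    if observed == 0:
        lower_count = 0.0
    else:
        lower_count = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3
    o1 = observed + 1
    upper_count = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return rr, lower_count / expected, upper_count / expected
```

For example, with 100 observed cases against 50 expected, `relative_risk(100, 50.0)` gives RR = 2.0 with an interval whose lower bound is above 1, which is the sense in which a constellation RR is called significantly elevated.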

Results

Breast cancer family history in the UPDB

A summary of personal and family history of breast cancer for females in the UPDB is presented in Table 1. The table presents females in three groups: all females, females with a family history of breast cancer (FH+), and females without a family history of breast cancer (FH−). Family history of breast cancer was defined as having at least one first-, second-, or third-degree female relative with breast cancer. The number of females with a personal history of breast cancer is shown for each group. These data demonstrate that 59% of females in the Utah population have a positive family history of breast cancer, and that overall, with no consideration of specific family history, a female proband with any family history of breast cancer has more than double the risk of breast cancer of a proband with no family history (3.1% compared to 1.4%).
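As a rough consistency check using only the rounded percentages quoted above (so the results are approximate), the ratio of the two group rates and the overall rate they imply are:

```latex
\[
\frac{3.1\%}{1.4\%} \approx 2.2,
\qquad
0.59 \times 3.1\% + 0.41 \times 1.4\% \approx 2.4\% \approx \frac{15{,}316}{640{,}366}.
\]
```

The implied overall figure agrees with the 15,316 breast cancer cases among the 640,366 females described in the methods.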

Table 1 Characterization of personal history and family history of breast cancer (BC) in UPDB females

Because these summary frequencies include all women of all ages in the UPDB who have ancestral genealogy, they do not represent true rates of breast cancer in Utah, which are considered to be similar to the U.S. national lifetime rate estimated at 12%. Considerations of categorizations based on specific family history constellation allow further discrimination of those at highest risk; these are examined in more detail below.

Estimated RRs based on first-degree family history

Table 2a shows the estimated RRs based on first-degree family history. The estimated RRs are based only on number of affected FDRs, with affected status of SDRs and TDRs ignored. Significantly elevated risk was observed for probands with one affected FDR (RR = 1.61) increasing to RR = 5.00 for probands with at least four affected FDRs. The estimated RR for at least two affected FDRs = 2.60 (2.42, 2.80); data not shown.

Table 2 Estimated RRs for breast cancer based on a proband’s first-, second-, and third-degree family history

Estimated RRs based on second-degree family history

Estimated RRs based on affected SDRs are presented in Table 2b. For this set of constellations, the probands have 0 affected FDRs and TDRs are ignored. Risks for probands with exactly one and up to at least four affected SDRs, with no FDRs affected, are significantly increased over population risk. The RR for two or more affected SDRs with 0 affected FDRs and ignoring TDRs = 1.23 (95% CI 1.15, 1.32); data not shown. The RR for probands with at least four affected SDRs but 0 FDRs affected (RR = 1.71) is similar to the RR for probands with at least one FDR affected (RR = 1.74; 95% CI 1.68, 1.79; data not shown), indicating the importance of consideration of SDR family history even in the absence of an affected FDR.

Estimated RRs based on combined first- and second-degree family history

The contributions to RR based on SDR family history in the context of exactly one affected FDR are summarized in Table 2c. All estimated RRs were significantly elevated. These results show that SDR family history significantly affects risk, even in the presence of FDR family history. The RR for exactly one FDR and at least two SDRs (RR = 2.39; CI 2.19, 2.60; data not shown) is equivalent to the RR for exactly two FDRs when other relationships are ignored (RR = 2.42; CI 2.22, 2.62; Table 2a).

Estimated RRs based on third-degree family history

Table 2d presents RR estimates based on TDR family history with no affected FDRs or SDRs. The overall RR for probands with no family history of breast cancer (FDR = 0; SDR = 0; TDR = 0) is significantly less than 1.0 (RR = 0.74; CI 0.72, 0.77), as expected. Even in the absence of affected FDRs and SDRs, probands with at least five affected TDRs (cousins) are at significantly increased risk compared to population rates (RR = 1.32; CI 1.11, 1.57), similar to the RR for probands with at least two affected SDRs with FDR = 0 and ignoring TDRs (RR = 1.23; CI 1.15, 1.32; data not shown). Any number of affected TDRs in the presence of one FDR and SDRs ignored was also associated with significantly elevated risk (e.g., FDR = 1, ≥ TDR RR = 1.91; CI 1.56, 2.31; data not shown), compared with FDR = 1 with SDRs and TDRs ignored where RR = 1.61; CI 1.56, 1.67.

Estimated RRs considering earliest age at diagnosis of affected relative

Table 3 summarizes the RR estimates for constellations that consider the earliest age at diagnosis of breast cancer in a FDR in the presence of at least one affected FDR and ignoring SDRs and TDRs. The estimated RR for at least one affected FDR diagnosed at any age is 1.74 (1.68, 1.79) (data not shown). In Table 3, the RRs range from 1.42 for those whose earliest affected FDR was after age 80 years, to 2.32 for probands whose earliest affected FDR was before age 50 years. A proband with a family history of even one FDR with breast cancer, even when diagnosed at a late age, is still at significantly increased risk for breast cancer compared to population risk.

Table 3 Estimated RRs based on at least one FDR, ignoring SDRs and TDRs, considering the earliest age at diagnosis for breast cancer in an FDR

Estimated RRs for other family history constellations

Table 4a presents the estimated RRs for a variety of specific FDR relationships and combinations of specific FDRs and SDRs. The estimated RR for at least one affected daughter (RR = 2.37) is significantly higher than the RR for a proband with either an affected mother (RR = 1.78) or an affected sister (RR = 1.68). This may be related to censoring of diagnoses before 1966; because risk was estimated for probands of all ages for each constellation, the average proband age is likely higher for constellations including affected descendants than for constellations including affected ancestors.

Table 4 Estimated RRs for specific relationships including FDRs, combined maternal and paternal relationships, and paternal compared to maternal relationships

Table 4b shows the estimated RRs for combined maternal and paternal family history. This scenario is frequently encountered clinically, but current guidelines for family history criteria do not address the impact of cancer history in both parental lineages. Risks for each side of the family are typically considered separately, and a single maternal and paternal relative would not individually have been considered a significant risk factor (RR for ≥ 1 SDR = 1.08, 95% CI 1.04, 1.12). The combined maternal and paternal examples shown in Table 4b are all the equivalent of at least two affected SDRs, ignoring FDRs and TDRs (data not shown; n = 30,674 probands, RR = 1.53; 95% CI 1.45, 1.61). All of the confidence intervals for the four different constellations considered include 1.53, suggesting there is no synergistic effect for combined paternal and maternal contribution to risk in the examples considered. Nevertheless, the risks could be additive, as some combinations predict RR > 2.0.

Table 4c shows the estimated RRs for equivalent paternal and maternal constellations. The three pairs of equivalent constellations considered all show overlapping CIs for maternal compared to paternal family history, suggesting that a paternal family history confers risk equivalent to the same family history in the maternal line. This supports giving family history in both lineages equal weight when estimating risk for an individual.

In order to provide clinicians and patients with a quick guide to identify females at highest risk for breast cancer, Table 5 summarizes family history constellations with risks > 2.0, > 3.0, and > 4.0, which were observed for 4.5%, 0.4%, and 0.04% of females in this population, respectively.

Table 5 Minimal family history constellations associated with RR > 2.0, > 3.0, and > 4.0 for breast cancer, TDRs ignored

Figure 1 compares some commonly used risk assessment models with the risk predictions presented here, taking into account only family history. In the case of no affected FDRs or SDRs, a lower than average risk was predicted by family history constellation (RR = 0.81), while the four models (Gail [11], BRCAPro [12], Tyrer-Cuzick [13], BOADICEA [14]) roughly predicted the recognized population risk of 12%. In the case of both grandmothers affected, for which the family history constellation RR = 2.27, the Gail (only considers FDRs) and BRCAPro (family history evaluated based on Mendelian patterns of inheritance) models again estimated close to population risk (12%), while the Tyrer-Cuzick model, which integrates family history, also predicted increased risk (21% 10-year risk), and the BOADICEA risk estimate was in between at 16%.

Fig. 1 Example of breast cancer risk estimates impacted by distant family relationship

Discussion

Much of the recent research in familial breast cancer has focused on high- and moderate-risk genes. Mutations in these genes confer a high RR but are infrequent or rare in the population. There is also intense interest in both the scientific and lay communities in modifiable risk factors such as obesity, postmenopausal hormone replacement therapy, breastfeeding, and alcohol ingestion. Although important on a population level, these factors (along with ages at menarche, menopause, and childbearing) generally play a modest role on an individual level, with RRs estimated in the range of 1.2 to 1.5.

The findings reported here suggest that a three-generation family history is helpful for optimal breast cancer risk assessment. The American Society of Clinical Oncology guidelines for cancer assessment family history collection only advise assessment of FDRs and SDRs [15] for the purpose of identifying candidates for genetic testing. There are a number of commonly used breast cancer risk assessment tools [16]. All incorporate some element of family history, but many are restricted to FDRs.

Our analysis benefited from accurate cancer family history from a cancer registry. Use of the risk predictions presented here will depend on patient-reported family history, which may be less reliable. Studies based on histories obtained from cancer patients have found a high level of accuracy for breast cancer reports (up to 95% for reports in FDRs), but lower accuracy has been reported from population-based assessments of individuals’ knowledge of family history and for reports of diagnoses in more distant relatives [17, 18].

Better population-based strategies for collecting and evaluating family history are needed. Common barriers to ascertaining family history during a clinic visit include lack of provider time, patient’s inability to recall family member’s diagnosis, and privacy measures which prevent the flow of medical documentation between family members. Tools and processes that promote collection of family history outside of clinic visits may allow patients the time and opportunity to work with family members to document accurate family history and for clinicians to focus visit time on incorporating family history into assessment and management planning. Future research should also address ways of communicating risk and associated recommendations to large populations of women outside of specialized genetics settings. Clinical decision support tools that can generate individualized recommendations about screening and chemoprevention and are tailored to a patient’s specific level of risk may assist in various clinical settings [19].

Even with increasing use of genomic technologies, family history information is important to informing individual cancer risk [20]. Genome-wide association studies have identified hundreds of SNPs associated with breast cancer, and personal genome testing for known breast cancer mutations is widely available. It remains unclear how risk estimates generated from personal genome testing differ from measures of risk from family history. Aiyar [21] analyzed concordance of risk estimates from family history with those from personal genome testing (PGT) for 757 individuals who purchased Navigenics PGT. Breast cancer risks showed only slight agreement (kappa = 0.154; 95% CI 0.02–0.29); of the 49 women with a family history of breast cancer, 29% were categorized as high risk by PGT. Of females with family histories suggestive of hereditary breast and ovarian cancer (lifetime risk for breast cancer likely to be higher than general population), the majority (10/14) were categorized as general population or lower risk of developing breast cancer by PGT. The study concluded that the lack of concordance suggested that family history and PGT provide different and independent information on risk and could be used in a complementary manner.

In a similar comparison of family medical history with personal genome screening for risk assessment, Heald [22] also found little concordance in family history-based risk versus personal genome screening for breast cancer. They conclude that the two methods may be complementary tools for risk assessment, but that family history remains the standard for evaluation of an individual’s cancer risk. Bloss [23] estimated the association of direct to consumer genome-wide disease risk estimates and self-reported family history, and concluded that genomic testing added little value beyond the use of traditional risk factors, but suggested that testing may be useful when family history is not available. All of these investigations stress that family history will remain the standard in current clinical care unless personal genomic testing risk assessment is improved.

Brewer [24] presented risks for breast cancer based on family history, taking account of the expected number of cases in a family; they noted that this enhanced family history score based on both expected, and observed, cases in a family could give greater risk discrimination than conventional risk tools, and concluded that a sufficiently large family history dataset (as, for example, presented here for the UPDB) might provide the best predictor of risk. There have been few other analyses of complete constellations of family history for breast cancer and there are few available databases that would allow these analyses.

The RRs estimated for the Utah population are in good general agreement with the comparable RRs reported by others. The Collaborative Group on Hormonal Factors in Breast Cancer published a survey of over 58,000 breast cancer cases in 52 studies and characterized risk by particular familial patterns [25]. Although limited to FDRs, some comparisons to the Utah constellation RRs presented here are possible. The Collaborative Group study reported RRs of 1.80, 2.93, and 3.90, respectively, for FDR = 1, FDR = 2, and FDR ≥ 3; the corresponding RR estimates from the Utah study (when SDR and TDR family history was ignored, see Table 2a) were 1.61 (CI 1.56, 1.67), 2.42 (CI 2.22, 2.62), and 3.84 (CI 3.22, 4.55; data not shown), respectively. The Collaborative Group study’s comparison group was women with FDR = 0, while the Utah base rate was estimated from the entire population of females with ancestral genealogy in UPDB. Similar to the Collaborative Group results, the Utah analysis showed that the RR for an affected mother (RR = 1.78) was similar to the RR for at least one affected sister (RR = 1.68). Both studies observed a similar, moderate effect of age at youngest FDR diagnosis. Hemminki and Vaittinen used the family-cancer database from Sweden to estimate familial RRs defined through the mother or daughter, as well as modification of risk by age, and estimated RR = 1.90 for breast cancer in the daughter of an affected mother, and RR = 1.85–1.97 for the mother of one or two affected daughter(s), respectively [26]. In what might be a novel report, this study provides strong evidence for a statistically significant (albeit still modest) elevation in risk for breast cancer even if the only cases of breast cancer are in cousins (TDRs) (RR = 1.32 for ≥ 5 affected cousins with FDR = 0 and SDR = 0; Table 2d); elevated risks for affected TDRs in the presence of a single FDR were also observed. Additionally, some significantly elevated risks were noted for specific combinations of maternal and paternal family history, a family history category for which clinical guidelines are lacking.

Because this study was based on data from a homogeneous population representing a single geographic region, it is important to consider how generalizable the findings are. The Utah population has been shown to be genetically similar to other populations of Northern European descent, but does differ from the US population in some ways [27]. In particular, breast cancer incidence and mortality are lower in Utah than nationally [28]. Factors contributing to the lower incidence may be younger age at first childbirth, higher average number of pregnancies, lower alcohol ingestion, and lower rates of postmenopausal obesity. The results are likely applicable to populations of females similar to the Utah population, that is, largely from Northern European populations, but should not be extrapolated to other populations without validation.

The constellation RR approach has limitations. Some data in the UPDB are censored: genealogy data may be missing or incomplete; some individual or cancer data may not have correctly linked to genealogy data; non-biological familial relationships may be included; and cancers diagnosed before 1966 or outside the state of Utah are not included. Decades of studies estimating RRs for cancer using the UPDB have confirmed that Utah risk estimates based on family history are similar to estimates reported for other populations. Some family history constellation RR estimates may have been affected by small sample sizes, and this is observed in wider confidence intervals. Finally, these RRs were based only on family history; many factors were not included in risk estimation, including proband’s age and other known risk factors which are recognized to play a role in risk and could have affected the risk estimates presented. The genetic architecture of breast cancer is likely a continuum of common low risk to rare high-risk variants acting together to define an individual’s risk. Evidence suggests that familial risk is modified by genetic risk [29]. To reduce uncertainty and increase precision, it is likely that integration of familial risk and polygenic risk is needed. Until such risk prediction models are created, it is clear that family history of breast cancer is a useful and powerful predictor of risk, and that large datasets like the UPDB can add to our knowledge of the risk associated with specific family history constellations.

To our knowledge, this is the largest population-based dataset to be analyzed for breast cancer RRs based on family history, and comparisons to other similar resources show equivalent results for constellations considered. These results greatly expand published risk predictions for family history. Because of the extent of genealogy data available through the UPDB, rates for breast cancer were estimated in over 640,000 women; thousands of female probands were considered for most of the family history constellations analyzed. This study contributes to the growing field of risk prediction and individualized risk management for cancer. Constellation risks based on the UPDB and using the methods presented here have also been presented for colorectal cancer, prostate cancer, and lethal prostate cancer [30, 31, 32]. Future extensions to this simple consideration of various family history constellations are underway and will include additional risk factors for breast cancer in the proband and her relatives, even more precise description of family history constellation, and genotypes for markers recognized to be associated with increased risk for breast cancer.

Conclusions

In this population-based survey representing over 600,000 females, 59% of females had a family history of breast cancer (at least one affected FDR, SDR, or TDR). Even a very limited breast cancer family history was shown to significantly affect risk at a level equivalent to hormonal and reproductive factors, for example, RR = 1.23 (CI 1.15, 1.32; data not shown) for 0 FDRs and at least two SDRs. Four and a half percent of the studied female population of Utah was estimated to have a RR > 2.0 for breast cancer based only on their family history. Many of these women would be candidates for enhanced screening and/or chemoprevention based on current recommendations. Individualized risk prediction from specific family history, as presented, allows identification of women at highest risk for breast cancer.