Introduction

Eligibility for breast cancer screening is often determined by sex and age, but breast cancer risk varies greatly across the population of women eligible for screening. Yet, as is evident by the various risk models in use [1,2,3,4,5,6,7], many other risk factors including family history, breast health history, reproductive-related factors (e.g., parity, age at menarche), body mass index (BMI), exogenous hormone exposure (e.g., hormone replacement therapy, HRT), and breast density may be used to determine risk (see Table 1). The widely used Gail [1] and Breast Cancer Surveillance Consortium (BCSC) models [2] include a moderate number of risk factors, but do not leverage the image-level data available from digital mammography. The most recent Tyrer-Cuzick model includes numerous risk factors, including mammographic density and a single-nucleotide polymorphism score [6, 7], but this restricts the utility of the model at mammographic screening.

Table 1 Risk factors included in common breast cancer risk models

Personalized screening based on a woman’s risk has the potential to decrease breast cancer morbidity and mortality while minimizing the harms of screening and decreasing resource utilization [8,9,10,11,12]. Ideally, risk-stratified screening protocols would be based on accurate risk estimates from data easily obtained in the clinical setting [13]. The results of a simulation study in the USA indicated that screening frequency can be tailored using information about breast density, age, family history, and biopsy history [9]. A Spanish study demonstrated that, for every 2000 women screened using an optimal risk-based strategy, there were 1.5 fewer overdiagnosed cases, 97 fewer false positives, 6 fewer biopsies, and €250,000 saved, at the cost of one additional false negative [8].

The utility of a breast cancer risk model depends on the quality of available risk factor information. Many risk factors are self-reported and poor recall may negatively impact the risk estimates obtained [14]. In addition, women tend to overestimate height and underestimate weight, thereby underestimating BMI, a problem compounded by the fact that the amount of bias may change over time [15, 16]. Self-reports of extended family history are not practical to implement because the health status of more distant relatives may be unknown or of poor quality [17]. Other risk factors may not be available or cost-effective for a screened population, such as genetic risk factors, which are more useful for predicting risk in younger women [18,19,20]. An ideal risk model would discriminate risk while maximizing the use of readily available data and minimizing the use of extensive self-reported risk factors. In addition to age and clinical history, readily available high-quality data include risk factors derived from screening mammograms such as breast density, which can be reliably measured using commercially available software.

The purpose of this study was to develop a breast cancer risk model for use at time of screen in a general screening population.

Materials and methods

This retrospective case-control study was nested within a cohort of women aged 40–75 years screened in a population-based breast screening program with full-field digital mammography between January 1, 2009, and December 31, 2015 (n = 8015). Due to missing risk factor data, 245 women were excluded. The final sample included 7770 women: 1882 with histologically confirmed unilateral or bilateral screen-detected or interval breast cancer (cases), and 5888 without screen-detected breast cancer (controls). Breast cancer was defined as ductal carcinoma in situ or invasive breast cancer. Women were considered to have screen-detected breast cancer if they were diagnosed as a result of an abnormal screening mammogram. Women were considered to have interval breast cancer if they received a non-cancer result after a screening mammogram but were diagnosed as a result of symptoms in the 24 months prior to the next regularly scheduled screen. Eligible controls were defined as women whose screening mammogram did not result in a diagnosis of breast cancer (incident sampling approach). Controls were randomly sampled and frequency-matched with cases on a 3:1 basis, matching on both age and year of screen.

All screening mammograms were performed following the practice guidelines for breast cancer screening and technical standards for breast imaging by the Mammography Accreditation Program of the Canadian Association of Radiologists.

This study was approved by the institutional research ethics board, which waived the requirement for informed consent (NSHA-RS/2015-340).

Image-derived risk factors

All mammograms were obtained using a Siemens MAMMOMAT Inspiration or MAMMOMAT Novation DR imaging system. Age was obtained from the image metadata (Digital Imaging and Communications in Medicine (DICOM), header).

A fully automated mammographic density measurement algorithm (densitas® densityai™, Densitas® Inc.) was used to obtain percent mammographic density (PMD; 0–100%), total breast area (cm2), and breast thickness (cm) using both mediolateral-oblique and cranial-caudal views for all studies. The algorithm processes “for-presentation” images used by radiologists in clinic to generate an area-based PMD measure that ranges from 0 to 100% and has demonstrated excellent agreement with radiologists’ area-based visual assessments of density (intraclass correlation coefficient = 0.91, 95% CI 0.89–0.92) [21]. Breast volume (cm3) was used as a surrogate for BMI and calculated by multiplying total breast area by breast thickness [22].

For women with screen-detected cancer, image-derived risk factors were obtained from the screening mammogram that led to the breast cancer diagnosis. For women with interval cancer, these risk factors were obtained from the screening mammogram immediately prior to the interval cancer diagnosis.

Clinical risk factors

Clinical risk factors for breast cancer at screen were extracted from the Breast Screening Program information system, which contains information for all breast imaging procedures in the province. Information was obtained for clinically reported risk factors such as core biopsy history (yes/no, hereafter referred to as biopsy history) and self-reported risk factors such as first-degree family history of breast cancer (yes/no), number of births, menopausal status at time of screen (pre/post), and HRT use at time of screen (yes/no).

Statistical analyses

Descriptive statistics were computed for the image-derived and clinical risk factors using means and standard deviations or counts and percentages. An analysis of variance (ANOVA) was used to evaluate the difference between women with and without breast cancer for continuous risk factors. The chi-square test was used to evaluate the association between breast cancer status and categorical risk factors.

Multivariable logistic regression was used to model the risk of breast cancer (outcome) as a function of image-derived risk factors alone or in combination with clinical risk factors (covariates). All regression models, hereafter referred to as “risk models,” included age as a covariate, plus one additional risk factor. Additional risk models included age as a covariate plus image-derived risk factors, both alone and in combination with clinically reported risk factors (e.g., biopsy history). A final set of risk models included the aforementioned covariates and the self-reported clinical risk factors.

Boxplots were used to show the distribution of predicted probabilities from the regression models; the range width, defined as the maximum minus the minimum, was reported for these predicted probabilities. The range width of the predicted probabilities generated by a risk model relates to the model’s ability to discriminate between cancers and non-cancers. Models with wider range widths of predicted probabilities will be able to better discriminate between women with and without breast cancer. For the primary outcome of the study, the area under the receiver operating characteristic curve (AUC) was used to evaluate how well different models discriminated between women with and without breast cancer. DeLong’s test was used to evaluate the difference in AUC between models.

The predicted probabilities from the regression models were used to assign women to low- (below 10th decile) and high-risk (10th decile) groups, and a quadratically weighted kappa statistic was used to evaluate the agreement between the models for assigning women to these groups. Kappa values of 0.00–0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80, and 0.81–1.00 were interpreted as slight, fair, moderate, substantial, and almost perfect agreement, respectively [23].

All women were divided into deciles (reference deciles) separately for age (1 = youngest to 10 = oldest), PMD (1 = lowest density to 10 = highest density), and the predicted probabilities from the breast cancer risk model (1 = lowest risk to 10 = highest risk). Women were then grouped into sets based on their cancer status (no cancer or any cancer) and mode of detection (screen-detected or interval cancer). For each set, the percentage of women in each decile (1–10) was calculated as the proportion of women in that decile relative to all women within the set.

We calculated that a sample of 1437 cases and 4311 non-cancer controls would achieve 80% power to detect a minimal clinically significant difference of 0.02 between an AUC curve of a risk model with an AUC of 0.58 and another risk model with an AUC of 0.60 using a two-sided z-test at a significance level of 0.05 [24, 25].

Statistical analyses were performed using R (version 3.2.2 for Mac OS). p values < 0.05 were considered statistically significant.

Results

Study sample

The final dataset included images and corresponding risk factor data for 7770 women: 1595 with screen-detected breast cancer, 287 with interval breast cancer, and 5888 without breast cancer. Women with breast cancer were generally similar to those without, but had slightly higher mean PMD and breast volume and were more likely to have a family history of breast cancer and/or biopsy history (Table 2).

Table 2 Descriptive characteristics of women with and without breast cancer

The range width of the predicted probabilities was 0.35 (min = 0.19, max = 0.54) for the model with age and biopsy history, and it was 0.32 (min = 0.15, max = 0.47) for the model with age and PMD (Fig. 1a). The range width of the predicted probabilities increased to 0.44 (min = 0.12, max = 0.56) when the three image-derived risk factors were combined in a single model, and increased to 0.69 (min = 0.10, max = 0.79) when biopsy and family history were added to the model. The range width was 0.71 (min = 0.09, max = 0.80) after the addition of the three remaining self-reported risk factors (number of births, menopausal status, and HRT use; Fig. 1b).

Fig. 1
figure 1

Boxplot of risk estimates for models including age plus additional risk factors

Among the risk models that included age and one other risk factor, the model that included biopsy history best predicted breast cancer risk (Table 3; Fig. 2). The risk model that included the image-derived risk factors had a predictive performance of AUC = 0.597 and the addition of biopsy history increased the model’s predictive performance to AUC = 0.660 (AUC = 0.06; p < 0.001). The inclusion of first-degree family history to the model with image-derived risk factors and biopsy history increased the predictive performance to AUC = 0.664 (AUC = 0.004; p = 0.04). However, the inclusion of number of births, menopausal status, and HRT use to the model with image-derived risk factors, biopsy history, and family history did not significantly improve model performance (AUC = 0.665, AUC = 0.001; p = 0.45).

Table 3 Area under the receiver operating characteristic curve and 95% confidence intervals (CI) for breast cancer risk models based on age and other risk factors
Fig. 2
figure 2

Area under the receiver operating characteristic curve plots for models including age plus additional risk factors

83.6% of women remained in the same low- or high-risk group whether the classifier was based on the risk model using image-derived risk factors or from a model using image-derived risk factors, biopsy history, and family history (Table 4a). However, agreement between the classifiers was poor (kappa = 0.09, 95% CI 0.06, 0.11). There was near-perfect concordance (99.5%) and near-perfect agreement (kappa = 0.97, 95% CI 0.97, 0.98) for the classifier based on the risk model using image-derived risk factors, biopsy history, and family history and that from a model using all available risk factors (Table 4b).

Table 4 Assignment to low- (< 10th decile) and high-risk (10th decile) group based on classifications derived from breast cancer risk models

For the risk model using the three image-derived risk factors, biopsy history, and family history, 22.6% of women with breast cancer were in the top reference risk decile and 4.9% were in the bottom reference risk decile (Fig. 3). 36.7% of women with breast cancer and 47.7% of those with interval cancer were in the top two reference risk deciles compared with 14.7% of those without breast cancer. For reference PMD deciles established using breast density, 15.2% of women with breast cancer were in the top decile and 5.7% were in the bottom decile. 26.0% of women with breast cancer and 42.5% of those with interval cancer were in the top two deciles compared with 19.1% of those without breast cancer. For deciles established using age at screening, 10.9% of women with breast cancer were in the top reference age decile and 9.1% were in the bottom decile (versus 11.0% and 8.5%, respectively, for those without breast cancer).

Fig. 3
figure 3

Percentage of women with screen-detected/interval/both cancers or no cancer by risk/PMD/age deciles

Fifty-five percent (426/777) of the women in the top reference risk decile had breast cancer and 34% (264/777) of the women in the ninth decile had breast cancer.

Discussion

This study explored the potential for a breast cancer risk model to identify women at higher risk at mammographic screening within a general screening population. A risk model consisting of image-derived factors alone predicted breast cancer risk with an AUC = 0.597. The predictive power of this model was significantly improved with the addition of biopsy and family history information (AUC = 0.664), but additional risk factors did not further improve model performance.

The performance of the risk model based on image-derived factors alone was comparable to published results from the Breast Cancer Risk Assessment Tool (Gail model), which has yielded AUCs of 0.58 for estimating the 5-year risk of invasive breast cancer [26] and 0.55 for 10-year risk [27]. Similar to other studies, we observed that a hybrid model including image-derived and clinical risk factors (biopsy and family history) outperformed the models based on image-derived or clinical risk factors alone [28]. Although the hybrid model performed best, biopsy history contributed significantly to the predictive performance of the models considered in this study. False positive screening results may be a marker for an underlying biological susceptibility for breast cancer and may explain this finding [29].

The simplicity and accuracy of the model make it more practical for use in a clinical setting than breast cancer risk models that rely on extensive self-reported family history and genetic information. A breast cancer risk model for use at mammographic screening in population-based screening must be limited to a restricted number of easily acquired, high-quality risk factors to be practical. Although one study found that PMD assessments using a visual analog scale were more strongly associated with breast cancer risk than four automated methods, the same study noted that visual assessments were not practical for population-level screening [30]. In this study, five risk factors were used to derive the best-performing risk model. Age, PMD, and breast volume were obtained directly from the screening mammogram, biopsy history was from the patient record, and first-degree family history was self-reported. Both biopsy history and first-degree family history are likely to be readily recalled by women and robust to recall and reporting bias. Because these data are often collected at time of screening exam, they present little burden to women and the health care system in terms of time or cost.

Women in the top two risk deciles accounted for one in three screen-detected cancers, and nearly one in two interval cancers. In fact, the top risk-based deciles contained a substantially higher proportion of women with breast cancer than the top age- or PMD-based deciles. This finding suggests that risk estimates generated by a simple five-factor breast cancer risk model may be more informative for stratified screening than proxy estimates of risk based on either age or mammographic density. Further studies are needed to investigate whether stratified screening based on this five-factor model could improve performance measures such as the positive predictive value of screening and cancer detection rates, and lead to improved prognosis and reduced treatment costs.

Strengths and limitations

Our study was based on all women with screen-detected and interval breast cancer within the population, eliminating the potential for sampling bias. Moreover, the use of automated software ensured that image-derived factors were assessed in a standardized and reproducible manner from “for-presentation” full-field digital mammograms. Results from the literature show that the direction and magnitude of associations between reproductive factors and PMD from digital mammography are similar to those established from screen-film mammography [31]. However, the use of automated software limits the applicability to environments where such software is available.

All clinical risk factors were extracted from a synoptic reporting information system that supports the population-based breast screening program. These data were collected in a standardized manner during screening and reflect information that is readily available in a clinical setting (e.g., menopausal status but not age at menopause, and current HRT use but not duration of use). Information on BMI was not available, and we used breast volume as a proxy to overcome this limitation [22, 32]. We estimated breast volume by multiplying breast area and compression thickness from mammographic images. This method is strongly correlated with post-mastectomy breast volume (r = 0.997) [33]. The measured breast thickness may differ slightly depending on whether a rigid (Siemens Novation) or tilting (Siemens Inspiration) compression paddle was used, but the impact on the final estimate of breast volume is likely minimal. Overall, the risk factors used in the proposed risk model include those identified as critical for personalized risk stratification [8, 9]. The lack of availability of age at diagnosis of first-degree relatives for inclusion in the proposed risk model is a potential limitation of this study.

A potential limitation of this study is that the risk model was developed to provide tailored estimates of current breast cancer risk, which differs from models that predict 5-year, 10-year, or lifetime risk. However, it has been shown that current and future risk estimates based on a combination of image-derived and clinical risk factors can be similar [28].

An incident sampling approach was used for this study. As a result, it is possible that the controls could have developed interval cancer before their next planned screening mammogram. Based on an interval cancer rate of 1 per 1000 screens [34], roughly 6 of the 5888 women without screen-detected breast cancer might have later been diagnosed with interval cancer. This small number of expected interval cancers is unlikely to have impacted the overall results.

Our study cohort was selected from a fairly homogeneous Caucasian population, and our results may have limited generalizability to a more racially diverse population. However, our model consists of a hybrid of image-derived factors plus clinical risk factors similar to a study by others in which a hybrid risk model consisting of image-derived factors plus clinical risk factors has demonstrated comparable performance for both white and African-American women [28].

The proposed breast cancer risk model would benefit from further validation in other populations.

Summary

Women can be risk-stratified at mammographic screening using a simple model based on a small set of image-derived and clinical history factors, including age at screen, PMD, breast volume, and biopsy and family history.