Introduction

Breast cancer is the most common cancer among women in the United States (U.S.) [1]. It is the second leading cause of cancer death among women overall, after lung cancer, but the leading cause of cancer death among Black and Hispanic women [1]. In 2022, approximately 287,850 new cases of invasive breast cancers and 51,400 cases of ductal carcinoma in situ (DCIS) will be diagnosed among U.S. women and 43,250 women will die from breast cancer [1].

The mortality rate of breast cancer is significantly higher in African American women than in White women (28.4 vs. 20.3 per 100,000), although the incidence of breast cancer is slightly lower in African American women (126.7 vs. 130.8 per 100,000) [2,3,4]. There are important age differences in the incidence of breast cancer between races. African American women tend to be diagnosed at a younger age than White women: the median age at diagnosis is 60 years for African American women, compared to 63 years for White women [5]. Diagnosis of invasive breast cancers peaked for White women in their mid-60s, but peaked for minority women (non-Hispanic Black, Asian and Hispanic women) in their 40s. Analysis using recent SEER data showed that, compared with non-Hispanic (NH) White women, minority women are 72% more likely to be diagnosed with invasive breast cancer under age 50, and 58% more likely to be diagnosed with advanced-stage disease under age 50 [6].

The age at death is also younger for African American women: median age at breast cancer death is 63 years for African American, compared to 70 years for White women [7]. Minority women are 127% more likely to die from breast cancer under age 50 than NH-White women [6]. This discrepancy can be partly explained by the higher incidence of triple negative breast cancers (estrogen, progesterone, and HER-2 receptor negative) and higher stage at diagnosis in African American women compared to White women. Additionally, African American women were 30–60% more likely to be diagnosed with stages II–IV breast cancer, and at 40–70% higher risk of stage IV breast cancers across all subtypes, compared to White women [8]. By not accounting for these racial and ethnic differences, guidelines that delay screening until age 45 or 50 fail to recognize this elevated risk for minority women and adversely affect them. The effects are most impactful for African American women.

Widespread use of screening mammography and improvement of therapies have significantly reduced breast cancer mortality; unfortunately, this benefit is not equally shared across different racial populations. Since 1990, breast cancer mortality rates have decreased by only 26% in African American women, in contrast to 40% in White women [5]. The poorer prognosis and outcome of breast cancer in African American women is multifactorial and often attributed to lack of access to screening mammography [6], and there have been insufficient large-scale studies comparing the efficacy and performance of screening mammography among different racial populations. The purpose of this study is to compare the screening performance metrics across racial and age groups using more recent data with broader representation of the U.S. population from the National Mammography Database (NMD).

Methods

This retrospective analysis is HIPAA compliant and IRB approved [9]. All American College of Radiology (ACR) registries, including the NMD, have strict procedures for de-identification of patient, facility and interpreting physician information in the research data extract to protect confidentiality. The study dataset was anonymized and de-identified, prior to analysis performed by the NMD staff. The non-NMD investigators did not have access to patient, facility or physician identifying information.

Study population

Established by the ACR in 2008, the NMD is the largest national mammography database, containing results from over 35 million mammograms (35,000,842) from 950 facilities in 46 states of the United States as of March 24, 2023 (personal communication, ACR staff). The NMD was designed as a quality improvement tool to enable mammographic facilities and radiologists to compare their mammography performance with that of their peers nationally, regionally, and locally. The NMD accrues clinical practice data reported voluntarily by facilities, including patient demographics, clinical findings, mammography interpretation and biopsy outcomes [9]. Because of its diverse representation and data from current clinical practice, NMD benchmark data closely reflect what U.S. mammography practices and radiologists receive from their on-site performance audits [10].

This study analyzed data from all NMD sites that contributed screening mammogram information between January 1, 2008, and December 31, 2021, including 746 facilities from 46 states in the United States. The NMD collects self-reported clinical practice data including patient demographics, exam type, indication, screening and diagnostic mammography interpretations and biopsy results [9]. Patients aged 30–100 years with ≥ 1 year follow up were included. Patients were stratified by 10-year age intervals and five racial groups (African American, American Indian, Asian or Pacific Islander, White, multi-race or unknown), which were defined by investigators based on the Classification of Federal Data on Race and Ethnicity. Incidence of patient risk factors (breast density, personal history and family history of breast cancer, age distribution), availability of prior mammogram in the NMD, and time since prior mammograms were compared. Dense breasts are defined by reported mammographic density of heterogeneously dense and extremely dense breasts per the BI-RADS Atlas [11]. Diagnosis of breast cancer includes both ductal carcinoma in situ (DCIS) and invasive breast cancer. Family history is defined as any family member with breast cancer, per NMD data dictionary.

Outcome variables

Five validated screening mammography metrics were calculated for each age and racial group: recall rate (RR); cancer detection rate (CDR); and positive predictive value for recalled exams (PPV1), biopsy recommended (PPV2) and biopsy performed (PPV3) [12]. Both DCIS and invasive breast cancers were considered true positives. RR was the percentage of women recalled for additional imaging; CDR was the number of breast cancers diagnosed per 1000 screening exams; PPV1 was the number of breast cancers per number of recalled exams from screening; PPV2 was the number of women diagnosed with breast cancer per number of women recommended for biopsies; PPV3 was the number of women diagnosed with breast cancer per number of women who underwent biopsies. Self-reported facility characteristics including facility category (academic, community, multispecialty clinic, freestanding imaging center, others), facility location (metropolitan or > 100,000 population, suburban, rural or < 50,000 population), facility census division (New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific) and facility annual volume were assessed, as previously described [9]. We examined the effects of patient age, risk factors, racial groups, as well as facility characteristics on the five screening metrics.
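The five metric definitions above can be illustrated with a minimal sketch. The class, field names and counts below are hypothetical and chosen only for illustration; they are not the NMD data dictionary or the SAS code used in this study. The sketch assumes that CDR and PPV1 share the same cancer numerator, and that PPV2 and PPV3 count cancers among women recommended for biopsy and women who underwent biopsy, respectively.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Aggregate counts for one age-by-race stratum (hypothetical field names)."""
    screening_exams: int                   # all screening mammograms in the stratum
    recalled_exams: int                    # screening exams recalled for additional imaging
    biopsies_recommended: int              # women recommended for biopsy
    biopsies_performed: int                # women who underwent biopsy
    cancers_detected: int                  # DCIS or invasive cancers (CDR and PPV1 numerator)
    cancers_among_biopsy_recommended: int  # PPV2 numerator
    cancers_among_biopsied: int            # PPV3 numerator


def screening_metrics(c: ScreeningCounts) -> dict:
    """Return RR, PPV1, PPV2, PPV3 in percent and CDR per 1000 exams."""
    return {
        "RR_percent": 100.0 * c.recalled_exams / c.screening_exams,
        "CDR_per_1000": 1000.0 * c.cancers_detected / c.screening_exams,
        "PPV1_percent": 100.0 * c.cancers_detected / c.recalled_exams,
        "PPV2_percent": 100.0 * c.cancers_among_biopsy_recommended / c.biopsies_recommended,
        "PPV3_percent": 100.0 * c.cancers_among_biopsied / c.biopsies_performed,
    }


# Made-up counts, not study data.
example = ScreeningCounts(
    screening_exams=100_000,
    recalled_exams=10_000,
    biopsies_recommended=1_600,
    biopsies_performed=1_500,
    cancers_detected=418,
    cancers_among_biopsy_recommended=400,
    cancers_among_biopsied=390,
)
metrics = screening_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions PPV1 equals CDR divided by RR once both are expressed as proportions, which is why a stratum with a high recall rate and a low cancer detection rate necessarily has a low PPV1.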

Rate of patient follow up was determined in two ways: the proportion of women who returned for additional imaging within 90 days of an abnormal screening mammography (BI-RADS 0); and the proportion of women who returned for biopsy within 90 days of a diagnostic evaluation recommending a biopsy (BI-RADS 4 or 5), which was recognized as timely follow-up [13, 14]. Rate of follow up was compared across the five racial groups.
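The 90-day follow-up definition can be sketched as follows. This is an illustrative sketch only: the record layout and function names are hypothetical, and treating day 90 as inside the window is an assumption, since the text does not state whether the boundary is inclusive.

```python
from datetime import date, timedelta

WINDOW_DAYS = 90  # timely follow-up window described in the text


def is_timely(index_date, followup_date, window_days=WINDOW_DAYS):
    """True when follow-up occurred 0 to window_days days after the index exam.

    followup_date is None when the woman never returned.
    The inclusive upper bound is an assumption.
    """
    if followup_date is None:
        return False
    delta = (followup_date - index_date).days
    return 0 <= delta <= window_days


def timely_followup_rate(records):
    """Proportion of (index_date, followup_date) records with timely follow-up."""
    records = list(records)
    if not records:
        raise ValueError("at least one record is required")
    timely = sum(is_timely(index, followup) for index, followup in records)
    return timely / len(records)


# Made-up records: a BI-RADS 0 screening exam date and the date of return, if any.
exam = date(2021, 3, 1)
records = [
    (exam, exam + timedelta(days=30)),   # returned within the window
    (exam, exam + timedelta(days=90)),   # returned on the last day of the window
    (exam, exam + timedelta(days=120)),  # returned late
    (exam, None),                        # never returned
]
rate = timely_followup_rate(records)
print(f"Timely follow-up: {100 * rate:.1f}%")
```

The same function applies to the second definition by using the date of the diagnostic evaluation recommending biopsy (BI-RADS 4 or 5) as the index date and the biopsy date as the follow-up date.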

Statistical analysis

Analysis was performed using SAS (version 9.4, Cary, NC). Ninety-five percent confidence intervals (95% CI) were calculated for the recall rate, cancer detection rate, PPV1, PPV2 and PPV3, using 1.96 times the standard deviations calculated from Poisson statistics. Test for trend was performed for outcome variables as a function of age grouping. A P-value < 0.05 was used as the threshold for significance. Multivariable regression analysis was performed to assess the statistical significance of differences in RR and CDR between racial groups at different facilities.
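One reading of the confidence interval calculation described above is sketched below. It assumes that the event count (for example, the number of cancers) is treated as a Poisson variable whose standard deviation is the square root of the count; the function name and the example counts are hypothetical, and this is not the SAS code used in the analysis.

```python
import math


def poisson_rate_ci(events, exams, scale=1.0, z=1.96):
    """Rate with an approximate 95% CI based on a Poisson standard deviation.

    events : number of events (e.g., cancers or recalls)
    exams  : denominator (e.g., screening exams)
    scale  : 1000 for a rate per 1000 exams, 100 for a percentage
    Returns (rate, lower, upper), all multiplied by scale.
    """
    if exams <= 0 or events < 0:
        raise ValueError("exams must be positive and events non-negative")
    rate = events / exams
    sd = math.sqrt(events) / exams  # Poisson SD of the count, divided by the denominator
    half_width = z * sd
    lower = max(rate - half_width, 0.0)  # a rate cannot be negative
    upper = rate + half_width
    return scale * rate, scale * lower, scale * upper


# Made-up counts: 400 cancers in 100,000 screening exams.
cdr, cdr_low, cdr_high = poisson_rate_ci(400, 100_000, scale=1000)
print(f"CDR {cdr:.2f}/1000 ({cdr_low:.2f} to {cdr_high:.2f})")
```

Because the interval width scales with the square root of the event count, strata with very large numbers of exams yield narrow intervals, whereas small strata yield wide ones.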

Results

Study population

A total of 29,479,665 screening mammograms performed in 13,181,241 women in the NMD between January 1, 2008, and December 31, 2021, were included and analyzed. Demographics and incidence of breast cancer risk factors of the study population are presented in Table 1. African American women accounted for 4.91% (647,575/13,181,241) of the study population and 4.90% (1,444,148/29,479,665) of all mammograms, compared to 32.80% (4,323,478/13,181,241) and 37.59% (11,082,618/29,479,665) for White women, respectively. Since race is not a mandatory data element in the NMD, additional sensitivity analysis was performed to detect possible bias between women with race information and those without. Importantly, women with unknown race have demographic characteristics similar to those of the rest of the study population. After excluding women with unknown or multi-race, African American women accounted for 12.21% (647,575/5,303,679) of women and 10.90% (1,444,148/13,243,941) of mammograms; White women accounted for 81.51% (4,323,478/5,303,679) of women and 83.68% (11,082,618/13,243,941) of mammograms; Asian American women accounted for 4.88% (258,561/5,303,679) of women and 4.59% (607,416/13,243,941) of mammograms; American Indian women accounted for 1.40% (74,065/5,303,679) of women and 0.83% (109,759/13,243,941) of mammograms. Mean patient age, personal history of breast cancer, family history of breast cancer, and time since prior mammogram were similar across all racial groups, including the unknown race group. Asian women have a significantly higher proportion of dense breasts (56.9%, 345,715/607,416) compared to all women (37.0%, 10,916,097/29,479,665; P < 0.001). African American women have the highest rate of prior mammograms available for comparison (50.07%) among all racial groups. With regard to screening interval, most women (56.1%, 16,550,025/29,479,665) had a mammogram within the last 2 years, with an annual interval (37.5%, 11,066,969/29,479,665) being the most common across all race groups.
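The shares reported after excluding women with unknown or multi-race can be recomputed from the counts given above. The sketch below uses only the numbers of women stated in this paragraph; the variable names are illustrative, and the recomputed values may differ from the reported figures in the last decimal place because of rounding.

```python
# Numbers of women with known race, as reported in the text.
women_by_race = {
    "African American": 647_575,
    "White": 4_323_478,
    "Asian American": 258_561,
    "American Indian": 74_065,
}

# Denominator after excluding the unknown or multi-race group.
known_total = sum(women_by_race.values())

# Share of each group among women with known race, in percent.
shares = {race: 100.0 * n / known_total for race, n in women_by_race.items()}

print(f"Women with known race: {known_total:,}")
for race, pct in shares.items():
    print(f"{race}: {pct:.2f}%")
```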

Table 1 Demographics of the study population including BI-RADS breast density, mean age, personal history of breast cancer, family history of breast cancer, availability of prior mammographic comparison and time since prior mammogram

Screening outcome by age and race

Overall, mean RR was 10.00% (95% CI 9.99 to 10.02%), CDR 4.18/1000 (4.16–4.21), PPV1 4.18% (4.16–4.20), PPV2 25.84% (25.72–25.97), and PPV3 25.78% (25.66–25.91). Age and race have a significant impact on screening mammography performance metrics, as presented in Tables 2 and 3. With advancing age, recall rate significantly decreases, while CDR, PPV1, PPV2 and PPV3 significantly increase, across all racial groups (Fig. 1A and B). African American women have a significantly higher recall rate (10.95%, 10.89–11.00%), whereas White women have the lowest recall rate (9.61%, 9.60–9.63%), across most age groups. Moreover, African American women also have a significantly lower cancer detection rate (3.91/1000, 3.81–4.02), compared to White women (4.56/1000, 4.52–4.60) and all women (4.18/1000, 4.16–4.21). Asian women, with the highest proportion of dense breasts, have recall rates (9.99%, 9.91–10.06) similar to all women (10.00%, 9.99–10.02). In addition, African American women have the lowest PPV1 (3.58%, 3.49–3.67), PPV2 (21.17%, 20.68–21.66), and PPV3 (20.34%, 19.87–20.82) among all race groups, while White women have the highest PPV1 (4.74%, 4.70–4.78) and PPV3 (26.64%, 26.44–26.84) and second highest PPV2 (27.85%, 27.64–28.06). The unknown race group has screening metrics ranging between those of White and African American women.

Table 2 Comparison of recall rate and cancer detection rate across different age and racial groups
Table 3 Comparison of positive predictive values (PPV) across different age and racial groups
Fig. 1 A Bar graph of recall rates by age and race. Recall rate declined significantly with increasing age. Recall rate was higher in African American women compared to all other racial groups. X axis represents age in years. Y axis represents recall rate in %. B Bar graph of cancer detection rate by age and race. Cancer detection rate increased significantly with advancing age. African American women had a higher cancer detection rate than White women and all women at age 30–39 years, consistent with the latest ACR guidelines on screening for higher-than-average risk women [43]. X axis represents age in years. Y axis represents cancer detection rate per 1000 exams.

Screening outcome by facility characteristics

There are significant differences in recall rates and cancer detection rates by facility category, location, and census division (Supplemental Tables 1, 2, and 3). No significant variations in screening performance were observed by facility annual mammogram volume (data not shown). Academic facilities and metropolitan facilities have the highest recall rates (academic OR 1.61, 95% CI 1.47–1.65; metropolitan OR 1.24, 1.21–1.27), with the highest cancer detection rates (academic OR 3.13, 2.81–3.47; metropolitan OR 1.29, 1.24–1.35). Facilities in West North Central and Pacific census divisions have the two lowest recall rates (West North Central OR 1; Pacific OR 1.09, 1.05–1.12; P < 0.001) yet the two highest cancer detection rates (OR 2.22, 1.87–2.64; P < 0.001 and OR 2.64, 2.19–3.20; P < 0.001 respectively). These significant performance variations at the facility level are similar across different race groups.

In terms of patient follow up rates, a significantly lower proportion of African American women returned for recall following abnormal screening mammograms (52.0%, 51.77–52.30%) and for the recommended biopsies (65.7%, 64.34–65.91%) within the 90-day window compared to White women (61.2%, 61.08–61.28%; 74.4%, 74.17–74.71%) and all women (61.2%, 61.14–61.26%; 68.9%, 68.74–69.08%) (Table 4). A similar pattern is seen with American Indian women who also have low rates of patient follow up; only 51.8% returned for screening recall (50.88–62.22%) and 60.6% returned for recommended biopsies (56.95–64.10%) at 90 days.

Table 4 Patient-level analysis comparing the rate of follow up for screening mammogram recalls (BI-RADS 0) and biopsy recommendations (BI-RADS 4 or 5) by race

Discussion

This is the largest study to date comparing the performance of screening mammography in U.S. women by racial groups. Age, mammographic breast density, personal history of breast cancer, and family history of breast cancer are all independent risk factors for breast cancer; however, these factors are similar across all race groups in our study population and do not explain the racial disparity seen in screening outcomes. Interestingly, African American women have the highest proportion of screening mammograms with prior mammograms available for comparison among all racial groups, yet have the highest recall rates. Annual screening mammography was the most common interval in our study population, closely followed by biennial screening. This reflects inconsistencies in the current screening guidelines and recommendations from professional societies and governmental bodies.

Our overall screening performance metrics are consistent with prior analyses from the NMD with similar mean recall rates [9, 15]. However, the mean cancer detection rate has notably increased from the first NMD publication at 3.43/1000 (3.2–3.7) in 2016 [9] to 4.18/1000 (4.16–4.21) in our study. This trend likely reflects the substantial growth in registry data and subsequent cancer accruals over time, since the NMD became the largest mammography database in the United States in 2018 [15]. This may also reflect the improved sensitivity in cancer detection from better technology and subspecialty training. Our study includes data from 746 contributing NMD facilities, which represented 8.4% of the 8832 Food and Drug Administration-certified mammographic facilities in the United States as of April 1, 2023 [16].

Asian women, with the highest proportion of dense breasts, surprisingly have a recall rate only at the average level. In contrast, African American women have significantly higher RR, but significantly lower CDR, PPV1, PPV2, and PPV3, compared to White women and all women. While African American women have the highest false positive rates from mammographic screening, which are typically considered a harm from screening, they also have the lowest breast cancer detection rates. This finding is consistent with prior observations of racial disparity in breast cancer mortality and mammographic screening performance [1]. Secondary prevention through mammographic screening can prevent death and, alongside treatment advances, is credited with substantial reductions in breast cancer mortality. Progress to reduce breast cancer mortality could be accelerated by mitigating racial disparities through increased access to high-quality screening and treatment via partnerships between community stakeholders, advocacy organizations and health systems.

Primary care providers following professional society or governmental recommendations for screening mammography may be inadvertently putting African American women at a disadvantage. The U.S. Preventive Services Task Force [17], the American Academy of Family Physicians [18], and the American College of Physicians [19] all recommend starting mammographic screening at age 50 years, with the option to begin between 40 and 49 years, depending on individual risk factors and personal choice [20]. The American Society of Clinical Oncology and the American Cancer Society now recommend beginning screening mammography at age 45 years, with the option to start at age 40 years [21]. Given that 23% of breast cancer cases in African American women occur under age 50 years (compared to 16% for White women), and knowing that these cancers are often of the more aggressive, triple negative molecular subtype, delaying screening to age 50 will likely contribute to the higher breast cancer mortality seen in African American women [1]. Consequently, the latest ACR recommendation advises risk assessment for all women by age 25, with special attention to Black and minority women, who are at higher risk of breast cancer at younger ages [4].

As we attempt to explain the racial disparity in screening outcomes, we observed significant screening performance variations by facility category, location, and census division. Academic and metropolitan facilities have the highest recall rates and cancer detection rates, consistent with a prior study demonstrating higher cancer detection rates and a higher proportion of early-stage cancers in academic compared to community practices [22]. Moreover, breast radiologists with recall rates of 12% or higher found significantly more cancers than those operating within the 5–12% range [22]. Although there is no direct linkage to radiologists’ training in this dataset, lack of subspecialty training in breast imaging has been reported in community and freestanding facilities and is associated with inconsistent adherence to benchmarks and reporting guidelines. This may account for the higher false positives and lower cancer detection rates observed in our study [23,24,25,26].

Facilities located in the Middle Atlantic, South Atlantic, and New England census divisions have some of the highest recall rates; this may be partly related to the geographic distribution of medical malpractice cases. In a 10-year analysis of breast cancer malpractice litigation, cases most frequently involved New York (67/253 cases), California (N = 34), Massachusetts (N = 22), Florida (N = 20) and Pennsylvania (N = 19) [27]. Abundant literature suggests that the prevalence of malpractice litigation drives radiologists to take defensive measures that minimize malpractice risk, such as additional screening recalls, even though they may not be in the best interest of the patient [28,29,30]. Although these performance variations between facilities are significant, they are consistent across race groups and do not explain the observed racial disparity in screening outcomes.

Rates of patients returning for recommended imaging and biopsy following abnormal mammograms demonstrate significant variations by race. The first source of patients lost to follow up occurs at the screening recall, where African American (52.0%) and American Indian women (51.8%) have the lowest rates of patient follow up, compared to 61.2% for White women. The second source of patients lost to follow up occurs following biopsy recommendation, where only 65.7% of African American and 60.6% of American Indian women returned for the biopsies, compared to 74.4% of White women. Jones et al. found that over 28% of women fail to return for timely follow-up (within 3 months) following BI-RADS 4, 5, and 0 assessments [31]. African American race, pain during the mammogram, and lack of a usual provider were significant independent predictors of inadequate follow-up [31].

Successful breast cancer screening relies on timely follow-up of abnormal mammograms with potentially clinically significant findings. Delayed or missed follow-up undermines the potential benefits of screening and is associated with poorer patient morbidity and mortality outcomes [32, 33]. Factors influencing follow-up have been well studied, with barriers identified at the health system, primary care physicians and patients’ levels [34]. Furthermore, given the persistent disparities in later stage breast cancer diagnoses and increased mortality in African American and Latina women reported across several studies [35, 36], identifying and addressing barriers to the suboptimal follow-up of abnormal mammograms in these populations is imperative in order to improve breast cancer outcomes.

Evidence suggests improved physician–patient communication may help overcome patient-related barriers to follow-up and in turn improve patient outcomes. In particular, effective primary care physician–patient communication was key to ensuring women understood their abnormal mammogram results and the need for follow-up [37, 38]. Further, African American women with an abnormal mammogram who had open dialogue with their physician and received clear information about recommended follow-up procedures were more likely to have adequate follow-up [39]. Additionally, Battaglia et al. found patient navigation interventions alone improved timely follow-up in low-income and ethnic minority women with an abnormal mammogram in primary care [40]. The randomized controlled trial by Ferrante et al. found that women with an abnormal mammogram receiving navigated care in a non-primary care setting not only had improved follow-up, but also reported less anxiety and greater satisfaction with their follow-up care [41]. Moreover, navigated care was associated with lower breast cancer stage at diagnosis [42]. Overall, addressing factors contributing to inadequate follow-up with targeted interventions, especially in ethnic minority women most at risk, could optimize follow-up and improve patient outcomes.

There were several limitations to this study. The retrospective design and availability of data introduce selection bias. Facility and patient characteristics were self-reported and not verified. Ethnicity, including Hispanic and non-Hispanic information, was not available for this study cohort. Since race was not a mandatory data element in the NMD, sensitivity analysis was performed to detect possible bias between women with race information and those without. Importantly, women with unknown race have demographic characteristics similar to those of the rest of the study population. Although the large size of the unknown race group may introduce selection bias, it does not behave like an outlier on any of the screening performance metrics. We defined timely follow up as within 90 days following abnormal results; this interval may be affected by limited resources and longer wait times in some areas. The NMD does not have direct linkage to cancer registries, so breast cancer diagnoses are reported by the facilities. Patients who went to a different NMD facility or a non-NMD facility will be counted as a different patient or not counted, thereby artificially lowering the cancer detection rates. Finally, this data extraction did not capture the modality (tomosynthesis vs. 2D) due to changes in coding.