Introduction

Breast cancer is the most common cancer affecting women in the United States (US), with an estimated 276,480 new cases of invasive breast cancer diagnosed in 2020 and an estimated 42,170 deaths attributed to breast cancer [1]. Although the disease is pervasive, it has become clear over the past several decades that breast cancer outcomes differ not only according to factors associated with biology, but also according to those associated with demographics and socioeconomic status, especially race. In a series of reports, DeSantis, et al., have documented the differences over time in both incidence and mortality according to race, with breast cancer historically being more common among White women but mortality being higher among Black women diagnosed with breast cancer [2,3,4]. A closer examination of the data, including by US geographical region, showed that these trends were not universal. In fact, the authors reported in 2017 that there was no difference in mortality by race in seven states [4].

Geography on a macro- and micro-scale has long been postulated to play a role in outcomes, and in a recent analysis of the National Cancer Database (NCDB), rural dwellers were found to have a higher stage of breast cancer at diagnosis whereas women who live in metro areas had a reduced overall mortality [5]. Boukovalis and Sariego compared breast cancer diagnosis and mortality in rural and urban areas in Pennsylvania in 1999 and 2009 [6]. They ultimately demonstrated a higher population growth in urban counties, a higher proportion of the state’s breast cancer diagnoses in rural counties, and a decrease state-wide in breast cancer mortality over the decade. This mortality decrease, however, favored urban counties, suggesting differences in detection and treatment. Additionally, Thompson, et al., demonstrated a nationwide trend toward worse breast cancer outcomes for patients in underserved communities, including rural white and urban Black and Latina women [7].

Regional differences in breast cancer incidence and mortality have been assessed using several different methods in order to identify factors other than geographical location, specifically socioeconomic and racial differences, that have an impact on these breast cancer metrics. Moore, et al., mapped hot spots for breast cancer mortality in the entire United States and found that, regardless of race, counties with lower education, lower income and lack of insurance were associated with higher breast cancer mortality [8]. Another report examining Surveillance, Epidemiology and End Results (SEER) data in Detroit found no difference by race in odds for unfavorable breast cancer outcomes, including mortality, when adjusted for socioeconomic factors [9]. Specific regional studies have demonstrated similar findings related to socioeconomic factors in Chicago [10], San Francisco [11], North Carolina [12], South Carolina [13], and California [14] but also highlighted clear disparities in breast cancer treatment and outcomes for Black women. Additionally, differences in stage at presentation and breast cancer subtype, both of which impact outcomes, have been linked not only to race but to neighborhood socioeconomic factors [15, 16]. The more advanced stage at diagnosis and any differences in treatment do not fully explain the racial disparities in survival [17, 18], underscoring the importance of socioeconomic factors.

According to the United States Census, Pennsylvania is the fifth largest state by population and Philadelphia, its largest city, is the sixth largest in the nation. Pennsylvania is ninth in the nation by population density, but approximately one in four Pennsylvanians lives in a rural county. Additionally, Pennsylvania has the 24th highest per capita income in the nation, and 10.6% of its population is Black, compared with a national average of 13.6%. This socioeconomic diversity, which mirrors that of the nation, comprises pockets of wealth and poverty that encompass people of all races. As such, Pennsylvania is an appealing geographical location for epidemiologic study. Therefore, we developed this study to test the hypothesis that breast cancer diagnosis and mortality are similar among races in Pennsylvania when stratified by mean income.

Methods

Data acquisition

The Pennsylvania Department of Health has been collecting data from statewide mandatory reports of cancer diagnoses in Pennsylvania since 1985. They have collated these data with the rates of cancer death for each municipality in the state into a database known as the Pennsylvania Cancer Registry (PCR). We obtained municipality-level data for years 2011–2015 from the PCR Enterprise Data Dissemination Informatics Exchange (EDDIE), which included the number of breast cancer diagnoses stratified by race (White; Black; Other/unknown) and age group at diagnosis, with groups representing 5-year intervals (e.g., 30–34 years). Institutions completely review each patient’s medical record and report date of birth, age at diagnosis and race to the PCR, which are then categorized centrally. We collected the same municipality-level data on breast cancer-related mortalities from 2011 to 2015. We also obtained population demographics for each Pennsylvania municipality from the 2010 United States Census data, including population composition as it related to age, gender, race and median income. We combined both data sets and grouped municipalities by county for statistical analysis.

Statistical analysis

Data were analyzed in Stata/MP 15.1 (StataCorp LP., Texas, USA). All tests were two-sided and statistical significance was assessed at the 0.05 level. Variables were classified as person-level (age group, race, breast cancer diagnosis rate, and breast cancer mortality) or county-level (population size, median household income, median female age, and percent of white residents). Categorical variables were summarized as frequency (percentage) and continuous variables were summarized as median (IQR: interquartile range). Continuous variables were assessed for normality using histograms. Relationships between two continuous variables were investigated using scatterplots, and correlation coefficients (r) were estimated. Person-level and county-level distributions were compared between (1) women with vs. without a breast cancer diagnosis, and (2) women who died due to breast cancer vs. women who were alive at the time of data collection. Categorical and continuous distributions were compared between two groups using a chi-square test of independence or a Wilcoxon rank-sum test, respectively.
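As an illustration of the descriptive summaries described above, the following minimal Python sketch computes a median with IQR and a Pearson correlation coefficient. It is not the Stata code used in the study; the function names and the county-level values are hypothetical, and the quartile rule used by Python's `statistics.quantiles` may differ slightly from Stata's.

```python
from math import sqrt
from statistics import mean, median, quantiles


def median_iqr(values):
    """Return (median, Q1, Q3) for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points;
    # Python's default "exclusive" method is assumed here.
    q1, _, q3 = quantiles(values, n=4)
    return median(values), q1, q3


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical county-level values, for illustration only.
income = [38_000, 41_000, 45_000, 47_000, 52_000, 60_000, 75_000]
density = [40, 55, 90, 150, 300, 900, 2_500]

summary = median_iqr(income)      # (median, Q1, Q3)
r = pearson_r(density, income)    # strength of the linear association
```

A rank-based coefficient could be substituted for skewed variables such as population density; the text does not state which coefficient was used.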

Breast cancer diagnosis rates were calculated by summing the total number of diagnoses in the population at risk and dividing by the total number of women in the population at risk. The quotient was multiplied by 100 to express the rate as a percentage. The same calculation was performed for the mortality rate; that is, the number of breast cancer-related deaths in the population at risk was summed and divided by the total number of women in the population at risk. Age-adjusted breast cancer diagnosis and mortality rates and 95% confidence intervals (95% CI) were estimated for each race using the direct standardization method, with the 2010 United States Census data serving as the reference population.
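The rate calculation and direct standardization can be sketched in a few lines of Python. This is a minimal illustration, not the study's Stata code: the function names are hypothetical, the confidence interval uses a normal (Wald) approximation as an assumption because the text does not state the interval method, and the age strata and reference population are invented. Only the two statewide counts come from the Results.

```python
from math import sqrt


def crude_rate_pct(events, population):
    """Events divided by women at risk, expressed as a percentage."""
    return 100.0 * events / population


def wald_ci_pct(events, population, z=1.96):
    """Normal-approximation 95% CI for a proportion, in percent (assumed method)."""
    p = events / population
    half_width = z * sqrt(p * (1.0 - p) / population)
    return 100.0 * (p - half_width), 100.0 * (p + half_width)


def directly_standardized_rate_pct(strata, reference_pop):
    """Weight each age-specific rate by the reference population's age share.

    strata: {age_group: (events, women_at_risk)} for the group of interest
    reference_pop: {age_group: women in the reference population}
    """
    total_ref = sum(reference_pop.values())
    return 100.0 * sum(
        (events / at_risk) * reference_pop[group] / total_ref
        for group, (events, at_risk) in strata.items()
    )


# Statewide counts reported in the Results.
overall = crude_rate_pct(54_038, 5_353_875)
overall_ci = wald_ci_pct(54_038, 5_353_875)

# Hypothetical two-stratum example, for illustration only.
strata = {"under 50": (10, 10_000), "50 and over": (60, 20_000)}
reference = {"under 50": 50_000, "50 and over": 50_000}
crude = crude_rate_pct(70, 30_000)
adjusted = directly_standardized_rate_pct(strata, reference)
```

In the hypothetical example the crude rate is pulled toward the older stratum because that stratum is larger, whereas the standardized rate weights both strata equally, as the reference population does. This is the same mechanism by which age adjustment can change the ordering of rates between groups with different age distributions.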

Due to the multilevel nature of the data, mixed effects logistic regression (ME LR) models were used to test the association between person- and county-level variables with breast cancer diagnosis and mortality, respectively. County was treated as a random effect, allowing random intercepts for each county. For both diagnosis and mortality, the null model was compared to the ME LR model without any predictors using a likelihood ratio test (LRT). The LRTs for diagnosis (χ2 (1 d.f.) = 382.43, p ≤ 0.001) and mortality (χ2 = 28.13, p ≤ 0.001) indicated that random intercept models were appropriate. County-level variables were centered around their grand means (population size: 565,367, median household income: $52,866, percent white: 82.7%, median female age: 41.1 years).
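The likelihood ratio test and grand-mean centering can be illustrated with a short Python sketch. It is a simplified example under stated assumptions, not the study's Stata code: the function names and log-likelihood values are hypothetical, and the p value is taken from a chi-square distribution with 1 d.f. without the boundary correction that is often applied when testing a variance component.

```python
from math import erfc, sqrt


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f.

    For 1 d.f., P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return erfc(sqrt(x / 2.0))


def center(values, grand_mean):
    """Subtract the grand mean so that 0 represents an average county."""
    return [v - grand_mean for v in values]


# Hypothetical log-likelihoods for a model without and with a random intercept.
stat = lrt_statistic(loglik_reduced=-1000.0, loglik_full=-990.0)
p_hypothetical = chi2_sf_1df(stat)

# Statistic reported in the text for diagnosis (1 d.f.).
p_diagnosis = chi2_sf_1df(382.43)

# Centering hypothetical county incomes on the reported grand mean ($52,866).
centered_income = center([45_000, 52_866, 60_000], 52_866)
```

After centering, the model intercept refers to a county with average characteristics, which makes predicted probabilities easier to interpret.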

Univariate ME LR models were built for diagnosis and mortality, respectively. Odds ratios (OR) and 95% CIs are presented. Full, multivariable ME LR models were built for both outcomes using all person- and county-level variables. Full models were reduced using backwards elimination until all independent variables were significant at the 0.05 level. Two- and three-way interactions were tested for any independent variables that remained in the backwards-reduced models, and only significant interactions remained in the models. Significant interactions are described by predicted probabilities and differences in predicted probabilities (with SE: standard errors), and p values were adjusted using a Bonferroni procedure to account for multiple comparisons.
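The quantities reported from these models can be related to one another with a small Python sketch. It is illustrative only: the function names, coefficient, standard error and p values are hypothetical, and the predicted probability is evaluated at a random intercept of zero, which is a simplification of how a statistical package may compute it.

```python
from math import exp


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and 95% CI from a logit coefficient and its standard error."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def inverse_logit(eta):
    """Convert a linear predictor on the log-odds scale to a probability."""
    return 1.0 / (1.0 + exp(-eta))


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical coefficient and standard error for a single predictor.
or_, lower, upper = odds_ratio_ci(beta=0.25, se=0.05)

# Hypothetical linear predictor for one race-by-age cell.
predicted = inverse_logit(-4.0)

# Hypothetical raw p values from three pairwise comparisons.
adjusted = bonferroni([0.01, 0.2, 0.5])
```

The Bonferroni procedure is conservative: with three comparisons, a raw p value must fall below roughly 0.0167 to remain significant at the 0.05 level after adjustment.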

Results

Demographics

From the 2010 US Census, a total of 5,353,875 adult females living in 2574 municipalities and 67 counties were recorded in the state of Pennsylvania. The majority (84.3%) identified as White, with only 10.4% identifying as Black and 5.4% identifying as another race or unknown race (Table 1). The state-wide median household income was $45,667 (IQR: $40,940–$51,122). The distributions of population density, median income and the proportion of individuals identifying as non-White throughout the state are shown in Fig. 1. As expected, higher population density correlated with a larger proportion of non-White individuals (r = 0.78, p ≤ 0.001) and higher median household incomes (r = 0.41, p ≤ 0.001), though groups of poor, White individuals can be seen throughout the state in rural areas. The Philadelphia suburbs in the southeast part of the state have both greater diversity and higher median incomes.

Table 1 Distribution of person-level and county-level demographics in analysis data set from the Pennsylvania cancer registry and US census
Fig. 1 Demographics by municipality for the state of Pennsylvania as derived from the Pennsylvania Cancer Registry and the 2010 US Census, including (A) population density, (B) percent of the population identifying as Black or another race, (C) median household income, (D) breast cancer diagnosis and (E) breast cancer mortality

Breast cancer diagnosis

From the Pennsylvania Cancer Registry, we identified 54,038 incident breast cancer diagnoses from 2011 to 2016. Overall, the calculated breast cancer diagnosis rate was 1.01% (95% CI 1.00–1.02%), the geographical distribution of which is shown in Fig. 1. County-level breast cancer diagnosis rates ranged from 0.80% (Mifflin County) to 1.60% (Sullivan County), with a median of 1.00% (IQR: 0.94–1.09%). White women had the highest unadjusted diagnosis rate (1.06%), followed by Black women (0.88%) and women of other or unknown race (0.43%) (Table 2). Breast cancer diagnosis rates increased as age increased, with the highest among those aged 65–74 (2.61%). Diagnosis rates increased as age increased for each race group (Fig. 2A). However, age distributions were significantly different across race in the general population of PA residents (p ≤ 0.001), as 55.2% of Black women in PA were under 45 years old compared to 43.4% of white women. After adjusting for age, the diagnosis rates were slightly higher among Black women (1.06%, 95% CI 1.03–1.09%) than white women (1.02%, 95% CI 1.01–1.02%) (Supplemental Table 1).

Table 2 Distribution of person-level and census-level demographics by breast cancer diagnosis and mortality
Fig. 2 Unadjusted rates of breast cancer diagnosis and mortality stratified by race and age group

Table 3 displays the univariate ME LR analyses for breast cancer diagnosis. Age group, race, and county’s median female age were all significantly associated with incident diagnosis (all p values < 0.001). All person-level and county-level variables were entered into the multivariable ME LR model, and age, race, county’s median household income, and the interactions between race and age, and race and household income were significant in the final backwards-reduced model (Table 4). The effect of race differed significantly by age group and county’s median household income.

Table 3 Univariate multilevel logistic regression models for breast cancer diagnosis and mortality
Table 4 Backwards-reduced multilevel logistic regression models for breast cancer diagnosis and mortality

Although the probability of an incident diagnosis increased as age increased, women of other or unknown races had a significantly lower predicted probability of an incident diagnosis compared to their white and Black counterparts in all age categories except the youngest (15–29) and oldest (75+) (Fig. 3A). The largest differences occurred among women aged 55–64 and 65–74. For example, among those aged 55–64, 0.98% of women of other or unknown races were predicted to have an incident diagnosis compared to 1.7% of white (SE of difference with other or unknown race: 0.06%, adjusted p ≤ 0.001) and 1.9% of Black (SE of difference with other or unknown race: 0.08%, adjusted p ≤ 0.001) women, respectively. There were no significant differences between white and Black women in any age group in the multivariable model.

Fig. 3 Predicted probability of (A) breast cancer diagnosis and (B) breast cancer mortality stratified by race and plotted over age group

The diagnosis rate also increased as the county’s median household income increased; however, this increase was of a larger magnitude for women of other or unknown race (Fig. 4A). Except at the highest household income level, women of other or unknown races had a significantly lower predicted probability of an incident diagnosis compared to white and Black women, respectively. The largest difference occurred among women who lived in counties with the lowest household income levels. Specifically, 0.51% of women of other or unknown races who live in counties with a median household income of $33,000 were expected to have a diagnosis of breast cancer. These rates were significantly higher in white (0.91%, SE of difference with other or unknown race: 0.03%, adjusted p ≤ 0.001) and Black (0.97%, SE of difference with other or unknown race: 0.04%, adjusted p ≤ 0.001) women, respectively. There were no significant differences between white and Black women at any level of household income in the multivariable model.

Fig. 4 Predicted probability of breast cancer diagnosis stratified by race and plotted over median household income

Breast cancer mortality

There were 9828 recorded breast cancer deaths from 2011 to 2016 for an overall breast cancer mortality rate of 0.18% (95% CI 0.18–0.19%). County-level breast cancer mortality rates ranged from 0.11% (Centre County) to 0.32% (Warren County), with a median of 0.18% (IQR: 0.16–0.20%). Breast cancer mortality increased as age increased, and the highest mortality rate was among women aged 85 and over (1.03%) (Table 2). The 75–84 age group had the second highest mortality rate (0.51%), about half that of the 85 and over age group. In the unadjusted analysis, Black women had the highest rate of mortality (0.21%), followed by white women (0.19%) and women of other or unknown races (0.08%). Black women still had the highest mortality rate after adjusting for the differences in age distributions. The age-adjusted mortality rate was 0.27% (95% CI 0.26–0.29%) in Black women, 0.18% (95% CI 0.17–0.18%) in white women, and 0.15% (95% CI 0.13–0.17%) in women of other/unknown races (Supplemental Table 1).

Age, race, and county’s median female age were significantly associated with breast cancer mortality in the univariate ME LR models (all p values < 0.001; Table 3). Age, race, median household income, and the interaction between age and race were significant in the backwards-reduced multivariable model (Table 4). The interaction between race and household income was not significant and was therefore removed from the model. Women who live in counties with higher median household incomes were significantly more likely to have a breast cancer death (adjusted OR: 1.03, 95% CI 1.00–1.05, p = 0.02). The effect of race on breast cancer mortality depended on age group. There were no significant differences in mortality rates across race for women under the age of 35; however, from age 35 onward, Black women had a significantly higher breast cancer mortality rate compared to white women (Fig. 3B). The same relationship was observed between Black women and women of other or unknown races starting at age 45. The largest differences occurred in women aged 75 and older. In the 75–84 age group, the mortality rate was 0.79% among Black women, 0.50% among white women (SE of difference with Black women: 0.06%, adjusted p < 0.001), and 0.32% among women of another or unknown race (SE of difference with Black women: 0.09%, adjusted p < 0.001). Among women aged 85 and over, these rates were 1.4% in Black women, 1.0% in white women (SE of difference with Black women: 0.12%, adjusted p ≤ 0.001), and 0.77% in women of other or unknown races (SE of difference with Black women: 0.22%, adjusted p = 0.048). There were no significant differences in breast cancer mortality rates between white women and women of other or unknown races in any age group.

Discussion

This analysis of breast cancer diagnosis and mortality rates in Pennsylvania demonstrates ongoing socioeconomic and racial disparities. As expected, there is significant diversity within the state as it relates to population density, race and income. The analysis cohort therefore included both low- and high-income individuals of all races and provides insight into the interaction between these important socioeconomic factors.

This analysis yielded an overall breast cancer diagnosis rate of 1.01%, or approximately 0.2% per year, which is similar to the 0.13% annual new case rate reported in SEER [19]. Consistent with most analyses of race and socioeconomic factors, we found that White race and higher income were associated with higher diagnosis rates in the unadjusted analyses [20,21,22]. The age-adjusted analyses, which demonstrated a slightly higher rate of diagnosis for Black women, are also consistent with reports by DeSantis, et al., that demonstrated an increasing incidence for Black women and ultimate convergence of incidence rates with white women in 2012 [2, 4]. Overall diagnosis rates did not differ when counties were compared according to population size, perhaps indicating a stabilization of the rural–urban differences documented by Boukovalis and Sariego [6]. Race did moderate the adjusted risk of a breast cancer diagnosis both for age and income. We found that the predicted diagnosis rate did not differ between Black and white women in any income quartile but was significantly lower for women of unknown or other races. This finding highlights the parity of women regardless of race among income groups, and likely represents increased access to breast cancer screening for affluent women.

In our multivariable analyses, we saw that despite having similar rates of diagnosis, Black women experienced higher rates of mortality compared to white women. This was expected based on previous studies that demonstrate that Black women tend to be diagnosed at a later stage than other women, which contributes to increased mortality [10, 23, 24]. Black women and those of a low socioeconomic status are also diagnosed at higher rates with triple negative breast cancer, which has the worst prognosis [16], and experience longer treatment delays or less aggressive therapies [23,24,25], exacerbating these unfavorable tumor factors. Interestingly, there was no interaction between race and income with respect to breast cancer mortality, indicating that more affluent Black women do not have a survival advantage over those with lower incomes. Singh, et al., came to a similar conclusion when analyzing six decades of United States data, finding that when grouping women by neighborhood socioeconomic variables, Black women had an approximately 50% higher breast cancer mortality than white women [26]. Synthesizing our results with those of Bradley, et al., who demonstrated no mortality difference by race in the Detroit SEER database when adjusted for clinical and non-clinical factors, we conclude that there is a complex interplay between biological and socioeconomic factors, including race, that modulates the risk of breast cancer mortality.

Several authors have postulated an impact of residential segregation, defined as homogeneity of racial groups according to geographic area, on health outcomes [27]. Mixed effects are seen among breast cancer patients in California neighborhoods with high residential segregation, with lower mortality for Black patients but higher mortality for white patients [28], while Black and Hispanic segregation in Texas is associated with higher mortality. We found no evidence of an impact of residential segregation at the county level in Pennsylvania on either breast cancer diagnosis rate or mortality.

Our study’s strengths include the use of a prospectively maintained, statewide database with mandatory contribution that allows for the broad analysis of racially and socioeconomically disparate communities. One limitation of the study is that patients are not tracked longitudinally, and the data instead supplied an annual population-based snapshot of diagnosis and death. Another major limitation of this study is the lack of tumor and treatment information, as these variables are closely linked with outcomes and have been shown to differ when patients are analyzed according to race and socioeconomic variables [29]. If, as reported, Black patients and those of lower socioeconomic status present with tumor subtypes with a worse prognosis and receive less effective and more poorly timed treatment than their counterparts, our analysis may overestimate the independent impact of SES and race on outcomes. Importantly, however, socioeconomic variables may in fact contribute more to poor outcomes. A recent multivariable NCDB analysis of nonmetastatic breast cancers demonstrated that insurance and medical comorbidities accounted for a combined 48.3% of the excess risk of death for Black patients, whereas tumor characteristics accounted for only 23.3% of the excess mortality [18]. Therefore, regional socioeconomic analyses such as ours are important in documenting the range and magnitude of the effects of social factors and race on differences in breast cancer incidence and mortality in order to develop targeted strategies to combat them. Ongoing targeted efforts that provide education, increase access to breast cancer screening, and improve access to treatment once diagnosed should continue and be bolstered by innovative strategies to eliminate socioeconomic disparities in breast cancer care.