Introduction

Breast cancer is the most common cancer in American women and one of the leading causes of cancer death [1]; however, substantial disparities exist in mortality by race, socioeconomic status, and neighborhood-level characteristics [2, 3]. While emerging research supports the role of individual and area-level social determinants in breast cancer outcomes, little is known about the potential biological mechanisms underlying these associations. Social epigenomics is a growing field that examines how social and environmental experiences may impact the epigenome through histone modification, telomere shortening, and DNA methylation [4]. Emerging literature suggests that adverse social environments may disproportionately affect minority health through epigenomic perturbations, which could impact disparities in health outcomes [4]. Previous studies have found an association between individual-level SES and DNA methylation among children and adults [5,6,7,8,9,10,11], and at least one study suggests that racial differences in methylation may be mediated, in part, through disparities in childhood SES [12]. This latter point illustrates that the hypothesized underlying mechanism is not biological differences by race, but racial disparities in the lived environment that may drive epigenetic perturbations. Area-level characteristics are an important consideration in disparities research given their multidimensional constituents and wide-ranging biologic effects. For example, even after accounting for individual-level socioeconomic factors, living in a disadvantaged neighborhood is associated with poor health outcomes [12]. Additionally, chronic stress due to unfavorable neighborhood conditions can result in dysregulation of inflammatory and stress reactivity pathways, which may, in part, be driven by epigenetic modifications such as DNA methylation [13].

Epigenetic modifications, which result in variation in gene expression without altering the underlying DNA sequence, have been established as biomarkers that can implicate environmental factors as potential drivers of disease. DNA methylation, specifically, has become a primary epigenetic mechanism to study due to its influence on gene expression and its responsiveness to lifestyle exposures [14]. Aberrant DNA methylation can result in increased oncogene expression and tumor suppressor gene silencing, a common occurrence in breast carcinogenesis [14, 15]. Exploring the impact of area-level sociodemographic characteristics on methylation may yield mechanistic insight into the role of social stressors in breast cancer progression and mortality disparities.

To date, no study has used an epigenome-wide approach to assess neighborhood-associated methylation and breast cancer prognosis. Thus, the purpose of this study was to conduct an epigenome-wide association study (EWAS) of breast tumor tissue to identify CpG sites associated with several neighborhood-level factors. Additionally, we explored interaction by race and downstream associations with breast cancer prognosis.

Methods

Study population

The study protocol follows the methodology described in Do et al. [16]. Briefly, fresh tumor specimens and clinical data were collected from patients undergoing surgery at three metro-Atlanta area hospitals (Emory University Hospital, Emory University Hospital Midtown, and Grady Memorial Hospital). We included 99 non-Hispanic White (NHW) and non-Hispanic Black (NHB) women diagnosed with breast cancer between 2008 and 2017 in this analysis. Eligible women were at least 21 years of age, self-reported as NHW or NHB, were diagnosed with a first-primary stage I, II, or III breast cancer, and received surgery at one of the aforementioned hospitals. Women who were previously diagnosed with breast cancer or did not have a fresh tissue specimen were excluded from this study.

Data collection

Clinical records of women who underwent surgery provided covariate data, including age at diagnosis, race, self-reported smoking status, educational attainment, family history of breast cancer, and zip code. Tumor characteristics were also obtained from clinical records, including estrogen receptor (ER) status, human epidermal growth factor receptor 2 (HER2) status, and progesterone receptor (PR) status; tumor grade; receipt of chemotherapy, radiation, and endocrine therapy; and comorbidities at diagnosis. Updated vital status (through 2/15/2018) was obtained by linking all women to the Georgia Cancer Registry, and cause-specific death was abstracted. As described in Do et al. [16], we considered all-cause mortality as our outcome of interest. Given the short follow-up period (median = 3 years), we expected any mortality event to be driven, in part, by underlying breast cancer [17].

Neighborhood characteristics were collected from the Opportunity Atlas, a publicly available atlas of anonymized longitudinal data for nearly every neighborhood and census tract in the USA [18]. Our primary exposures included median rent (2012–2016), job growth rate (2004–2013), median household income (2012–2016), poverty rate (2012–2016), fraction college graduates (2012–2016), fraction non-white (2010), fraction single parents (2012–2016), population density (2010), and job density (2013). Neighborhood values for each primary exposure are estimated averages over the specified time period. Area-level data were collected from three Census Bureau data sources: the Census 2000 and 2010 short forms; federal income tax returns in 1989, 1994, 1995, and 1998–2015; and the Census 2000 long form and the 2005–2015 American Community Surveys (ACS). For the purposes of this paper, using the Opportunity Atlas is akin to obtaining neighborhood characteristics data directly from the aforementioned sources, and the values reflect the years during which the data were collected.

Methylation data

Fresh tumor specimens from breast cancer patients were obtained from the Breast Satellite Tissue Bank, Winship Cancer Institute, Emory University, Atlanta, GA, USA. DNA methylation was measured in 99 breast tumor tissue samples using the Illumina Infinium MethylationEPIC Beadchip (Illumina, San Diego, CA, USA). Methylation assays were performed in accordance with the Infinium HD Methylation Assay protocol. The protocol uses bisulfite treatment of DNA to convert unmethylated cytosines to uracil, allowing identification of methylated versus unmethylated loci. Two site-specific probes then bind to loci-flanking methylated or unmethylated sequences. The fluorescent signal from the methylated probe (M) relative to the total signal of the methylated (M) and unmethylated (U) probes combined is the proportion of DNA strands that are methylated at that CpG site [19]. The β-value expresses this proportion: β = M/(M + U). The β-value ranges from 0 to 1, where 1 represents 100% of the cells being methylated at a CpG site. Three samples were removed during pre-processing due to poor performance, leaving 96 samples for analysis.
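The β-value definition above can be illustrated with a minimal sketch. The function name and the example intensities are hypothetical and are not taken from the study data; the sketch implements only the ratio stated in the text, without any offset or normalization that an array-processing pipeline might apply.

```python
def beta_value(methylated, unmethylated):
    """Return beta = M / (M + U) for one CpG site in one sample.

    Returns None when the total signal is not positive, because the
    proportion is undefined in that case (an assumption of this sketch).
    """
    total = methylated + unmethylated
    if total <= 0:
        return None
    return methylated / total


# Hypothetical probe intensities (M, U) for three CpG sites.
example_intensities = {
    "site_1": (300.0, 100.0),
    "site_2": (50.0, 950.0),
    "site_3": (0.0, 0.0),
}

for site, (m_signal, u_signal) in example_intensities.items():
    print(site, beta_value(m_signal, u_signal))
```

A β-value near 1 indicates that nearly all strands are methylated at the site, and a value near 0 indicates that nearly all are unmethylated.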

Quality control (QC) was conducted on the data using the CpGassoc package in R. Data points with detection p-values > 0.001 or with low signal were set to missing, and CpG sites (n = 2869) with missing values in over 10% of the samples were removed from the dataset. Stricter probe filtering, as suggested by Zhou et al. [19], was then applied to remove the following: (1) probes with low quality or inconsistent mapping, (2) probes with extension bases inconsistent with the specified color channel or CpG based on mapping, (3) probes with non-unique 30 base pair 3’-subsequences, (4) probes with SNPs in the extension base that cause a color channel switch, and (5) probes with 5 base pair 3’-subsequences that overlap with any SNPs with a global minor allele frequency > 1%. After QC, 758,942 CpG sites remained for evaluation with neighborhood characteristics.
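The first two QC rules (masking by detection p-value, then dropping sparsely measured sites) can be sketched as follows. This is an illustration in plain Python, not the CpGassoc implementation used in the study; the function names, site identifiers, and toy values are hypothetical, and the "low signal" rule and the Zhou et al. probe filters are not modeled.

```python
def mask_low_quality(betas, detection_p, p_cutoff=0.001):
    """Set a beta-value to None when its detection p-value exceeds the cutoff.

    Both inputs are lists of rows, one row per CpG site and one column
    per sample.
    """
    return [
        [b if p <= p_cutoff else None for b, p in zip(beta_row, p_row)]
        for beta_row, p_row in zip(betas, detection_p)
    ]


def drop_sparse_sites(masked, site_ids, max_missing=0.10):
    """Keep only sites whose fraction of missing samples is at most max_missing."""
    kept = {}
    for site, row in zip(site_ids, masked):
        missing_fraction = sum(value is None for value in row) / len(row)
        if missing_fraction <= max_missing:
            kept[site] = row
    return kept


# Toy data: 2 hypothetical sites measured in 10 samples.
toy_betas = [[0.5] * 10, [0.2] * 10]
toy_pvals = [
    [0.0001] * 9 + [0.01],         # 1 of 10 samples fails detection
    [0.0001] * 8 + [0.01, 0.02],   # 2 of 10 samples fail detection
]
toy_kept = drop_sparse_sites(mask_low_quality(toy_betas, toy_pvals),
                             ["site_a", "site_b"])
print(sorted(toy_kept))
```

In the toy data, the first site has exactly 10% missing values and is retained, while the second has 20% missing and is removed, matching the "over 10%" rule in the text.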

Statistical analysis

Analyses were carried out using R (www.r-project.org/). Neighborhood characteristics considered in this study are reported overall and by race as means and corresponding standard deviations.

Linear regression models were used to assess whether individual mean β-values differed by neighborhood characteristics, adjusting for model-specific covariates based on a priori knowledge of the literature and causal diagrams [20,21,22]. Using the CpGassoc package, we regressed β-values for each CpG site on neighborhood-level factors, adjusting for age, race, and smoking status. Additionally, models included a fixed effect for each BeadChip to account for potential chip-to-chip differences in measurement and to adjust for batch effects. Separate regression models were fit for each of the nine neighborhood characteristics of interest, and statistical significance was defined as a false discovery rate (FDR) q-value < 0.05.
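To make the FDR threshold concrete, the sketch below converts a list of per-site p-values into q-values. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly; the function name and example p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for four CpG sites from one EWAS.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = bh_qvalues(example_p)
significant = [i for i, q in enumerate(example_q) if q < 0.05]
print(example_q, significant)
```

Because each of the nine exposures was analyzed in its own EWAS, a correction of this kind would be applied once per exposure across all tested CpG sites.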

To assess whether the association between neighborhood characteristics and tumor methylation was modified by race, the CpG sites reaching FDR significance in the primary EWAS were tested for interaction. For each of these CpG sites, the β-values were regressed on the identified neighborhood characteristic with an interaction between the neighborhood characteristic and race. All interaction analyses were adjusted for age and chip position, and significance was defined as FDR < 0.05.
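One way to write the interaction model described above is shown below. The notation is introduced here for illustration and is an assumption, not the authors' own: y_ij is the β-value for woman i at CpG site j, N_i is the neighborhood characteristic, R_i is an indicator for race, and c_i is a vector of chip covariates.

```latex
\[
  y_{ij} = \alpha_j + \theta_j N_i + \gamma_j R_i
         + \delta_j \,(N_i \times R_i)
         + \lambda_j \,\mathrm{Age}_i
         + \mathbf{c}_i^{\top}\boldsymbol{\kappa}_j
         + \varepsilon_{ij}
\]
```

Under this formulation, effect modification by race at site j corresponds to a test of the null hypothesis that δ_j = 0, with the resulting p-values FDR-adjusted across the tested sites.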

We used multivariable Cox proportional-hazards models to explore associations between the top neighborhood-associated CpG sites and all-cause mortality, adjusting for (1) age, (2) age and race, and (3) age, race, cancer stage, and ER status.

Results

Demographic characteristics of our study population were reported in Do et al. [16] and are provided in Supplementary Table 1. Briefly, NHB women were, on average, older than NHW women (mean age = 58 vs 50 years, respectively) and had a higher BMI (mean BMI = 34.64 vs 29.84 kg/m², respectively). ER-negative status did not differ between NHB and NHW women (22.9% vs 30.8%, respectively). Neighborhood characteristics, overall and by race, are presented in Table 1. In total, 55 different metro-Atlanta neighborhoods were represented in this analysis, with an average of 1.75 women per neighborhood (range 1 to 7 women per neighborhood). On average, NHB women resided in neighborhoods with lower rent prices, higher poverty rates, and lower median household incomes compared to neighborhoods of NHW women. Additionally, neighborhoods of NHB women had a greater proportion of non-white and single-parent households and a lower proportion of college-educated households compared to neighborhoods of NHW study participants. NHB women also lived in neighborhoods that had a substantially lower job density compared to NHW women (1327 jobs/mi² vs 2784 jobs/mi²).

Table 1 Means (standard deviation) are presented for neighborhood-level factors of the neighborhoods of residence, stratified by race

In our main effect analysis assessing epigenome-wide association of DNA methylation with the 9 neighborhood-level factors, 26 CpG sites passed the a priori FDR threshold of 0.05. Of these 26 sites, 5 were associated with neighborhood college graduation rates, and 21 with neighborhood job density. These sites are listed in Table 2. Manhattan plots show the distribution of the CpG sites by −log10(p-value) and chromosomal location for college graduation rates and neighborhood job density (Fig. 1A, B).

Table 2 The 26 FDR-significant CpG sites associated with a neighborhood-level factor in breast tumor tissue in EWAS
Fig. 1 Manhattan plots of CpG sites for (A) neighborhood college graduation rates and (B) neighborhood job density. CpG sites with FDR < 0.05 are those above the solid black line

In Table 3 we provide the results of our interaction analysis, which explored race-specific differences in the 26 CpG sites that were associated with neighborhood characteristics. There were three CpG sites (cg00730549, cg00950813, cg02449575) where the relationship between job density and DNA methylation differed by race (Fig. 2A–C), and one CpG site (cg22544350) where the relationship between college graduation rates and DNA methylation differed by race (Fig. 2D), although these results were not robust after correcting for 26 comparisons.

Table 3 Interaction assessment between race and neighborhood-level factor of the 26 FDR-significant neighborhood-level factor-associated CpG sites
Fig. 2 Scatterplots and regression lines depicting β-values by neighborhood-level factors, excluding methylation outliers, for CpG sites that exhibited interaction by race at p < 0.05

Of the 26 CpG sites associated with neighborhood characteristics, we found that eleven were also associated with prognosis (Table 4), most showing a weak reduction in overall mortality. These reductions were attenuated in multivariable models accounting for age, race, clinical stage, and ER status. For two CpG sites (cg15375883 and cg15196042) we observed a modest increase in mortality risk, even in multivariable models. After correcting for multiple comparisons using false discovery rate adjustment, cg08214329 (IFT140/TMEM204) remained associated with prognosis (FDR q-value = 0.02). The hazard ratio (HR) and corresponding 95% confidence interval (CI) in the fully adjusted model were HR = 0.93 (95% CI 0.87, 0.98).
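As a reading aid for the hazard ratios reported above, the sketch below converts an HR into a percent change in hazard and rescales it to a different increment of the exposure. It relies only on the standard log-linear form of the Cox model; the function names are hypothetical, and the unit of methylation to which the reported HR refers is as scaled in the authors' model and is not restated here.

```python
def percent_change_in_hazard(hazard_ratio):
    """Return the percent change in hazard implied by a hazard ratio."""
    return (hazard_ratio - 1.0) * 100.0


def rescale_hazard_ratio(hazard_ratio, units):
    """Return the hazard ratio for a change of `units` in the exposure.

    In a Cox model the log hazard is linear in the covariate, so the HR
    for k units equals the per-unit HR raised to the power k.
    """
    return hazard_ratio ** units


# Fully adjusted estimate and confidence limits reported for cg08214329.
reported_hr, ci_low, ci_high = 0.93, 0.87, 0.98

for label, value in [("HR", reported_hr), ("lower", ci_low), ("upper", ci_high)]:
    print(label, round(percent_change_in_hazard(value), 1))
```

Applied to the reported estimate, HR = 0.93 corresponds to roughly a 7% lower hazard per unit of methylation, and the confidence limits can be converted in the same way.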

Table 4 Hazard ratio models of the 26 FDR-significant neighborhood-level factor-associated CpG sites adjusted for (1) age, (2) age and BMI, and (3) age, clinical stage, and ER negative status

Discussion

This study is the first untargeted analysis to examine neighborhood-level factor-associated methylation in breast tumor tissue using the EPIC array. DNA methylation was significantly associated with neighborhood characteristics at 26 CpG sites: 21 with neighborhood job density and 5 with neighborhood college graduation rates. The relationship between neighborhood-level factors and differential DNA methylation has rarely been assessed in the literature. Previous studies have found that DNA methylation may mediate the association between neighborhood-level factors and adverse health outcomes such as inflammation and depression [23,24,25,26]. In a subsample of the Multi-ethnic Study of Atherosclerosis (MESA) from 2010 to 2012, neighborhood-level socioeconomic disadvantage and neighborhood social environment were associated with methylation in stress- and inflammation-related genes linked to metabolic diseases such as obesity and type-2 diabetes [23]. Neighborhood crime rates were associated with depression, mediated by methylation of the promoter region of the serotonin transporter gene (5-HTT), in a cohort of 99 African American women from the Family and Community Health Study [24]. No study to date has linked neighborhood factors to breast tumor methylation and prognosis.

We found that area-level job density and college graduation rates are associated with differential methylation in breast tumors. Job density may reflect several relevant exposures with varied impacts on health, including surface transportation, which has been associated with outdoor air pollution and differential DNA methylation [27]. It could also be a proxy for residential development and for access to green space and health care, which are associated with positive health outcomes [28], though none of these exposures has been examined in the context of DNA methylation. College graduation rates could also be a proxy for several exposures, including educational attainment, specific occupations, social interactions, or engagement in healthy behaviors [31]. While college graduation may represent economic prosperity, our study found no association with median household income. Low educational attainment has been linked to differential methylation [29] and increases in mortality [30, 31], though the mechanisms underlying this association are not well understood. Shiftwork [34] and manufacturing occupations [35, 36] have also been shown to adversely impact the DNA methylome, and some studies report that methylation resulting from occupational exposures may increase the risk of various cancers, including esophageal [37], lung [38], and bladder carcinomas [39]. While job density and college graduation rates may have several mechanisms of action leading to differential methylation, including those related to stress [32], this study provides initial evidence to support the important role they may play in influencing the breast tumor methylome and potentially adverse outcomes in response to methylation.

While many of the CpGs modulated in response to neighborhood characteristics are within the transcription start site or body of genes that have been implicated in carcinogenesis [40,41,42,43,44], here we highlight two. A single probe on Chr 16 (cg08214329) with overlapping promoters for IFT140 and TMEM204 was associated with mortality among women with breast cancer. TMEM204 is expressed in all cancers with low specificity (Median range 3.1–20.5 FPKM, RNAseq TCGA data), and low expression has been associated with unfavorable liver cancer outcomes, but favorable survival in melanoma (https://www.proteinatlas.org/ENSG00000131634-TMEM204/pathology), consistent with our findings. We observed improved breast cancer survival (~ 7% reduction in mortality) with every 1-unit increase in methylation.

Of the 26 probes identified as being modulated by neighborhood factors, only one (cg00950813, Chr 7) interacted with race and was associated with all-cause mortality. cg00950813 is located in the body of ZNF282, which encodes zinc finger protein 282. The zinc finger proteins are involved in a variety of cellular mechanisms and have been implicated in tumorigenesis and cancer progression [45]. ZNF282 has been found to have an oncogenic role and to promote tumorigenesis in esophageal and breast tissue [46, 47]. ZNF282 is a co-activator of estrogen receptor α and thus is required for estrogen-dependent breast cancer cell growth [50], suggesting differential expression by breast cancer subtype [48]. In addition, ZNF282 has a small ubiquitin-like modifying (SUMO) role in estrogen signaling and breast tumorigenesis in mouse models, and the SUMO pathway is hyperactivated in breast cancer [47]. Given the important role of ZNF282 in influencing ER positivity, we examined whether methylation at this site was associated with ER status, adjusting for age, race, and chip. In our post hoc case-case comparison, we found that every 0.01-unit increase in methylation was associated with 11% higher odds of ER-positive status compared to ER-negative status (Odds Ratio 1.11; 95% CI 1.04, 1.24).

At this site, there was an inverse association between methylation and job density, which appears to be primarily driven by NHW women. The observed interaction by race is unlikely to be a result of differential ER positivity (77% ER positivity among NHB vs. 69% among NHW) and is more likely due to lower methylation among three NHW women living in communities with substantial job density. It is unknown whether this interaction would remain had our study included NHB women residing in areas with high job density; upon removing the three NHW women with job density > 5000 jobs/mi², the interaction did not persist. Our findings underscore the potential importance of sociodemographic characteristics in observed molecular differences by race.

We also found an inverse association between methylation at cg00950813 and mortality: each 1% increase in the β-value was associated with a 5% decrease in all-cause mortality, consistent with the putative role of ZNF282 in breast carcinogenesis. It is well established that ER status plays a critical role in breast cancer prognosis, where women with ER negative tumors have poorer outcomes compared to women with ER positive tumors [49], yet we found no interaction between cg00950813 and ER with mortality, and little modulation of its effect in multivariable models adjusting for ER status. Given that this site is located in the gene body, inferring the impact on gene expression is not straightforward. Gene body methylation is frequently cited as an indicator of an active gene [50]. However, a recent large-scale EWAS has shown that regardless of the location in the gene, methylation is primarily associated with decreased expression [51]. Given the oncogenic role of ZNF282, our findings would align with this, such that increased methylation would lead to decreased gene expression, thereby reducing all-cause mortality.

The limitations of our study should be acknowledged. While we included only 96 women in this study, we were able to identify area-level differential methylation at several CpG sites even after correcting for multiple comparisons. While the effect sizes may not be robust, the goal of this research was to understand whether the DNA methylome may be one mechanism by which the social environment influences carcinogenesis. The ubiquity of exposure may suggest that population-level strategies to improve adverse neighborhood characteristics may have a large impact on cancer outcomes, particularly for vulnerable groups. More robust studies are needed to validate our findings. Our study population may not reflect the known subtype distribution among non-Hispanic White women, who are infrequently diagnosed with triple-negative disease. While this limits external validity, internal validity is enhanced by having a relatively balanced distribution of subtypes by race. We performed 9 different EWAS in total, each independently FDR corrected. While this approach is less conservative, given the novelty of our study hypothesis we aimed to reduce the likelihood of false negatives and inform future work in this area. Further, many of our exposures were correlated, and all 26 significant CpGs were further evaluated for associations with all-cause mortality in multivariable analyses, with FDR adjustment applied again. Additionally, based on our a priori criteria, we examined interaction by race for only those CpG sites that reached FDR significance in the primary EWAS. It is possible that significant interaction by race exists at other CpG sites that did not reach FDR significance, which is an important consideration for future studies. Confounding by unmeasured factors may also influence our results, although we anticipate most factors would be downstream of neighborhood characteristics and thus on the causal pathway. Finally, we focus exclusively on the DNA methylome, upstream from our outcome of interest. In doing so, we neglect other layers of the epigenome and downstream impacts on gene expression.

This study took an untargeted approach to examine potential epigenetic mechanisms underlying the association between neighborhood-level factors and breast cancer prognosis in a diverse population of women who underwent breast cancer surgery in metro-Atlanta. Our preliminary results suggest that the behavioral and biological responses consequent to neighborhood stressors have the potential to modulate the breast tumor epigenome and subsequent outcomes. Our results require replication in other racial and ethnic groups in diverse regions across the USA. Importantly, this research highlights that if the characteristics perpetuating poor outcomes in minority communities are alleviated, outcomes may also improve. Future efforts should focus on replicating the methodology in a larger, equally diverse patient population to validate these preliminary findings and further explore the epigenetic mechanisms identified in this study.