Introduction

Breast cancer is the leading cause of cancer death in women; however, substantial heterogeneity exists in terms of prognosis by race [1], estrogen receptor (ER) status [2], and body size [3]. As obesity is a potentially modifiable prognostic factor, it is critical to understand its role in disparate breast cancer outcomes. Studies have reported conflicting associations between obesity and breast cancer, depending on various factors including hormone receptor status, race, menopausal status, and hormone therapy use [3]. For example, among pre-menopausal women, obesity has been associated with reduced risk of breast cancer in some, but not all, studies [3,4,5,6], whereas among post-menopausal women, obesity is consistently associated with an increased risk of breast cancer [7,8,9,10], although the association may vary by intrinsic subtype. Following a diagnosis of breast cancer, obesity is associated with poor outcomes irrespective of menopausal status or tumor characteristics [11,12,13,14] which may suggest that multiple mechanisms are involved across the carcinogenic pathway. Given the rising rates of obesity in cancer patients [3, 15], and disproportionate burden of obesity among minority women [16, 17], it is important to identify the underlying mechanisms driving the association between body size and breast cancer prognosis.

Epigenetic mechanisms are functionally relevant changes to the genome that do not involve a change to the underlying DNA sequence, and they are promising biomarkers that can comprehensively capture the result of both genetic and environmental influences. DNA methylation has become a promising target for assessing the etiology and progression of cancers due to its malleability following environmental and lifestyle exposures and its influence on gene expression [18]. Cancer cells are known to display aberrant methylation patterns leading to the silencing of tumor suppressor genes and increased expression of oncogenes [18, 19]. Because DNA methylation is a potential modulator of exposure-outcome relationships, uncovering obesity-related perturbations in tumor tissue may shed light on the pathophysiology of cancer progression and outcomes, informing behavioral and pharmacologic targets for intervention.

While the relationship between obesity and breast cancer prognosis has been examined using candidate gene approaches, no study to date has used a genome-wide approach to assess obesity-associated methylation signatures and outcomes in the tumor tissue of women diagnosed with a first-primary breast cancer. Thus, the purpose of this study was to conduct an epigenome-wide association study (EWAS) within breast tumor tissue to identify unknown CpG sites associated with obesity. We further explored modification by race, ER status, and downstream associations with breast cancer prognosis.

Methods

Study population

The Glenn Family Breast Satellite Tissue Bank at the Winship Cancer Institute of Emory University has a history of collecting clinical data and storing fresh tumor specimens from patients receiving surgery at three local hospitals in the metro-Atlanta area (Emory University Hospital, Emory University Hospital Midtown, and Grady Memorial Hospital). We used stratified sampling to identify 99 non-Hispanic Black (NHB) and non-Hispanic White (NHW) women diagnosed between 2008 and 2017 who were ideal weight, overweight, or obese. This sampling strategy was implemented to enhance our ability to detect obesity-associated tumor methylation and identify potential modification by race. Women were eligible for inclusion if they were 21 years of age or older, NHB or NHW by self-report, diagnosed with a first-primary stage I–III breast cancer, and received surgery at one of the participating hospitals. Women with a previous diagnosis of breast cancer or without an available fresh tissue specimen were excluded.

Data collection

Anthropometric and covariate data were obtained from the clinical records of women undergoing surgery. The primary exposure, body mass index (BMI, kg/m2), was derived from body weight and height obtained at the time of diagnosis. Age at diagnosis, family history of breast cancer, race, educational attainment, history of pregnancy, menarche and breast feeding, hormone replacement therapy use, and self-reported smoking status (current/former/never) were similarly abstracted from the clinical record. Clinical characteristics obtained from the record included: ER, progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status; tumor grade; receipt of chemotherapy, endocrine therapy, and radiation; comorbidities; as well as breast cancer and all-cause mortality. Poor prognosis was defined as all-cause mortality (including mortality from breast cancer). Given the short follow-up period (10 years), we anticipate that any mortality would be driven, in part, by underlying breast cancer [20].

Methylation data

DNA methylation was measured in 99 breast tumor tissue samples using the Illumina Infinium MethylationEPIC Beadchip (Illumina, San Diego, CA, USA). Methylation assays were performed in accordance with the Infinium HD Methylation Assay protocol. The protocol uses bisulfite treatment of the DNA to convert unmethylated cytosine to uracil, allowing identification of methylated vs. unmethylated loci. Following bisulfite treatment, two site-specific probes bind to sequences flanking methylated and unmethylated loci. The fluorescent signal from the methylated probe relative to the total signal for both methylated and unmethylated probes represents the proportion of DNA strands that are methylated at the CpG site [21]. This is described by the β-value, the methylated signal (M) divided by the combined methylated (M) and unmethylated (U) signal (β = M/[M + U]). The β-value is measured on a scale from 0–1, where 1 indicates 100% of the cells were methylated at the CpG site. Three samples were removed during pre-processing due to poor performance.
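The β-value definition above can be illustrated with a minimal Python sketch. The study's analyses were run in R; Python is used here only for illustration, and the function name and the example intensities are hypothetical rather than taken from the study.

```python
def beta_value(methylated: float, unmethylated: float) -> float:
    """Return beta = M / (M + U) for one CpG site in one sample."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("total signal (M + U) must be positive")
    return methylated / total


# Hypothetical probe intensities, for illustration only.
example_beta = beta_value(methylated=3000.0, unmethylated=1000.0)
print(example_beta)  # 0.75, i.e. 75% of the signal comes from the methylated probe
```

The sketch follows the formula exactly as written in the text; it does not add the stabilizing offset that some processing pipelines include in the denominator.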

Quality control (QC) was conducted on the resulting data using the CpGassoc package [22]. Data points with low signal or detection p-values > 0.001 were set to missing, and 2869 CpG sites with missing values in > 10% of the samples were removed. All final samples had non-missing data for ≥ 95% of CpG sites, so no samples were removed. We additionally filtered out probes per general masking recommendations from Zhou et al. including probes with SNPs at a minor allele frequency > 1%, non-unique hybridization, probes with inconsistent mapping quality, probes with non-unique sub-sequence, and probes with a SNP that causes a color channel switch from the official annotation [23]. After QC, 759,156 CpG sites were evaluated.
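The two numeric QC thresholds described above (detection p-value > 0.001 set to missing; CpG sites missing in > 10% of samples removed) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the CpGassoc implementation: the function names, the row-per-site data layout, and the toy values are all hypothetical, and the low-signal and probe-masking steps are not covered.

```python
from typing import List, Optional

DETECTION_P_MAX = 0.001       # threshold stated in the text
MAX_MISSING_FRACTION = 0.10   # threshold stated in the text

Row = List[Optional[float]]


def mask_by_detection_p(betas: List[List[float]],
                        detection_p: List[List[float]]) -> List[Row]:
    """Set beta-values whose detection p-value exceeds the threshold to None."""
    return [
        [b if p <= DETECTION_P_MAX else None for b, p in zip(beta_row, p_row)]
        for beta_row, p_row in zip(betas, detection_p)
    ]


def keep_site(row: Row) -> bool:
    """Keep a CpG site unless more than 10% of its samples are missing."""
    missing = sum(value is None for value in row)
    return missing / len(row) <= MAX_MISSING_FRACTION


# Toy data: 2 CpG sites (rows) x 10 samples (columns); all values are made up.
toy_betas = [[0.5] * 10, [0.8] * 10]
toy_pvals = [[0.0001] * 10, [0.0001] * 8 + [0.05, 0.2]]

masked = mask_by_detection_p(toy_betas, toy_pvals)
kept = [row for row in masked if keep_site(row)]
print(len(kept))  # 1: the second site has 2 of 10 samples missing and is dropped
```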

Statistical analysis

Analyses were carried out using R (www.r-project.org/). Demographic characteristics were reported as frequencies or means. Mean methylation, as defined by the individual combined methylation β-value across the genome, was examined using linear regression models to assess whether individual mean β-values differed according to BMI, race, ER status, or poor prognosis, adjusting for model-specific covariates based on a-priori knowledge of the literature and causal graphical analyses [24, 25].

For the EWAS, the CpGassoc package was used to fit a linear regression model for each CpG site. Each regression modeled methylation as the outcome and BMI as the primary predictor, adjusting for age, race, smoking status, and chip position. To account for potential chip-to-chip differences in measurement and to adjust for batch effects, a fixed effect for each BeadChip was included in all models. Significance was defined as a false discovery rate (FDR) q-value < 0.05. Since β-values can be non-normally distributed, which would violate an assumption of linear regression, we examined the residuals of a select few CpG sites identified in the discovery and interaction analyses described below.
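To make the FDR q-value threshold concrete, the following Python sketch computes q-values from a list of per-site p-values. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly; the function name and the example p-values are hypothetical.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for four CpG sites.
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in example_q]
print(significant)
```

In a real EWAS the list would hold one p-value per CpG site (759,156 after QC in this study), so a site needs a very small raw p-value to reach q < 0.05.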

To assess whether the relationship between BMI and tumor methylation was modified by race or ER status (positive or negative), the top 20 CpG sites identified in the primary analysis were tested for interaction. For each of these CpG sites, we regressed the β-values on BMI with an interaction between BMI and race or ER status, respectively. We were underpowered to further explore intrinsic subtypes of breast cancer (ER, PR, HER2 status). All interaction analyses were adjusted for age, race (only in the ER interaction model), and chip position. Statistical significance was set at p < 0.05.
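The interaction models described above can be written out explicitly. The notation below is an illustrative sketch: the symbols, the 0/1 coding of the modifier Z (race or ER status), and the covariate vector X (age, chip position, and race where applicable) are assumptions made for exposition and are not taken from the study's code.

```latex
\beta_{ij} = \alpha_{0j} + \alpha_{1j}\,\mathrm{BMI}_i + \alpha_{2j}\,Z_i
           + \alpha_{3j}\,(\mathrm{BMI}_i \times Z_i)
           + \boldsymbol{\gamma}_j^{\top}\mathbf{X}_i + \varepsilon_{ij}
```

Here \beta_{ij} is the β-value of woman i at CpG site j. Under this coding, the change in methylation per 1 kg/m2 of BMI is \alpha_{1j} in the reference group (Z = 0) and \alpha_{1j} + \alpha_{3j} in the comparison group (Z = 1), so the test of interaction is a test of \alpha_{3j} = 0 at p < 0.05.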

To examine whether BMI-associated methylation was associated with mortality, multivariable Cox proportional hazards models were used to estimate hazard ratios (HR) for the associations between the top 20 CpG sites and all-cause mortality. Stage, ER status, and treatment characteristics were considered as potential covariates in the model but were ultimately excluded, as causal graphical analyses suggested that they were likely mediators of the exposure-outcome relationship. Thus, final models were adjusted for age only. For sites found to interact with race or ER status, we additionally presented stratified results.

Sensitivity analysis

We conducted post hoc sensitivity analyses, excluding outliers in interaction models of race and ER status which appeared to have been driven by one or two patients. In our race interaction models, we additionally restricted to women with BMI < 45, since NHW women are not represented above this value. In the survival analysis, we examined models adjusting for (1) age and BMI, and (2) age, ER status, and stage. Finally, due to the potential for neoadjuvant therapy to affect the breast tumor epigenome, and known associations with breast cancer prognosis, we performed additional sensitivity analyses excluding women who underwent neoadjuvant systemic therapy.

Results

Demographic information is included in Table 1. NHB women in our study sample were older than NHW women (mean age = 58 and 50 years, respectively) and had higher BMI (BMI = 34.64 and 29.84 kg/m2, respectively, Supplemental Fig. 1). There were no differences in ER status, all-cause mortality, alcohol intake, or chemotherapy status between NHB and NHW women. When examining the participants' mean methylation values collectively, we observed no differences by race, obesity status, or breast cancer subtype, adjusting for covariates. However, overall DNA methylation was lower by 0.018 units (95% CI [− 0.032, − 0.005]) among women who died within 10 years of breast cancer diagnosis compared to surviving patients, adjusting for age, BMI, race, and smoking status.

Table 1 Counts and means (standard deviation) are presented for categorical and continuous variables, respectively

For the primary analysis assessing the epigenome-wide association of DNA methylation with BMI, no sites passed the FDR threshold for significance. Given the exploratory nature of the analysis and the limited sample, we determined a priori to further investigate the top 20 CpG sites for interaction with race, ER status, and all-cause mortality. The top 20 sites are listed in Supplementary Table 1.

We first performed interaction analysis to investigate differences in the relationship between BMI and DNA methylation in NHB vs. NHW women. The relationship between BMI and DNA methylation varied by race at one site. In NHB women, as BMI increased by 1 kg/m2, methylation decreased by 0.003 at cg03731251 in the TOMM20 gene, whereas among NHW women methylation increased by 0.002 (p = 0.001; Fig. 1a). Following a sensitivity analysis excluding two extreme values, the relationship was no longer significant (p = 0.053; Supplemental Fig. 2a). In an additional sensitivity analysis restricted to women with a BMI < 45, which excluded 11 NHB women, the interaction was likewise not significant (p = 0.052; Supplemental Fig. 2b).

Fig. 1 Scatter plot and regression line depicting β-values by BMI. a Examines the interaction by race for cg03731251, b–d examine the interaction by ER subtype for cg20174711, cg08755040, and cg23718418, respectively

We performed similar tests of interaction to investigate the relationship between BMI and DNA methylation of the top 20 CpG sites by ER status. Three CpG sites, annotated to three genes, were differentially associated with BMI by ER status: cg20174711 (PSMB1), cg08755040 (QSOX1), and cg23718418 (PHF1). Among women who were ER-negative, as BMI increased by 1 kg/m2, methylation at cg20174711 in the PSMB1 gene decreased at 1.4 times the rate observed among ER-positive patients (p = 0.00004; Fig. 1b). After a sensitivity analysis excluding one outlying sample, the interaction was no longer significant (p = 0.40; Supplemental Fig. 2c). Similarly, at the site cg08755040 in the QSOX1 gene, with each incremental increase in BMI, methylation among women with ER-positive cancer increased at two times the rate observed in ER-negative cancer (p < 0.0001; Fig. 1c). In sensitivity analyses removing a single outlier, the interaction remained significant (p = 0.0006; Supplemental Fig. 2d). Finally, at site cg23718418 (PHF1 gene), among women with ER-negative disease, methylation increased by 0.002 for every 1 kg/m2 increase in BMI, whereas the increase was less than 0.0005 for every 1 kg/m2 increase in BMI among women with ER-positive breast cancer (p = 0.002; Fig. 1d). After exclusion of one outlying sample, the interaction was no longer significant (p = 0.51; Supplemental Fig. 2e). We additionally examined the residuals of the four sites described above in the EWAS models to assess departures from normality (Supplemental Fig. 3). These plots appear to be consistent with a normal distribution.

We identified three sites that were associated with poor prognosis (Table 2). In the TOMM20 gene, every 1% increase in methylation in cg03731251 was associated with a 6% reduction in all-cause mortality (HR = 0.94; 95% CI [0.91, 0.99]). Our estimate was robust to subsequent adjustment by BMI (HR = 0.93; 95% CI [0.88, 0.99]), as well as ER status and stage (HR = 0.93; 95% CI [0.89, 0.97]). Upon removing women who had undergone neoadjuvant therapy (n = 22), the association remained significant (HR = 0.91; 95% CI [0.87, 0.95]). Given the interaction identified between cg03731251 and race, we additionally examined the stratified hazard ratios for each model in NHW and NHB women. The effects seem to be primarily driven by NHB women (NHW HR [95% CI] 1.98 [0.97, 4.04]; NHB HR [95% CI] 0.46 [0.22, 0.94]). In the PSMB1 gene, we also observed an inverse association between methylation at the cg20174711 site and all-cause mortality, where a 9% reduction was observed for every 1% increase in methylation (HR = 0.91; 95% CI [0.83, 0.99]). Our estimate remained significant when adjusting for BMI (HR = 0.89; 95% CI [0.79, 1.00]), as well as ER status and stage (HR = 0.91; 95% CI [0.84, 0.99]). When excluding women who had undergone neoadjuvant therapy, the association similarly strengthened (HR = 0.84; 95% CI [0.76, 0.92]). Given the observed interaction with ER status, we additionally examined the stratified models between ER-positive and ER-negative tumor types. While our multivariable models did not reach the threshold of statistical significance, we observed a modest risk reduction among women with ER-positive tumors (ER-positive HR [95% CI] 0.92 [0.84, 0.98]). The QSOX1 gene was not associated with poor prognosis overall. However, among women who were ER-negative, every 1% increase in methylation was associated with a 71% reduction in all-cause mortality adjusting for age (ER-negative HR [95% CI] 0.29 [0.11, 0.77]), although our estimate was imprecise [26, 27].
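The percentage reductions quoted above follow directly from the hazard ratios: a hazard ratio HR per 1% increase in methylation corresponds to a (HR − 1) × 100% change in the hazard. The short Python sketch below reproduces that arithmetic for the three age-adjusted estimates reported in the text; the function name is hypothetical and the snippet is not part of the study's analysis code.

```python
def percent_change_in_hazard(hazard_ratio: float) -> float:
    """Convert a hazard ratio into a percent change in hazard (negative = reduction)."""
    if hazard_ratio <= 0:
        raise ValueError("hazard ratio must be positive")
    return (hazard_ratio - 1.0) * 100.0


# Age-adjusted hazard ratios per 1% increase in methylation, as reported in the text.
reported_hrs = {
    "cg03731251 (TOMM20)": 0.94,
    "cg20174711 (PSMB1)": 0.91,
    "cg08755040 (QSOX1), ER-negative": 0.29,
}

for site, hr in reported_hrs.items():
    print(f"{site}: {percent_change_in_hazard(hr):.0f}% change in hazard")
```

The output values (−6%, −9%, and −71%) match the reductions stated in the paragraph.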

Table 2 Hazard ratio models of the top 20 obesity-associated CpG sites adjusted for (1) age, (2) age and BMI, (3) age, ER status and clinical stage, and (4) restricted to women who did not receive neoadjuvant therapy

Discussion

This study is the first untargeted analysis to examine BMI-associated methylation in breast cancer tumor tissue using the EPIC array. While a greater number of women is required to detect robust associations between tumor DNA methylation and obesity, we found unique interactions between the top BMI-associated methylation sites and race and ER status. One site in the TOMM20 gene displayed differential associations with race. Three sites in the PSMB1, QSOX1, and PHF1 genes showed differential associations by ER status. Finally, two sites near the TOMM20 and PSMB1 genes, respectively, were associated with mortality.

Obesity and differential DNA methylation have previously been examined in 935 CpG sites of cancer-associated genes in ER-positive breast tumor tissue. Hair et al. found that BMI was associated with differential methylation in 30 CpG sites primarily related to immune response and insulin-like growth factor [28]. While none of the top 20 sites identified in our study replicated findings from Hair et al., this is likely due to differences in the arrays used (EPIC and Illumina GoldenGate Cancer I Panel, respectively). Hair et al. noted that only 7 of the 30 probes identified in their analysis matched directly with probes in the Illumina HumanMethylation 450 Beadchip panel, suggesting limited overlap with the EPIC array. Importantly, their study did not examine heterogeneity by race or ER status and did not assess downstream impacts on breast cancer outcomes. To our knowledge, only one previous investigation, limited to 13 genes, has considered interactions between obesity and methylation on mortality [29].

In our study, an interaction between BMI-associated methylation and race was found in the TOMM20 gene. TOMM20 is a protein-coding gene which encodes Tom20, a protein essential to the recognition and translocation of mitochondrial preproteins [30, 31]. Tom20 is highly expressed in breast cancer tissue [32] and may bind with Aryl-hydrocarbon receptor-interacting protein (AIP) to mediate uptake of Survivin, an anti-apoptotic protein [33]. In our study, NHB women exhibited an inverse association between BMI and methylation in cg03731251 of the TOMM20 gene, whereas NHW women exhibited a positive association between BMI and methylation. While the interaction was no longer significant after removal of outliers, we found that hypomethylation at this site was also associated with poor prognosis, but only among NHB women. Given the potential anti-apoptotic role of TOMM20, these data could suggest a pathway through which obesity may differentially impact prognosis by race.

PSMB1 was found to interact with ER status and was associated with all-cause mortality. PSMB1 is a member of the proteasome β subunit (PSMB) family and has been shown to play a role in promoting breast cancer cell growth and migration [34]. Previous gene expression analyses have found differential expression of the PSMB1 gene between ER-positive and ER-negative tumor tissue. Graham et al. found a 2.33-fold increased expression of PSMB1 in ER-positive compared to ER-negative tumor tissue [35]. In our study, as BMI increased, methylation decreased in both groups, although the decrease was more pronounced in ER-negative tumors. Hypomethylation at this CpG site was also associated with poor prognosis. Taken together, our findings suggest that CpG methylation in PSMB1 could play an important role in breast cancer progression and prognosis, given that individuals with decreased methylation had the more aggressive form of breast cancer (ER-negative) and poorer outcomes.

One site in QSOX1 was associated with ER status. Quiescin Sulfhydryl Oxidase 1 is an enzyme which catalyzes disulfide bond formation during protein folding and may play a role in growth regulation [36]. QSOX1 expression has previously been examined in relation to breast cancer, with divergent findings. Pernodet et al. found that higher expression of QSOX1 was associated with reduced tumorigenesis and better outcomes [37]. In contrast, Katchman et al. found that higher expression of QSOX1 was associated with ER-positive tumors, higher tumor grade and poorer survival in ER-positive tumors. However, expression was not associated with survival in ER-negative, HER2 or TNBC tumors [38]. Similarly, Soloviev et al. found elevated QSOX1 transcription primarily in higher grade tumors, with 67% of grade 3 tumors exceeding the normal range of transcription [39]. Knutsvik et al. also found that higher expression of QSOX1 was associated with poorer prognosis including high tumor grade, hormone receptor negativity, HER2 positivity, and increased tumor proliferation [40]. cg08755040 is located in the body of QSOX1. Methylation in the gene body has primarily been cited as a sign of an active gene [41]. In our study, as BMI increased we observed similar increases in methylation among women with ER-positive tumors, which may suggest a positive association with expression. Given previous studies suggesting a deleterious effect of QSOX1 when expressed, particularly in ER-positive tumors, our results shed light on a specific pathway through which increased BMI may be influencing outcomes in ER-positive tumor types.

One CpG site in the PHF1 gene also interacted with ER status. PHF1 has been previously identified as an important regulator of histone methylation and an activator of the tumor suppressor p53 pathway. Moreover, it has been shown to be down-regulated in breast cancer tissue [42]. We observed an interaction with ER status, with greater BMI-associated methylation in ER-negative tumors compared to ER-positive tumors. Given that cg23718418 is located at the transcription start site, an increase in methylation may suggest decreased expression. While this may represent a valid mechanism underpinning the poorer prognosis associated with ER-negative tumors, we observed no downstream associations with all-cause mortality.

Our study has several limitations. With the limited sample size, we were underpowered to detect a main effect between CpG site methylation and BMI after correcting for multiple comparisons. Additionally, we did not correct for multiple testing in our interaction and hazard analyses, so it should be noted that these are suggestive relationships. We also found that some interaction analyses were no longer significant after exclusion of just one or two samples. For our analyses examining differences by race, we had fewer NHW women (with a lower distribution of BMI) limiting our ability to make strong inferences about interactions between BMI and DNA methylation by race. There are also several known risk factors for breast cancer that were not accounted for in our primary analysis including breastfeeding, nulliparity, and hormone therapy use. However, it is unresolved whether these covariates associate with breast tumor methylation, thus age-adjusted models are likely appropriate.

With a limited sample of patients, many of our results did not hold following exclusion of one or two individuals. While these results should be interpreted with caution, these preliminary data shed light on plausible epigenetic drivers of the association between obesity and breast cancer prognosis. The primary strength of this study was our untargeted approach to examine epigenetic pathways associating BMI with breast cancer prognosis in a diverse population of women undergoing surgery for breast cancer. Additional research efforts with a larger, equally diverse, patient pool are needed to validate our preliminary findings and further interrogate the biologic mechanisms identified here.

Data availability

The datasets generated and/or analyzed during the current study are not publicly available due to IRB protocol but are available from the corresponding author on reasonable request to the study PI (lauren.mccullough@emory.edu).