Introduction

Obesity is a well-established risk factor for chronic diseases including cancer, metabolic disease and Type 2 diabetes. High body mass index (BMI) has some degree of heritability, but is also influenced by lifestyle and environment.1, 2, 3, 4 High BMI and excess adiposity likely contribute to disease risk through a variety of biological pathways, which may include epigenetic processes such as alterations in DNA methylation patterns.5, 6

There is a growing body of evidence supporting links between DNA methylation and obesity. A recently published study of adipose tissue collected from a small number of monozygotic twins found evidence of differential methylation between lean and overweight/obese twins at multiple genes.7 Other studies have identified differentially methylated CpGs in adipose tissue between lean and obese individuals, including genes that were also differentially expressed in the adipose tissue.8, 9, 10 Because adipose tissue is difficult to obtain in large population studies, the few existing large epigenome-wide association studies have used blood DNA to identify potential methylation differences related to BMI. These studies have identified more than 40 CpG sites showing an association between methylation levels and BMI in mixed cohorts of men and women;10, 11 obesity-related differences in DNA methylation may help identify key biological pathways involved in disease pathogenesis. Using a nationwide cohort study, we have previously reported the association between obesity and risk of breast cancer.12 Here, we extend that work using a subset of the subjects to examine the association between BMI and blood DNA methylation.

Materials and methods

Study population

Study participants were drawn from the Sister Study cohort, a prospective volunteer cohort of women from the United States and Puerto Rico. From 2003 to 2009, women were recruited into the Sister Study if they had a sister with breast cancer but had not been diagnosed with breast cancer themselves. They complete annual, biennial and triennial updates on cancer and various other lifestyle, exposure and health factors. Participants have a home visit from a trained examiner at study baseline which includes blood collection and measurements of height and weight. The study was approved by the Institutional Review Board of the National Institute of Environmental Health Sciences, NIH and the Copernicus Group Institutional Review Board. Inclusion in this analysis is covered by the informed consent form all included cohort participants agreed to and signed. Because of ethical restrictions (participant confidentiality), data are available upon request. Permission to access Sister Study data may be obtained at www.sisterstudystars.org or by contacting the authors.

DNA methylation array analysis

We examined the relationship between methylation and BMI using two DNA methylation datasets available within the Sister Study. The ‘discovery set’ was comprised of 871 white, non-Hispanic women with methylation array data available on 27 589 CpG (cytosine-phosphate-guanine) sites from a nested case–cohort study of breast cancer. A second, smaller ‘replication set’ was comprised of 187 white, non-Hispanic women with methylation array data on 485 512 CpG sites from a nested case-control study of diethylstilbestrol (DES) exposure. Study populations and details of methods for methylation measurements have been previously described.12, 13, 14

Statistical analysis

To examine the association between BMI and DNA methylation in the discovery and replication sets, we used robust linear regression modeling. To correct for multiple testing, the false discovery rate (FDR) was set at q<0.05 as previously described;12, 14 specific CpGs were examined in the replication set for the same direction of association and an unadjusted P<0.05. Data pre-processing, normalization methods and quality control measures are detailed in the Supplementary Material of the previously published papers.12, 14 Briefly, methylation intensity values were background-corrected using the Robust Multichip Average method15 and quantile-normalized across arrays. Methylation array plates included laboratory controls with known methylation levels to assess precision of measurement, and duplicate samples to assess reproducibility of results within the assay. Each methylation array included probes to assess bisulfite conversion efficiency and negative control probes to measure background fluorescent intensity. Samples with poor bisulfite conversion efficiency (<3800) or having >5% of probes with unreliable measures (detection P>0.05) were excluded (N=3 probes in the 27 K discovery set and N=491 probes in the 450 K replication set). The methylation outcome was estimated using beta values, which are calculated using fluorescence intensities for unmethylated (U) and methylated (M) alleles as M/(M+U+100). Beta values range from 0 (completely unmethylated) to 1 (100% methylated). In the 450 K replication set, we also excluded CpG probes with single-nucleotide polymorphisms present at target sites or that mapped to multiple genomic regions (N=48 158), Illumina-designed single-nucleotide polymorphism probes (N=65), and CpG sites on the X and Y chromosomes (N=10 257) (Illumina, San Diego, CA, USA). In the 27 K discovery data set we examined 27 575 probes and in the 450 K replication data set we examined 426 606 probes.

Singular value decomposition analysis of the raw data set revealed that the top principal components derived from the methylation β-value matrix were highly correlated with plate, bisulfite conversion intensities and age. We adjusted for these factors together with breast cancer status in all linear regression association analyses. All association tests were also adjusted for the proportions of different types of white blood cells estimated using a method described by Houseman et al.16, 17
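
As an illustration of the beta-value formula and the sample-level exclusion rules described above, the following minimal Python sketch may be helpful. It is not the pipeline used in the study: the function names, argument names and example intensities are hypothetical, and only the constants (offset of 100, conversion cutoff of 3800, detection P>0.05, 5% of probes) are taken from the text.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Beta value M / (M + U + offset) for one probe in one sample."""
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (methylated + unmethylated + offset)


def sample_passes_qc(bisulfite_intensity, detection_pvalues,
                     min_intensity=3800.0, p_cutoff=0.05,
                     max_unreliable_fraction=0.05):
    """Apply the two sample-level exclusion rules quoted in the text.

    A sample fails if its bisulfite conversion intensity is below
    min_intensity, or if more than max_unreliable_fraction of its probes
    have a detection P-value above p_cutoff.
    """
    if not detection_pvalues:
        raise ValueError("at least one detection P-value is required")
    if bisulfite_intensity < min_intensity:
        return False
    unreliable = sum(1 for p in detection_pvalues if p > p_cutoff)
    return unreliable / len(detection_pvalues) <= max_unreliable_fraction


# Made-up intensities, for illustration only (not study data).
half_methylated = beta_value(4950.0, 4950.0)
unmethylated = beta_value(0.0, 9900.0)
nearly_full = beta_value(9900.0, 0.0)
```

Because of the offset of 100 in the denominator, a beta value computed this way approaches 1 for a fully methylated site but does not reach it exactly.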

Pyrosequencing validation

Although the sample size in the replication set was much smaller (N=187), the 450 K array used for this set provides methylation data at ~320 000 additional CpG sites not covered by the 27 K array used in the discovery set. We selected five of the top BMI-associated CpGs identified from 450 K analysis of the replication set for independent validation by pyrosequencing in samples from the discovery set. Pyrosequencing assays for the five CpGs, cg17501210 (RPS6KA2), cg06500161 (ABCG1), cg07728579 (FSD2), cg11775828 (STK39) and cg13134297 (CRHR2), were designed using Pyromark Assay Design version 2.0.2.15 (Qiagen, Valencia, CA, USA). Primer sequences are detailed in Supplementary Table 1. Reaction mixtures (25 μl) containing 100 ng of bisulfite-converted DNA, 5 pmol of each primer (forward and reverse), PCR buffer (Invitrogen, Carlsbad, CA, USA), 3 mM MgCl2, 1 mM dNTP and 0.8 units of Taq polymerase (Invitrogen) were heated to 95 °C for 15 min, followed by 45 PCR cycles (95 °C for 20 s, 55 °C for 20 s and 72 °C for 20 s) with a final extension at 72 °C for 5 min. After PCR, the biotin-labeled PCR product was hybridized to streptavidin-coated sepharose beads (GE Healthcare, Madison, WI, USA) and denatured in 0.2 M sodium hydroxide to provide a single-stranded sequencing template. Pyrosequencing primers (0.3 μmol l−1) were annealed to the single-stranded template and the pyrosequencing was carried out using the PyroMark Q96 MD System (Qiagen) according to the manufacturer’s instructions. The percentage methylation was quantified using the Pyro Q-CpG Software (Qiagen). Associations between methylation levels and BMI were examined using linear regression analysis of the methylation percentages and BMI with adjustment for age and breast cancer status.
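
The adjusted linear regression described above can be sketched in plain Python as an ordinary least-squares fit of methylation percentage on BMI, age and breast cancer status. This is an illustrative sketch only, not the software used in the study: the function names are hypothetical, and the eight records are synthetic values constructed from a known linear rule so that the fit can be checked.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    design = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic records (BMI, age, case status); NOT study data.
records = [(20, 40, 0), (25, 55, 1), (30, 45, 0), (35, 60, 1),
           (22, 65, 0), (28, 50, 1), (33, 42, 1), (27, 58, 0)]
# Synthetic methylation percentages generated from a known linear rule.
methylation_pct = [10.0 + 0.1 * bmi + 0.05 * age + 1.0 * case
                   for bmi, age, case in records]

intercept, bmi_slope, age_slope, case_effect = fit_ols(records, methylation_pct)
```

In this sketch, bmi_slope plays the role of the per-unit BMI effect estimate: the change in methylation percentage for each 1 kg m−2 increase in BMI with age and case status held fixed.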

Results

BMI in the Sister Study DNA methylation datasets

Basic characteristics of Sister Study participants with methylation array data are summarized in Table 1. As the discovery set includes a substantial number of women who later developed breast cancer, characteristics are stratified by breast cancer case status. The mean BMI in both the discovery subset (27.1 kg m−2, s.d. 5.7 kg m−2) and replication subset (26.7 kg m−2, s.d. 5.6 kg m−2) was slightly above the CDC’s ‘healthy BMI’ upper boundary. In both the discovery and replication set, at least 50% of participants fell into the overweight (BMI 25.0–29.9) or obese (BMI 30+) categories. The replication set had a larger proportion of women in the healthy BMI category (49%) than the discovery set (42%). Seven percent of the discovery set and 6% of the replication set reported they were current smokers, and slightly more women in the discovery set had at least one pregnancy (87%) compared with the replication set (81%).

Table 1 Characteristics of participants in the discovery set (N=871) and replication set (N=187) stratified by breast cancer case status

Analysis of BMI and DNA Methylation in discovery set

In an analysis limited to women in the discovery set who had not developed breast cancer (N=571), BMI was associated with methylation at two CpG sites located in the genes LGALS3BP (q-value=0.02) and ANGPT4 (q-value=0.05) (Table 2). When the analysis was expanded to include women who developed breast cancer during study follow-up with adjustment for case status (N=871), the same two sites remained associated (cg14870271 and cg03218374) and two additional associated CpG sites were identified in the genes RORC and SOCS3 (cg18149207 at q-value=0.02 and cg27637521 at q-value=0.02). All four of these CpG sites are present on the Infinium450K array used in the replication study.

Table 2 CpG sites with associations between BMI and methylation passing the false discovery rate (q<0.05) in the discovery set

Analysis of BMI and DNA methylation in replication set

The association between BMI and DNA methylation at these four CpGs was examined in the replication sample (N=187); P-values for all four associations passed a strict Bonferroni correction for multiple testing (P<0.0125) as shown in Table 2. The CpG sites located in LGALS3BP and RORC showed an ~0.1% increase in methylation with each increasing unit (1 kg m−2) of BMI, and the site in ANGPT4 showed a 0.2% increase in methylation per 1 unit of BMI. These differences would correspond to approximately 1% higher methylation at the LGALS3BP and RORC sites (about 2% at the ANGPT4 site) in obese individuals (BMI=30) compared with individuals at normal weight (BMI=20). The CpG located in SOCS3 showed an association in the opposite direction, with a 0.05% decrease in methylation with each increasing unit of BMI.
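
The arithmetic behind the Bonferroni threshold and the BMI 20 versus BMI 30 comparison can be written out as a short Python sketch. The function and variable names are hypothetical; the per-unit effects are the rounded values quoted above, and the overall significance level of 0.05 is inferred from the stated threshold of 0.0125 for four tests.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def methylation_shift(per_unit_pct, bmi_from, bmi_to):
    """Expected change in methylation (%) between two BMI values,
    assuming the per-unit effect is constant across the BMI range."""
    return per_unit_pct * (bmi_to - bmi_from)


# Rounded per-unit effects (% methylation per 1 kg m^-2) quoted in the text.
PER_UNIT_PCT = {"LGALS3BP": 0.1, "RORC": 0.1, "ANGPT4": 0.2, "SOCS3": -0.05}

# Four CpGs tested in the replication set; alpha = 0.05 is an inferred value.
threshold = bonferroni_threshold(0.05, len(PER_UNIT_PCT))

# Difference between BMI = 20 and BMI = 30 for each site.
shifts = {gene: methylation_shift(effect, 20, 30)
          for gene, effect in PER_UNIT_PCT.items()}
```

Under these assumptions the threshold for four tests is 0.05/4 = 0.0125, and a 10-unit difference in BMI multiplies each per-unit effect by 10.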

A 450 K array-wide analysis of DNA methylation and BMI in the replication set women revealed associations (q<0.05) for 23 CpG sites (Table 3). Three of these 23 sites were covered on the 27 K array; one of the three, cg21282997 located in the gene IL18RAP, showed an association with BMI at P=0.02 in the discovery set. The remaining 20 CpG sites were not covered on the 27 K array; from these 20 we selected five of the top CpG sites for pyrosequencing analysis to determine methylation status in participants from the discovery set (Table 3).

Table 3 CpG sites where associations between methylation and body mass index pass the FDR on the 450 K methylation array (N=181)

Results of pyrosequencing validation

Pyrosequencing analysis of 871 DNA samples from the discovery set validated the associations between BMI and methylation at the five selected CpG sites (Table 4). Point estimates for the associations between BMI and methylation at all five CpGs tested (cg17501210; cg06500161; cg07728579; cg11775828; and cg13134297) generated using pyrosequencing passed a Bonferroni-corrected P-value threshold of 0.01 and both the magnitude and direction of effect estimates were similar between array and pyrosequencing data. Pyrosequencing also provided information on CpGs flanking the sites of interest for cg17501210 (RPS6KA2) and cg06500161 (ABCG1), and these flanking CpGs had findings similar to that of the target CpG (Table 4).

Table 4 Top CpG hits associated with BMI on the 450 K array (N=181) that were not included on 27 K array

Replication of findings from published studies of BMI and genome-wide DNA methylation

Recent epigenome-wide association studies have used the Infinium HumanMethylation450 BeadChip to investigate BMI and peripheral blood DNA methylation data. We examined whether differentially methylated CpGs reported in those studies were differentially methylated in our replication set data (Supplementary Table 2). A study from the Cardiogenics Consortium identified five CpGs associated with BMI in their discovery cohort of men and women and replicated three of these associations.18 Our data provide additional support, in a women-only cohort, for their observed associations with BMI at two CpG sites: one in KLF13 (cg07814318 at P⩽0.0003) and one in KIAA0664/CLUH (cg09664445 at P⩽0.04). A second study utilizing the Atherosclerosis Risk in Communities cohort and multiple replication cohorts identified 76 CpGs that were differentially associated with BMI and replicated 37 sites.10 Eighteen of these associations were recently replicated in an Arab population,19 and we found that 27 of these associations replicated in our data set, including the site cg06500161 in ABCG1 identified in our 450 K array-wide analysis. We provide the first replication for associations with BMI at the CpG sites cg09554443 in CD247, cg04986899 in XYLT1, cg05242915 on chromosome 19 and cg03562528 in ASB2.

Discussion

Using 27 K, 450 K, and pyrosequencing data on 1066 women from the Sister Study cohort, we identified and replicated eight previously unreported associations between BMI and CpG site DNA methylation and confirmed a previously reported and replicated association at cg06500161 in ABCG1. Our results also provide support for association at 27 previously reported additional CpG sites and provide the first replication for associations with BMI at four of these sites.

We identified and confirmed methylation differences associated with BMI at CpG sites in the genes LGALS3BP, ANGPT4, RORC, SOCS3, RPS6KA2, FSD2, ABCG1, STK39 and CRHR2. Several of the genes with differentially methylated sites in our study have been linked to obesity and obesity-related chronic diseases. The decreased methylation with increasing BMI observed at cg27637521 is of particular interest as it is 25 bp from the transcriptional start site of SOCS3 (suppressor of cytokine signaling 3). The two flanking CpGs in SOCS3 (cg10508317 and cg10279487) also show decreased methylation with increasing BMI (P⩽0.003). Animal models indicate that SOCS3 expression inhibits leptin signaling and is likely involved in the decreased leptin sensitivity observed in obese individuals.20, 21, 22, 23, 24 SOCS3 is a negative regulator of cytokine signaling; expression is induced by various cytokines, including IL-6, IL-10 and interferon (IFN)-γ.25, 26, 27 Genetic polymorphisms in and near SOCS3 are associated with obesity in human population studies.28, 29, 30 In cohort studies of Indian Asians and Europeans, decreased DNA methylation in blood at another CpG (cg18181703) in the coding region of SOCS3 has been linked to risk of Type 2 (T2) diabetes.31

We also identified a differentially methylated region in the gene CRHR2 (corticotrophin-releasing hormone receptor 2); the CpG site we identified (cg13134297) is in the first intron of this gene. As with SOCS3, examination of flanking CpGs on the 450 K array (cg23068772 and cg22007110) shows associations with BMI (P⩽0.002) with the same directionality. CRHR2 is involved in corticotrophin-releasing hormone signaling and likely has a role in coordinating endocrine and autonomic responses to stress.32 STK39 (Serine Threonine Kinase 39) also plays a role in cellular responses to stress; we identified and confirmed increasing methylation with increasing BMI at this gene at cg11775828. Flanking CpGs on the 450 K array (cg21899461 and cg03331300) were also associated with BMI at P<0.05. Polymorphisms in this gene have been linked to risk of hypertension.33, 34, 35 Epigenetic silencing of STK39 through DNA hypermethylation in B-cell lymphoma appears to promote cancer progression by inhibiting apoptosis in cells with DNA damage.36

The differentially methylated CpGs we identified in the genes RORC and FSD2 may also be promising targets for future studies of obesity and development of cardiovascular and metabolic diseases. RORC (RAR-Related Orphan Receptor C) regulates production of the inflammatory cytokine IL-17 by T helper-17 (Th17) cells; Th17 cell activation and cytokine production appear to play a critical role in the pathogenesis of diabetes.37, 38, 39, 40, 41 Additionally, genetic variation in RORC has been linked to body fat composition levels in cattle.42 Single-nucleotide polymorphisms in FSD2 (Fibronectin Type III and SPRY Domain Containing 2) have been linked to the presence and volume of carotid plaque in a Caribbean Hispanic population.43

We provided the first replication of associations with BMI at the CpGs cg09554443 in CD247, cg04986899 in XYLT1, cg05242915 on chromosome 19, and cg03562528 in ASB2. CD247 (T-cell surface glycoprotein CD3 zeta chain) plays an important role in antigen recognition in the immune response and shows altered expression patterns in peripheral blood in individuals with Type 2 diabetes.44 XYLT1 (Xylosyltransferase 1) is necessary for biosynthesis of glycosaminoglycan chains. Genetic variations in this gene can increase risk of abdominal aortic aneurysm.45 ASB2 (Ankyrin repeat and SOCS box protein 2) is differentially methylated in atherosclerotic aorta tissue compared with healthy aortic tissue.46

The observed association between adult BMI and methylation at cg03218374 in the Angiopoietin-4 (ANGPT4) gene is consistent with an existing study that links methylation at this same CpG in cord blood to infant birth weight,47 indicating that this association between body size and methylation is consistent in both infants and adults. This CpG is located in the promoter region of ANGPT4, 18 bp upstream of the transcription start site. ANGPT4 plays an important role in vascular angiogenesis, and may help to mediate lipid breakdown.48 We have also confirmed the association between BMI and methylation at ABCG1 (cg06500161) observed in previous studies.10, 11 Methylation of multiple CpG sites in ABCG1 is linked to individual variability in blood lipid levels.49 In our pyrosequencing validation analysis of RPS6KA2 and ABCG1, our primers covered additional CpG sites (two sites in RPS6KA2 and one in ABCG1) that were not covered on the Infinium450K array. These additional CpG sites exhibited the same associations with BMI as the adjacent sites on the array (Table 4). Both sites in ABCG1, 1 bp apart, and all three sites in RPS6KA2 (spanning a 9 bp region) show increasing methylation with increasing BMI.

We have identified and validated associations between BMI and methylation at nine CpG sites using a combination of array-based and pyrosequencing methylation measures, and provided further confirmation for an additional 27 associations reported in previous studies. A potential limitation of our study is the inclusion of DES-exposed women in our replication set, which might affect our results if DES also affects methylation patterns. However, previous analysis of these participants’ DES exposure status and the DNA methylation array data showed no detectable associations between DES exposure and DNA methylation.14 Although this study’s cross-sectional design limits our ability to determine whether these methylation differences are caused by obesity, it has the advantage of a large number of participants with both DNA methylation data and examiner-measured height and weight.