Introduction

Dietary Agents Targeting the Epigenome

Within the past decade, epigenetic mechanisms and their modulation by dietary agents have gained major interest in the cancer prevention and nutrition community. Numerous in vitro and selected in vivo analyses indicate that aberrant epigenetic programming during disease progression may be prevented and even reversed by dietary agents. Phytochemicals from various dietary sources, including green tea, soy, fruit, and berries such as black raspberries, cruciferous vegetables, turmeric, onions, cashew nuts, and others, were shown to directly target enzymatic activities or modulate expression of enzymes involved in epigenetic gene regulation, including DNA methyltransferases (DNMTs) and histone-modifying enzymes such as histone acetyltransferases, deacetylases, methyltransferases, and demethylases that modulate chromatin accessibility. Many phytochemicals were also shown to alter expression of non-coding (micro) RNAs in cell culture, adding to their potential to epigenetically regulate gene expression. Evidence is accumulating that these activities might contribute to chemopreventive efficacy by affecting signal transduction cascades mediated by nuclear receptors and transcription factors such as NF-κB, cell proliferation and cell cycle progression, cellular differentiation, DNA repair, apoptosis induction, cell motility, metastasis formation, and cellular senescence (reviewed in [1–11]). If confirmed, these activities could make such agents of significant value in cancer prevention. So far, however, evidence for epigenetic activity in vivo in animal models and human pilot studies, and for its relevance to chemopreventive efficacy, is limited. Careful documentation of how these agents affect epigenetic programming in tissues is therefore critical for understanding their role in cancer prevention.

DNA Methylation

DNA methylation is one of the best investigated mechanisms of epigenetic gene regulation [12, 13]. The transfer of methyl groups to DNA is catalyzed by the DNMT family of enzymes. In mammals, DNA methylation mainly occurs at the 5-position of cytosine (C) in the context of CpG dinucleotides, generating 5-methylcytosine (5mC). The current human genome build contains more than 28 M CpG dinucleotides. Interestingly, CpG sites are not evenly distributed in the genome: CpGs accumulate in promoter regions of genes (CpG-dense regions, so-called CpG islands or CGIs), whereas intra- and intergenic regions are characterized by a lower density of CpGs. In healthy tissue, promoter CGIs are normally unmethylated, allowing active gene transcription, whereas non-promoter CpGs are highly methylated, thus limiting DNA accessibility and contributing to genomic stability [14]. As tissues age, an increasing number of genetic loci become silenced by DNA methylation, a process that is likely to be exacerbated by poor health and nutrition. Gene silencing often continues to expand along the carcinogenic pathway, with a range of critical growth-regulatory and tumor suppressor genes targeted in cancers. Global loss of methylation (hypomethylation), especially at repetitive sequences, and hypermethylation of CGIs in promoter regions are among the most important epigenetic changes in cancer cells and are thought to be involved in the etiology of cancer. In contrast to the irreversible inactivation of tumor suppressor genes by genetic alterations, genes silenced by epigenetic modifications are still intact and can be reactivated [14].
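To make the notion of "CpG-dense" regions concrete, the following Python sketch computes the GC fraction and the observed/expected CpG ratio of a sequence. The thresholds used (length of at least 200 bp, GC fraction of at least 50 %, observed/expected ratio of at least 0.6) follow the widely used Gardiner-Garden and Frommer criteria and are an assumption here; the text does not define CGIs quantitatively, and genome browser annotations may apply different rules. The function names are hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, CpG count, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # Count CpG dinucleotides (a C immediately followed by a G on the same strand)
    cpg = sum(1 for i in range(n - 1) if s[i:i + 2] == "CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: C * G / length
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "cpg": cpg, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


def looks_like_cgi(seq: str, min_len: int = 200, min_gc: float = 0.5,
                   min_obs_exp: float = 0.6) -> bool:
    """Apply assumed Gardiner-Garden/Frommer-style thresholds to one sequence window."""
    st = cpg_stats(seq)
    return (st["length"] >= min_len
            and st["gc_fraction"] >= min_gc
            and st["obs_exp"] >= min_obs_exp)
```

For example, a 300-bp stretch of repeated "CG" passes, whereas a GC-balanced sequence without any CpG dinucleotide fails on the observed/expected ratio, illustrating why GC content alone does not define a CGI.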

So far, most of the studies investigating the influence of dietary agents on aberrant DNA methylation have been performed in (cancer) cell culture and focused on only a few selected candidate genes. Consequently, it is at present largely unknown whether promoter demethylation and reactivation of genes silenced by DNA methylation is a random effect accompanying unspecific inhibition or reduced expression of DNMTs, or whether targeted mechanisms underlie these activities.

Genome-Wide Methylation Profiling

With the advancement and increased affordability of array- and next-generation sequencing (NGS)-based technologies, we now have tools at hand for epigenetic analyses at a genome-wide level. Methodologies for genome-wide methylation profiling either rely on enrichment of methylated DNA by its affinity to methyl-binding proteins or to antibodies against 5mC, or on the quantitative determination of DNA methylation levels after bisulfite treatment of DNA (overview in Table 1). A comprehensive description of the techniques is beyond the scope of this perspective. Readers are referred to recent articles and reviews that have benchmarked the methods in detail [15–18, 19••, 20–22]. A good comparison of principles, limitations, and sources of bias is given in ref. [23••].

Table 1 Overview of technologies for genome-wide methylome analyses

Both enrichment approaches have been coupled with DNA microarray hybridization (Chip) and/or massively parallel sequencing (Seq) for identification of thousands of genomic regions differentially methylated between tumor and normal tissues at a genome-wide scale.

Changes in DNA methylation after short- or intermediate-term dietary intervention are expected to be small. Therefore, selecting a methodology for the unbiased detection of small genome-wide DNA methylation changes in nutritional studies remains a challenge. Here, we describe our practical experience and give examples of ongoing studies.

Affinity Enrichment-Based Methods

One strategy to reduce complexity in whole-genome methylation analyses is to enrich for highly methylated regions. Differentially methylated regions (DMRs) are then identified by comparing two distinct samples (for example, tumor vs. normal). Enrichment is achieved either by methylated DNA immunoprecipitation (MeDIP) with monoclonal antibodies against 5mC [24] or by affinity capture of methylated CpGs with proteins of the methyl-CpG binding domain (MBD) family (collectively termed MBDCap) [25]. Several MBDCap methods have been developed that differ in the MBD protein used for enrichment. The MethylCap (methylated DNA capture) assay uses the MBD domain of MeCP2 [26]; MCIp (methylated CpG immunoprecipitation) [27] and MiGS (MBD-isolated genome sequencing) [28] employ MBD2 protein; and MIRA (methylated-CpG island recovery assay) uses a complex of the MBD proteins MBD2 and MBD3L1, which has enhanced affinity for methylated CpGs compared to MBD2 alone [29, 30].

There are differences between MeDIP and MBDCap methods that influence the obtained results (see ref. [21]): the anti-5mC antibody used for MeDIP captures DNA fragments containing one or more methylated cytosines, which are then eluted in one fraction. In contrast, MBD proteins bind with increasing affinity to fragments with multiple methylated CpG dinucleotides in close proximity. One can take advantage of this by serial elution of methylated DNA fragments with increasing salt concentrations. Downstream analyses of multiple fractions provide an overview of methylation changes at regions with increasing CpG density, as exemplified in ref. [26], at the expense of higher costs. Alternatively, one can focus on the highest-affinity fraction eluted with a high salt (HS) concentration to enrich for CGIs, or perform “single-fraction (SF) elution” without fractionation according to CpG density, as described previously [21, 25, 31]. In general, MeDIP was found to preferentially enrich regions of low to intermediate CpG density, whereas MBDCap methods, depending on the elution protocol, could be biased toward CpG-dense regions [22, 25].

MCIp-Chip Analysis of Human Breast Cancer

Both MeDIP and MBDCap methods have been used in combination with promoter, CGI, or tiling arrays, or with NGS. Array-based detection has the advantage of relatively straightforward bioinformatic analysis, but is biased by the selection of genomic regions covered by the array. We used MCIp-based enrichment by HS elution of CpG-dense, highly methylated DNA fragments from breast cancer and normal breast tissue in combination with comparative hybridization to CGI arrays to discover a series of hypermethylated genes as novel potential biomarkers of low-grade breast cancer [32]. Regions with the most consistent gain in methylation were identified by generating histograms of probes that met certain cutoff criteria (details in [32]) (Fig. 1a). Significant hypermethylation in tumor tissue was confirmed and validated in independent sample sets by quantitative mass spectrometry-based EpiTyper MassArray technology [33], with methylation differences of 20 to >60 % between tumor and normal tissues.
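The probe-counting idea behind Fig. 1a can be sketched as follows: for each array probe, count in how many tumor/normal co-hybridizations it exceeds a cutoff, then tabulate how many probes are positive in exactly k arrays. This is a minimal illustration only; the log2-ratio cutoff of 1.0 and the function names are hypothetical, and the actual criteria used are those described in [32].

```python
import numpy as np


def positive_array_counts(log2_ratios: np.ndarray, cutoff: float = 1.0) -> np.ndarray:
    """For each probe (row), count arrays (columns) with tumor/normal log2 ratio >= cutoff.

    The cutoff is a hypothetical placeholder, not the published criterion.
    """
    return (log2_ratios >= cutoff).sum(axis=1)


def count_histogram(counts: np.ndarray, n_arrays: int) -> np.ndarray:
    """Number of probes that are positive in exactly k arrays, for k = 0..n_arrays."""
    return np.bincount(counts, minlength=n_arrays + 1)


# Toy data: 3 probes x 3 arrays of tumor/normal log2 ratios
ratios = np.array([[1.5, 2.0, 1.2],
                   [0.1, 1.1, -0.3],
                   [0.0, 0.2, 0.5]])
counts = positive_array_counts(ratios)
histogram = count_histogram(counts, n_arrays=ratios.shape[1])
```

Probes positive in all or nearly all arrays (the tallest bars in Fig. 1a) would then be prioritized for validation.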

Fig. 1

Examples of affinity enrichment-based genome-wide methylation analyses (MCIp-Chip, MCIp-Seq). a MCIp-Chip analysis of human breast cancer, with KCTD8 as an example (from ref. [32]). Highly methylated DNA fragments from 10 human breast cancer and 10 normal breast tissues were enriched by MCIp with high salt (HS) elution. Unmatched pairs of tumor and normal samples were co-hybridized to Agilent CpG island (CGI) arrays for the detection of differentially methylated regions (DMRs). The genomic position (in blue/black), location of the corresponding CGI (in green), and probes covered on the array (in grey) are depicted. The number of arrays for which a given probe was positive (from 0 to 10) is represented by the height of the corresponding red bar. For comparison, WGBS data tracks for HCC1954 breast cancer and HMEC human mammary epithelial cells (in yellow) were derived from the UCSC genome browser [83, 84] DNA methylation track hubs [85, 86]. Each vertical line represents one CpG site; the height indicates the level of methylation from 0 to 100 %. The CGI located at the KCTD8 promoter is highly methylated in HCC1954 but unmethylated in HMEC and, depending on the probe, was positive in up to 10/10 arrays. b Genomic distribution of hyper- and hypomethylated DMRs identified by MCIp-Seq in tumors derived from the C3(1) transgenic mouse model in comparison to mammary glands of wild-type controls. More than 90 % of the DMRs are located outside of core promoter regions (including 5′UTR, promoters, and CGIs). c Kinetics of hypomethylation at a CGI (location in green) overlapping an exon of the Daxx gene (in blue) in the C3(1) mouse model. Highly methylated DNA fragments were enriched with HS elution and subjected to next-generation sequencing (NGS). Read counts (range 0–100 reads, normalized to 10 M reads) at specific genomic positions are depicted for groups of transgenic (TG, red) and wild-type (WT, blue) animals covering an age range of 4–24 weeks. 
Noteworthy is the excellent visual uniformity of MCIp-enriched DNA fragments in all groups. Comparison with WGBS tracks for murine embryonic stem cells (ESC), placenta, and uterus (in yellow) indicates that MCIp enrichment with HS elution is limited to fragments with high CpG density (narrow lines in WGBS tracks) and high methylation. d Confirmation by quantitative EpiTyper MassArray analyses of consistently high methylation in WT mice vs. gradual loss of methylation in TG animals at the intragenic/exonic Daxx CGI. Median and range in each group are indicated by black lines. e MCIp-Seq analyses to identify methylation changes in mammary glands of ovariectomized Wistar rats exposed to a soy isoflavone-enriched diet (IRD) vs. an isoflavone-depleted diet (IDD) for 17 days after ovariectomy. The bottom track shows the percentage of G (guanine) + C (cytosine) bases in 5-base windows (in black, range 30–70 % with a horizontal line at 50 %). As in Fig. 1c, HS elution enriches CpG-dense DNA fragments (upper two green tracks, range 0–30 reads, normalized to 10 M reads). Note that the CGI in the promoter region of SFRP1 (secreted frizzled related protein 1) is not methylated in normal mammary glands and is therefore not enriched. For comparison, we performed single-fraction (SF) elution to enrich DNA fragments with intermediate-to-high CpG density (lower two red tracks, range 0–10 reads, normalized to 10 M reads), as reflected by the increased number of intra- and intergenic peaks that are not enriched by HS elution. With SF elution, more CpGs are covered at least once, but reaching saturation comparable to HS elution requires an overall higher number of reads per sample.

In general, detection of differential methylation by MBDCap methods is biased by copy number alterations [21]. For example, in regions affected by loss of heterozygosity, only one allele is available for enrichment, and the region is then detected as a false-positive hypomethylated DMR. Often, we could not validate such “loss of methylation” by EpiTyper MassArray, which does not discriminate between alleles. In contrast, methylation levels of hypermethylated DMRs were >80 % concordant between both technologies. Consequently, we limited our analyses to hypermethylated regions.
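The copy-number artifact described above can be illustrated with a toy calculation. The sketch assumes, as a simplification, that the enriched signal is proportional to the number of methylated allele copies, whereas a quantitative readout such as EpiTyper MassArray reports the methylated fraction of whatever copies are present. The function names are hypothetical.

```python
def enrichment_signal(alleles: list[float]) -> float:
    """Relative MBD-enrichment signal, assumed proportional to the number of methylated copies.

    Each list entry is the methylation level (0..1) of one allele copy in the region.
    """
    return sum(alleles)


def bulk_methylation(alleles: list[float]) -> float:
    """Quantitative readout: average methylation over all copies present, independent of copy number."""
    return sum(alleles) / len(alleles)


normal = [1.0, 1.0]   # two fully methylated alleles
tumor_loh = [1.0]     # one allele lost; the remaining allele is still fully methylated

fold_change = enrichment_signal(tumor_loh) / enrichment_signal(normal)
# The enrichment signal halves (fold change 0.5), which could be called a hypomethylated
# DMR depending on the threshold, although the methylated fraction is unchanged.
```

Under this model, only an allele-independent quantitative method reveals that the apparent "loss of methylation" reflects copy number rather than methylation.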

MCIp-Seq in the C3(1) Transgenic Murine Breast Cancer Model

We currently employ robot-assisted MCIp enrichment by HS elution of highly methylated, CpG-dense regions coupled with NGS (Heilmann et al., in preparation) to characterize changes in DNA methylation during the progression of breast cancer in the transgenic C3(1)SV40TAg (C3(1)) mouse strain [34]. For the analysis of methylation kinetics, mammary glands and tumors of transgenic C3(1) mice (TG) were collected at 4-week intervals over a period of 24 weeks (three animals per age group). Mammary glands of age-matched wild-type (WT) littermates, which do not develop tumors, were analyzed for comparison. Quality-controlled raw sequencing reads were aligned to the mouse reference genome using the Burrows-Wheeler Aligner ([35]; a summary of computational methods and software tools for methylation analyses is given in ref. [36••]). Depending on the degree of enrichment, NGS will identify different numbers of fragments covering a specific region (reads) in a sample. In contrast to readout by CGI arrays, NGS allows genome-wide detection of enriched fragments, and we achieved an overall coverage of about 3.4–6.7 M CpGs. We used the HOMER tool (hypergeometric optimization of motif enrichment [37]) to calculate differences in read frequencies between TG and WT animals per age group, taking into account defined parameters such as fold change (FC), p value, and false discovery rate (FDR). For example, at the age of 24 weeks, we detected around 9,000 hypermethylated and 9,500 hypomethylated DMRs based on a fourfold difference in read counts between TG and WT. Notably, more than 90 % of all DMRs (Fig. 1b) were located outside of core promoter regions that contribute to gene regulation (including 5′UTR, promoters, and CGIs). Serre et al. reported similar findings for MiGS analyses [28]. For validation, we selected, among others, a DMR overlapping with an exonic CGI of Daxx (death-domain-associated protein), a histone chaperone that facilitates chromatin assembly [38]. 
MCIp-Seq revealed a gradual loss of methylation in TG mice over time (Fig. 1c), which was associated with significant upregulation of gene expression (data from [39]). EpiTyper MassArray analyses confirmed stable methylation at around 80 % in WT mice and significant hypomethylation in TG animals, from 80 % at 4 weeks to around 13 % median methylation at 24 weeks (Fig. 1d). Because we selected DMRs with gradual changes in methylation over time and integrated methylation changes with changes in gene expression, both hyper- and hypomethylated DMRs from the C3(1) study could be validated by EpiTyper MassArray analyses with >90 % concordance, in contrast to the human study.

As mentioned above, the majority of DMRs were located outside of core promoter regions. We are only beginning to understand the role of DMRs in intronic, exonic, and intergenic regions for general and cell-type-specific gene regulation. They have been postulated to overlap with enhancers or transcription factor binding sites [40, 41]. For dietary intervention studies in healthy volunteers, methylation changes at these regions might be of particular interest: promoter CGIs are generally unmethylated in normal tissues, so only a gain in methylation would be detectable there.

MCIp-Seq After Dietary Soy Intervention in Healthy Wistar Rats

MCIp-Seq is used in an ongoing study on methylation changes in mammary glands of Wistar rats as part of the IsoCross project funded by the German Research Foundation (DFG), in cooperation with partners from the German Sports University in Cologne (Pudenz et al., in preparation). One of the aims of this project is to investigate the influence of soy isoflavones on the estrogen sensitivity of mammary glands [42]. Isoflavones are phytoestrogenic plant compounds and could therefore attenuate physiological changes associated with hormone deprivation [43]. A group of female rats was subjected to ovariectomy at the age of 80 days. During the period of hormonal decline after ovariectomy, rats received either a diet enriched with a soy extract equivalent to about 400 ppm isoflavones (isoflavone-rich diet, IRD) or, for comparison, an isoflavone-depleted diet (IDD). The study was terminated at an age of 97 days (17 days after ovariectomy), and DNA from mammary glands was processed by MCIp-Seq as described for the C3(1) study. We first focused on the analysis of CpG-dense fragments with high DNA methylation obtained by HS elution. Using a fourfold difference in read counts between groups and at least 10 reads per region in one group as selection criteria, bioinformatic comparison revealed about 3100 hypomethylated and 2000 hypermethylated regions in the IRD vs. the IDD group (Fig. 1e, upper two lanes). Again, more than 90 % of the DMRs were located outside of core promoter regions.
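The selection criteria named above (fourfold difference in read counts, at least 10 reads per region in one group, read counts normalized to 10 M reads as in Fig. 1) can be sketched for a single region as follows. This is not the HOMER-based pipeline actually used; the pseudocount, the decision to apply the 10-read minimum to normalized counts, and the function names are assumptions for illustration.

```python
SCALE = 10_000_000  # normalize read counts to 10 M mapped reads


def normalize(count: int, total_mapped: int, scale: int = SCALE) -> float:
    """Scale a raw region read count to a library size of `scale` mapped reads."""
    return count * scale / total_mapped


def classify_region(count_a: int, total_a: int, count_b: int, total_b: int,
                    min_fold: float = 4.0, min_reads: float = 10.0,
                    pseudocount: float = 1.0) -> str:
    """Call a region hyper- or hypomethylated in group A (e.g., IRD) relative to group B (e.g., IDD).

    Assumptions: the minimum read requirement is applied to normalized counts, and a
    pseudocount avoids division by zero for regions without reads in one group.
    """
    a = normalize(count_a, total_a)
    b = normalize(count_b, total_b)
    if max(a, b) < min_reads:
        return "not tested"          # too few reads in both groups
    fold = (a + pseudocount) / (b + pseudocount)
    if fold >= min_fold:
        return "hyper"               # more enriched (more methylated) in group A
    if fold <= 1.0 / min_fold:
        return "hypo"                # less enriched (less methylated) in group A
    return "unchanged"
```

Because the fold change is computed on enrichment read counts rather than on methylation levels, a region passing these criteria may still show only a small absolute methylation difference in a quantitative assay.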

Unlike in the C3(1) study, the identified DMRs were difficult to confirm by quantitative EpiTyper MassArray analyses. We often obtained non-significant median methylation differences below 10 % between the IRD- and IDD-treated groups and observed high inter-individual variation among animals within one group. The latter might be attributed to the fact that Wistar rats are an outbred strain, so individual responses to ovariectomy and isoflavone intervention might vary substantially. The discordance between MCIp enrichment and EpiTyper MassArray results might arise because MCIp with HS elution enriches CpG-dense, highly methylated DNA fragments, whereas CpG-dense regions with lower DNA methylation as well as highly methylated DNA with low CpG density are removed before sequencing (see also ref. [31]). Therefore, by setting a high detection threshold, small absolute differences between groups will be amplified. Discordance might also result from the fact that we analyzed whole genomic DNA from part of one mammary gland per animal. Mammary glands are composed of multiple cell types, for example, ductal epithelial cells, adipocytes, stromal cells, immune cells, and others. A difference in MCIp-Seq read counts between groups might reflect enrichment of highly methylated DNA derived from a subpopulation of cells. In subsequent quantitative EpiTyper MassArray analyses of the bulk DNA derived from all cells in one sample, methylation differences will be “diluted” by the contribution of all cell types to an average methylation level. Differences in read counts in MCIp-Seq experiments might also reflect differences in cell composition between samples rather than alterations in methylation levels. This bias can be avoided by using preselected cell populations (for example, obtained by cell sorting based on surface markers using magnetic beads or flow cytometry, or by laser capture microdissection of regions of interest).
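The "dilution" argument can be made explicit with a worked example: the bulk methylation level of a tissue sample is the average of cell-type-specific levels weighted by the fraction of each cell type. The composition and methylation values below are hypothetical and chosen only to illustrate the arithmetic.

```python
def bulk_methylation(fractions: list[float], levels: list[float]) -> float:
    """Bulk methylation = sum over cell types of (fraction of cells) x (methylation level)."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("cell-type fractions must sum to 1")
    return sum(f * m for f, m in zip(fractions, levels))


# Hypothetical composition: epithelial cells, adipocytes, stromal/immune cells
fractions = [0.10, 0.70, 0.20]
control = [0.20, 0.80, 0.60]
treated = [0.50, 0.80, 0.60]   # +30 percentage points, but only in epithelial cells

# A 30 % change in 10 % of the cells appears as a 3 % change in bulk DNA
delta_dilution = bulk_methylation(fractions, treated) - bulk_methylation(fractions, control)

# Conversely, a shift in composition alone changes bulk methylation without any
# methylation change within a cell type (here: a 6 % apparent loss)
shifted = [0.20, 0.60, 0.20]
delta_composition = bulk_methylation(shifted, control) - bulk_methylation(fractions, control)
```

The second comparison corresponds to the confounding by cell composition mentioned above, which preselecting cell populations is intended to avoid.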

It has been proposed that highly methylated CGIs might be more resistant to demethylation than regions with intermediate levels of methylation [44]. Therefore, we performed another MCIp enrichment from the same samples by SF elution with a reduced salt concentration (see also refs. [21, 31]) to include DNA fragments with intermediate and high CpG densities and/or methylation levels (Fig. 1e, lower two lanes). As expected, we obtained higher coverage of the genome than with HS elution, but the distribution across genomic locations did not change considerably. However, substantially more mapped reads are required to reach saturation comparable to that of HS elution (saturation being measured as the Pearson correlation between two random partitions of the sequenced sample, indicative of reproducible coverage of the reference genome [45, 46]). Therefore, we had to pool sequencing reads from several samples to reach sufficient coverage. Still, aligned read counts were lower than for HS elution (as reflected by the different y-axis scales for HS and SF elution in Fig. 1e). Keeping identical selection criteria (fourfold difference and at least 10 reads per region in one group), we detected about 300 hypomethylated and 1000 hypermethylated DMRs between the IRD and the control group. Validation by EpiTyper MassArray is currently ongoing; therefore, we cannot yet comment on whether this approach might be a recommended strategy for intervention studies.
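The saturation measure mentioned above (Pearson correlation between two random partitions of one sample's reads) can be sketched as follows. The bin size, the toy genome, and the simulated reads are hypothetical; published implementations such as those in [45, 46] may differ in detail, for example by extrapolating the correlation to the full read depth.

```python
import numpy as np


def saturation_correlation(read_starts, genome_length: int, bin_size: int = 1000,
                           seed: int = 0) -> float:
    """Pearson r between binned read counts of two random halves of one sample's reads."""
    rng = np.random.default_rng(seed)
    reads = np.asarray(read_starts)
    order = rng.permutation(len(reads))          # random split of the reads into two halves
    half = len(reads) // 2
    part1, part2 = reads[order[:half]], reads[order[half:]]
    edges = np.arange(0, genome_length + bin_size, bin_size)
    counts1, _ = np.histogram(part1, bins=edges)
    counts2, _ = np.histogram(part2, bins=edges)
    return float(np.corrcoef(counts1, counts2)[0, 1])


def simulate_reads(n_reads: int, enriched_starts, region_size: int, rng) -> np.ndarray:
    """Hypothetical toy data: reads fall uniformly inside a fixed set of enriched regions."""
    starts = rng.choice(enriched_starts, size=n_reads)
    return starts + rng.integers(0, region_size, size=n_reads)


rng = np.random.default_rng(1)
regions = np.arange(0, 100_000, 5_000)   # 20 enriched 1-kb regions in a 100-kb toy genome
r_low = saturation_correlation(simulate_reads(200, regions, 1_000, rng), 100_000)
r_high = saturation_correlation(simulate_reads(20_000, regions, 1_000, rng), 100_000)
```

With few reads per enriched region, the two halves disagree because of sampling noise and the correlation stays low; with more reads it approaches 1. When the same number of reads is spread over more regions, as with SF elution, each region receives fewer reads, so a higher total read number is needed to reach the same correlation.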

In conclusion, to enhance the sensitivity of detecting small methylation differences at genome-wide level and to increase the coverage of genomic regions with low CpG-density and methylation, SF elution seems to be the preferred method. However, to reach sufficient saturation and coverage, a higher number of sequencing reads is required, increasing the overall costs of the analyses. In any case, enrichment-based methods provide only relative or indirect information on DNA methylation levels that have to be validated by independent quantitative methods such as EpiTyper MassArray or pyrosequencing [47].

Quantitative Bisulfite-Treatment-Based Methods

Sodium bisulfite (BS) conversion of genomic DNA is the gold standard for DNA methylation analysis to differentiate and detect unmethylated versus methylated cytosines [48]. BS treatment of single-stranded DNA results in the preferential chemical deamination of unmethylated cytosine residues to uracil, whereas the deamination of 5mC to thymine is very slow. In subsequent PCR reactions, all uracils (from unmethylated Cs) are amplified as thymines, whereas only 5mCs are amplified as cytosines, allowing discrimination of unmethylated and methylated Cs at single CpG resolution. Incomplete BS conversion results in false-positive detection of Cs as 5mCs. Since after BS treatment opposite DNA strands are no longer complementary, BS conversion-based methods permit strand-specific methylation analysis.
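The logic of bisulfite-based methylation calling can be illustrated with a small in-silico model: unmethylated Cs are read as T after conversion and PCR, methylated Cs remain C, and the methylation level of a CpG is the fraction of reads showing C at that position. The sketch assumes complete conversion, a single strand, and reads already aligned to the reference at position 0; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate BS conversion plus PCR on one strand: unmethylated C -> T, 5mC stays C.

    Assumes complete conversion; incomplete conversion would leave some unmethylated
    Cs as C and thus inflate methylation calls.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference: str, reads: list[str]) -> dict[int, float]:
    """Methylation level per reference CpG: fraction of reads with C (vs. T) at the CpG cytosine."""
    ref = reference.upper()
    levels = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            c = sum(1 for r in reads if r[i] == "C")
            t = sum(1 for r in reads if r[i] == "T")
            if c + t:
                levels[i] = c / (c + t)
    return levels


reference = "ACGTTCGACG"                      # CpGs at positions 1, 5 and 8
read1 = bisulfite_convert(reference, {1, 5})  # -> "ACGTTCGATG"
read2 = bisulfite_convert(reference, {1})     # -> "ACGTTTGATG"
levels = call_cpg_methylation(reference, [read1, read2])
```

Here, position 1 is called fully methylated, position 5 half methylated, and position 8 unmethylated.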

WGBS and RRBS

Genome-wide methods that utilize the advantages of BS treatment include conventional and tagmentation-based whole genome bisulfite sequencing (WGBS, T-WGBS) and reduced representation bisulfite sequencing (RRBS) [49–51]. WGBS provides single-nucleotide methylation information for about 95 % of all CpGs in a genome (exemplary WGBS reference tracks are depicted in Figs. 1 and 2). T-WGBS uses an alternative protocol for WGBS library preparation based on the enzymatic activity of a transposase that simultaneously fragments the DNA and tags the fragments with adapters. This procedure makes intermediate cleanups between library preparation steps largely unnecessary; thus, the amount of DNA input can be reduced to 10–30 ng [51].

Fig. 2

Examples of bisulfite treatment-based genome-wide methylation analyses (WGBS, Illumina 450k, RRBS). Depicted is the genomic location of the AHRR gene (aryl hydrocarbon receptor repressor), which harbors a CpG site (cg23576855, indicated by red arrow) with methylation levels associated with smoking status [60]. UCSC genome browser panels show (from top to bottom): chromosomal location, graph scale, hypomethylated regions (HMR, marked with blue bars) in peripheral blood mononuclear cells (PBMC), PBMC WGBS information (in yellow) derived from the UCSC genome browser DNA methylation track, with each vertical line representing one CpG site and the height indicating the level of methylation from 0 to 100 %, location of CGIs (indicated in green), GC percentage (in black, scale 30–70 %, with a horizontal line at 50 %), and the AHRR gene locus in blue/black. The region of interest (marked by a red box) is enlarged below in order to visualize details at higher resolution. The lower panel depicts (from top to bottom): the enlarged intragenic region of the AHRR locus with scale, chromosomal region, AHRR gene locus in blue/black, location of CGIs (in green), followed by 12 tracks (in green) of methylation data from 450k analysis of PBMC DNA derived from three young healthy smokers (P1, P2, P3) participating in a pilot intervention study over a period of 6 weeks. Time points (T1–T4) indicate: T1 before and T2 after placebo intervention (10 days), 20-day washout period, T3 before and T4 after broccoli intervention (10 days). Each vertical line represents the location of one CpG site covered on the 450k array; the height indicates methylation levels from 0 to 100 %. WGBS information for PBMC DNA (in yellow) is followed by two tracks for the human embryonic stem cell line H9. H9 RRBS (in red) indicates quantitative methylation information (from 0 to 100 %) derived from RRBS analyses (each vertical line represents one CpG site), with H9 WGBS information (in blue) for comparison. 
The bottom track gives an overview of the G + C content (range 30–70 %, with a horizontal line at 50 %). Note that (i) 450k analyses recapitulate WGBS information at reduced coverage, (ii) probes overlapping with SNPs (such as the one indicated by the red arrow) should be excluded from genome-wide analyses across individuals, unless the influence of SNPs on methylation levels is part of the research question, and (iii) 450k and RRBS provide comparable datasets.

For both WGBS and T-WGBS, a large number of reads (obtained from about three lanes of an Illumina HiSeq 2000 flow cell with 100-bp paired-end sequencing) is required to sufficiently cover the entire genome. Therefore, these methods are extremely costly and, owing to the large amount of data generated, demand considerable bioinformatic expertise. In mammalian genomes, only a small fraction of all Cs is methylated (about 3–6 %). As a consequence, after BS treatment the genomic sequence is reduced mainly to three bases (T, A, and G), hampering mappability of the obtained reads to the reference genome [15]. To overcome these complications, specific bioinformatic tools are being developed to process WGBS datasets (overview in ref. [36••]). Unlike array-based technologies, sequencing-based methods can be applied to any species for which a reference genome is available.
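The loss of sequence complexity after conversion can be quantified with a simple count: if all Cs are converted to T, the 4^k possible k-mers collapse onto 3^k distinct sequences, so reads that were unique before conversion may become ambiguous. The sketch below is illustrative only and assumes full conversion of every C (no methylation); the function names are hypothetical.

```python
from itertools import product


def c_to_t(seq: str) -> str:
    """In-silico conversion assuming every C is unmethylated and read as T."""
    return seq.upper().replace("C", "T")


def distinct_after_conversion(k: int) -> tuple[int, int]:
    """Number of distinct k-mers before and after full C-to-T conversion."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return len(kmers), len({c_to_t(x) for x in kmers})


before, after = distinct_after_conversion(4)   # 256 vs. 81 distinct 4-mers
# Two different reference sequences become indistinguishable after conversion:
collision = c_to_t("ACTG") == c_to_t("ATTG")
```

A common strategy in BS-aware aligners is therefore to align in-silico converted reads against in-silico converted versions of the reference and to resolve the methylation state afterwards, rather than aligning converted reads directly to the unconverted genome.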

RRBS uses the same NGS strategy as WGBS [19••]. The fraction of the genome to be sequenced is reduced by digesting genomic DNA with restriction endonucleases that are specific for CpG containing motifs, in combination with fragment size selection [52]. About 1–3 M CpGs are covered by RRBS, with enhanced coverage of regions with moderate to high CpG density, including CGIs, promoters, and enhancers. Since RRBS fragments the DNA at specific restriction sites, the analyzed fragments for a given species are relatively constant, thus increasing the utility for comparative DNA methylation profiling [19••, 53].
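The reduced representation can be sketched as an in-silico digest followed by size selection. The text does not name the enzyme; the sketch assumes MspI, which cuts C^CGG and is commonly used for RRBS, and the 40–220 bp size window is an illustrative assumption rather than a protocol recommendation.

```python
import re


def mspi_fragments(seq: str) -> list[str]:
    """Digest a sequence in silico at every CCGG site, cutting after the first C (C^CGG).

    MspI is assumed here as the CpG-motif-specific restriction enzyme.
    """
    s = seq.upper()
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", s)]
    bounds = [0] + cuts + [len(s)]
    return [s[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def rrbs_select(seq: str, min_len: int = 40, max_len: int = 220) -> list[str]:
    """Keep only fragments inside a hypothetical size-selection window."""
    return [f for f in mspi_fragments(seq) if min_len <= len(f) <= max_len]


toy = "A" * 10 + "CCGG" + "T" * 50 + "CCGG" + "A" * 10
fragments = mspi_fragments(toy)   # three fragments: 11, 54 and 13 bp
selected = rrbs_select(toy)       # only the 54-bp fragment passes size selection
```

Because every selected fragment starts and ends at a CpG-containing restriction site, the same subset of CpG-rich regions is recovered from every sample of a species, which is what makes RRBS datasets comparable across samples.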

In cooperation with Christoph Bock (CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna), we have established an RRBS library preparation pipeline and submitted the first samples from the IsoCross project for sequencing. Based on our experience with MCIp-Seq followed by EpiTyper MassArray analyses (“MCIp-Seq After Dietary Soy Intervention in Healthy Wistar Rats”), we expect methylation changes to be small. Validated software tools such as RnBeads [54] are available for RRBS data mining; they present quantitative methylation data at the single-CpG, CGI, and promoter levels, allow pair- or group-wise statistical comparison of intervention groups for efficient detection of DMRs, generate graphical reports of results, and provide an overview of gene ontologies enriched among the DMRs [55]. Our expectation is that this methodology will provide more reliable genome-wide methylation profiles from dietary intervention studies, as it is not enrichment-based and provides a direct readout of quantitative methylation levels for biostatistical comparison of samples. Sample requirements in the range of 100 ng genomic DNA are lower than for MBDCap-Seq methods, thus facilitating analysis of preselected subpopulations of cells. When several samples are multiplexed for sequencing, costs for both technologies are comparable. Depending on the quality of sequencing libraries, 1–3 M CpG sites can be covered per sample. It should, however, be kept in mind that the analyzable CpG sites are not necessarily identical in all samples of one experiment (unlike array analyses with defined sites); consequently, the number of overlapping CpG sites that meet certain quality criteria (e.g., coverage) might drop substantially.
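The drop in analyzable sites across samples can be illustrated by intersecting, per sample, the CpGs that pass a coverage threshold. This is a minimal sketch, not the RnBeads implementation; the minimum coverage of 10 reads and the CpG identifiers are hypothetical.

```python
def shared_cpgs(coverage_by_sample: dict[str, dict[str, int]], min_coverage: int = 10) -> set[str]:
    """CpG sites that reach `min_coverage` reads in every sample of an experiment."""
    passing = [
        {cpg for cpg, depth in coverage.items() if depth >= min_coverage}
        for coverage in coverage_by_sample.values()
    ]
    return set.intersection(*passing) if passing else set()


# Hypothetical per-sample read depths at individual CpG sites
coverage = {
    "sample1": {"chr1:100": 15, "chr1:200": 8, "chr1:300": 30},
    "sample2": {"chr1:100": 12, "chr1:300": 9, "chr1:400": 50},
    "sample3": {"chr1:100": 20, "chr1:300": 11},
}
common = shared_cpgs(coverage)   # only "chr1:100" is well covered in all three samples
```

Each sample here has two well-covered sites, yet only one site is usable for a group comparison, and the loss grows with the number of samples in the experiment.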

Illumina 27k and 450k Beadchip Arrays

As an alternative to sequencing-based methods for samples from human intervention studies, Illumina Infinium HumanMethylation 27k and 450k BeadChip arrays provide quantitative methylation data for defined CpG sites at a genome-wide level.

First introduced in 2008, the Illumina 27k array platform covers 27,578 CpG sites located in CGIs associated with 14,475 annotated genes. BS-treated DNA is hybridized to a set of bead-bound probes, one designed against the methylated and one against the unmethylated C at each locus. After hybridization, single-base extension with labeled nucleotides incorporates a fluorescent label for detection, thus adding another level of specificity. The Illumina 450k array is a further development of the 27k platform. Over 480,000 probes cover 99 % of annotated genes, with an average of 17 CpGs per gene. The 450k platform covers 96 % of CGIs, with additional coverage of CGI shores and CGIs outside coding regions [56]. More than 41 % (>197,000) of sites are located in intergenic regions (bioinformatically predicted enhancers, DNase I hypersensitive sites, and validated DMRs) [19••]. Since its introduction in early 2011, 450k technology has been widely used in international large-scale epigenomic profiling projects such as “The Cancer Genome Atlas” [57], providing thousands of reference epigenome datasets for normal tissues and multiple cancer types. The 450k array is based on two different assay types to interrogate methylation levels. Type I assays are equivalent to 27k technology and overlap by about 90 % with sites covered on 27k arrays, whereas type II assays rely on only one probe per site and discriminate the methylation status by the labeled nucleotide incorporated. Since both assay types perform differently, care has to be taken to properly normalize methylation levels to obtain comparable results. Also, a fraction of the probes overlaps with single nucleotide polymorphisms (SNPs), thus introducing an analytical bias. 
Nevertheless, 450k technology allows fast and cost-efficient genome-wide analysis of DNA methylation, requires relatively small amounts of input DNA (0.25–0.5 μg), is compatible with DNA isolated from archival samples including FFPE tissues, and can be processed in a high-throughput manner [19••].
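Array methylation levels are usually summarized from the methylated (M) and unmethylated (U) signal intensities of each site. The sketch below uses two commonly used definitions, assumed here rather than taken from the text: the beta value β = M / (M + U + 100), with an offset of 100 that stabilizes low-intensity sites, and the M-value log2((M + α) / (U + α)) with a small offset α. Type I/type II normalization, which the text notes is necessary, is not shown.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Beta value: approximate fraction methylated (0..1); negative intensities are clipped to 0."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """M-value: log2 ratio of methylated to unmethylated signal, with a small offset alpha."""
    return math.log2((max(meth, 0.0) + alpha) / (max(unmeth, 0.0) + alpha))


b_high = beta_value(900, 0)      # 0.9: the offset slightly shrinks values toward 0
b_half = beta_value(450, 450)    # 0.45
m_high = m_value(1023, 0)        # log2(1024 / 1) = 10
```

Beta values are directly interpretable as methylation percentages (as plotted in Fig. 2), whereas M-values are often preferred for statistical testing because their variance is more homogeneous across the methylation range.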

In a small pilot study in cooperation with the University of Milan (Italy), we used 450k technology to interrogate DNA methylation changes in peripheral blood mononuclear cells (PBMCs) of three young healthy smokers who participated in a short 10-day intervention with steam-cooked broccoli (250 g/day) to modulate smoking-associated oxidative stress. The study was placebo-controlled, with a 20-day washout period between the placebo and broccoli interventions. Blood was taken before and after both intervention periods, providing four DNA samples per volunteer over a period of about 6 weeks (study details in [58]). All 12 DNA samples were analyzed on one 450k BeadChip, thus avoiding problems associated with batch effects. Over the study period, PBMC methylation was very stable (example in Fig. 2). Significant changes in DNA methylation larger than 5 % between samples taken before and after the broccoli intervention were infrequent and mainly limited to single CpG sites; no CGI and only a few annotated promoters and genes fulfilled these selection criteria. The functional relevance of these single-CpG methylation changes still needs to be tested.
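A paired before/after comparison with a 5 % effect-size cutoff, as used for the broccoli intervention, can be sketched as a filter on per-subject beta-value differences. The rule applied here (a change of more than 0.05 in the same direction in every subject) is a hypothetical stand-in: the statistical test actually used is described in [58] and is not reproduced. Probe identifiers are placeholders.

```python
def flag_changed_cpgs(before: dict[str, list[float]], after: dict[str, list[float]],
                      min_delta: float = 0.05) -> dict[str, float]:
    """Flag probes whose beta value changes by more than `min_delta` in the same
    direction in every subject; returns the mean change per flagged probe.

    `before` and `after` map probe IDs to beta values, one per subject, in the same order.
    """
    flagged = {}
    for probe, pre in before.items():
        post = after[probe]
        deltas = [b - a for a, b in zip(pre, post)]
        if all(d > min_delta for d in deltas) or all(d < -min_delta for d in deltas):
            flagged[probe] = sum(deltas) / len(deltas)
    return flagged


# Hypothetical beta values for three subjects before (T3) and after (T4) the intervention
before = {"cg_a": [0.50, 0.40, 0.60], "cg_b": [0.20, 0.25, 0.30], "cg_c": [0.80, 0.80, 0.80]}
after = {"cg_a": [0.60, 0.48, 0.70], "cg_b": [0.22, 0.20, 0.40], "cg_c": [0.70, 0.72, 0.74]}
changed = flag_changed_cpgs(before, after)   # cg_a (gain) and cg_c (loss); cg_b is inconsistent
```

With only three subjects, such a consistency rule is strict but provides limited statistical power, which is one reason why single-CpG hits from pilot studies need independent confirmation.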

Recently, epidemiological studies have identified methylation changes in PBMCs as markers of smoking status [59, 60]. None of these reported marker sites showed significant methylation changes after the short broccoli intervention. However, we observed differences in basal methylation levels between the three subjects, for example, at the intergenic CpG site cg23576855 (indicated by an arrow in Fig. 2) associated with AHRR (aryl hydrocarbon receptor repressor). These differences were due to the presence of a SNP at this position. This example highlights that SNP-associated sites should be excluded before DMR detection in genome-wide analyses. In addition, when using PBMCs to identify physiologically relevant changes in DNA methylation, care should be taken to correct for differences in white blood cell composition, which might confound the results of methylation analyses [61, 62]. Bioinformatic algorithms have been developed to adjust methylation levels for potential differences in blood cell composition [63–65].
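Both precautions can be sketched in a few lines. The first function drops probes flagged as SNP-affected. The flag set is an assumption here and stands in for a published annotation of SNP-overlapping 450k probes. The second function estimates the cell-type proportions of one sample from hypothetical reference methylation profiles of purified cell types. It is a simplified reference-based deconvolution (ordinary least squares, followed by clipping negative estimates and renormalizing), not the constrained projection used by the published algorithms. The estimated proportions would then typically enter the downstream model as covariates.

```python
import numpy as np

def drop_snp_probes(probe_ids, betas, snp_flagged):
    """Remove probes (rows) whose ID appears in the SNP-flag set."""
    keep = np.array([pid not in snp_flagged for pid in probe_ids], dtype=bool)
    kept_ids = [pid for pid, k in zip(probe_ids, keep) if k]
    return kept_ids, np.asarray(betas, dtype=float)[keep]

def estimate_cell_proportions(sample_betas, reference):
    """Simplified deconvolution of one mixed sample.

    reference: (n_cpgs, n_cell_types) mean beta values of purified cell types.
    sample_betas: (n_cpgs,) beta values of the mixed sample at the same CpGs.
    """
    coef, *_ = np.linalg.lstsq(np.asarray(reference, dtype=float),
                               np.asarray(sample_betas, dtype=float), rcond=None)
    coef = np.clip(coef, 0.0, None)  # proportions cannot be negative
    total = coef.sum()
    return coef / total if total > 0 else coef

# Hypothetical reference profiles: 6 marker CpGs x 3 cell types.
reference = np.array([[0.9, 0.1, 0.2], [0.1, 0.8, 0.3], [0.5, 0.5, 0.9],
                      [0.2, 0.7, 0.1], [0.8, 0.2, 0.6], [0.3, 0.3, 0.8]])
mixture = reference @ np.array([0.6, 0.3, 0.1])
print(np.round(estimate_cell_proportions(mixture, reference), 3))
```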

Genome-Wide Methylome Analyses Performed With Dietary Agents

So far, only a few in vitro studies and one in vivo study have addressed genome-wide DNA methylation changes after intervention with cancer preventive dietary agents (Table 2).

Table 2 Genome-wide methylation profiling with cancer preventive agents

The yellow pigment curcumin (diferuloylmethane), found in turmeric (Curcuma longa), is a major ingredient of curry spice mixtures and a well-characterized dietary chemopreventive agent [66]. To investigate whether modulation of DNA methylation contributes to the colon cancer preventive mechanism of curcumin, Link et al. performed genome-wide profiling of methylation changes in three colon cancer cell lines using 27k technology. Short-term treatment for 6 days did not induce DNA methylation changes >10 %. However, long-term treatment with 7.5 or 10 μM curcumin for 240 days resulted in prominent modulation of methylation at 814 to 3051 individual CpG sites. Curcumin was most effective at CpG sites with intermediate methylation levels, and hyper- and hypomethylation events were about equally frequent. Sixty-eight loci were hypomethylated in all three cell lines. Methylation changes correlated with changes in gene expression, and the affected genes were enriched for functions in cell metabolism, cell signaling, cell proliferation, and cell, tissue, and cancer development. The authors concluded that long-term silencing of transcription factor-mediated signaling, followed by passive gain of methylation, rather than a direct effect on DNMTs, might underlie the observed methylation changes [44].
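Two of the summary statistics reported here can be computed directly from per-CpG beta values: the dependence of changes on baseline methylation, and the set of loci hypomethylated in all cell lines. The sketch below uses a 10 % (0.10) change threshold and hypothetical cut-offs of 0.2 and 0.8 for "low" and "high" baseline methylation. Neither the cut-offs nor the data come from the study.

```python
def fraction_changed_by_baseline(baseline, delta, threshold=0.10, low=0.2, high=0.8):
    """Fraction of CpGs with |delta beta| > threshold, per baseline methylation bin."""
    bins = {"low": [], "intermediate": [], "high": []}
    for b, d in zip(baseline, delta):
        key = "low" if b < low else "high" if b > high else "intermediate"
        bins[key].append(abs(d) > threshold)
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in bins.items()}

def shared_hypomethylated(deltas_by_line, threshold=0.10):
    """CpG IDs with delta beta < -threshold in every cell line.

    deltas_by_line: {cell line: {CpG ID: treated minus control beta}}.
    """
    hypo_sets = [{cpg for cpg, d in deltas.items() if d < -threshold}
                 for deltas in deltas_by_line.values()]
    return set.intersection(*hypo_sets) if hypo_sets else set()

# Hypothetical per-CpG changes in three cell lines.
deltas = {"line_A": {"cg1": -0.20, "cg2": -0.15, "cg3": 0.30},
          "line_B": {"cg1": -0.12, "cg2": -0.05},
          "line_C": {"cg1": -0.30, "cg2": -0.20}}
print(shared_hypomethylated(deltas))
print(fraction_changed_by_baseline([0.05, 0.5, 0.5, 0.95], [0.0, 0.2, -0.15, 0.01]))
```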

Epithelial-to-mesenchymal transition (EMT) is an essential developmental process. However, cancer cells can exploit this process to increase invasiveness and motility, ultimately leading to metastasis [67]. Phillip et al. analyzed the influence of genistein, a major soy isoflavone, on genome-wide methylation in cancer cells before and after they had undergone EMT. The chemopreventive activity of genistein and other soy isoflavones (reviewed in [68]) has been shown to involve epigenetic gene regulation through modulation of histone modifications, microRNA expression, and DNA methylation in breast, prostate, and other major tumor types, at least in vitro (reviewed in [43]). Prostate cancer cell lines representing epithelial or mesenchymal phenotypes were treated with 20 μM genistein for 6 days before methylation changes were analyzed on 27k arrays. No significant methylation changes larger than 20 % were observed after treatment, and analysis of selected candidate genes confirmed these negative results [69].

Li et al. investigated whether a purified soy extract alters DNA methylation in the C4-2B and LNCaP prostate cancer cell lines. Cells were treated for 5 days with the extract at a dose equivalent to 20 μM genistein. Genome-wide methylation analyses were performed with 450k arrays, but these data are not shown. The authors reported that miR-29a and miR-1256, which are silenced by promoter methylation in prostate cancer, were demethylated and re-expressed after treatment with the soy extract. Conversely, the extract repressed expression of TRIM68 (tripartite motif containing 68), a ubiquitin E3 ligase that is upregulated in prostate cancer, acts as a co-activator of the androgen receptor, and was identified as a target of both miRNAs. This study illustrates that chemopreventive compounds can concomitantly target multiple epigenetic mechanisms.

Sulforaphane (SFN) and 3,3′-diindolylmethane (DIM) are cancer preventive agents derived from cruciferous vegetables such as broccoli [70]. Sulforaphane is a reactive isothiocyanate with broad-spectrum chemopreventive activities [71, 72]. DIM is formed under low-pH conditions, as in the stomach, from indole-3-carbinol, the main hydrolysis product of the glucosinolate glucobrassicin [71, 73]. Wong et al. used MeDIP-chip to investigate the genome-wide effects of SFN and DIM on promoter methylation in normal prostate epithelial cells (PrEC) and in the prostate cancer cell lines LNCaP and PC3 [74]. Cells were treated with either compound at 15 μM for 3 days. Both treatments induced widespread promoter hypo- and hypermethylation in all three cell lines. Distinct gene sets were affected in each cell line, but within a given cell line, the methylation changes induced by the two compounds largely overlapped. The promoter methylation of more than 1000 genes that were dysregulated in LNCaP compared with PrEC cells was normalized by SFN and DIM treatment. Mechanistically, both compounds reduced the expression of DNMTs. It will be of interest to determine whether similarly broad and complex effects on DNA methylation profiles occur in in vivo models.
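Two comparisons described here can be expressed with simple set and distance operations: the overlap between compound-specific gene sets, and the normalization of cancer-associated methylation. In this hypothetical sketch, overlap is measured with the Jaccard index. A gene counts as "normalized" if its methylation score in cancer cells differs from normal cells by at least a chosen gap, and treatment moves the score closer to the normal level. Both criteria and the score scale are assumptions, not those used by Wong et al.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def normalized_genes(normal, cancer, treated, min_gap=0.2):
    """Genes dysregulated in cancer vs. normal whose score moves toward normal.

    normal, cancer, treated: {gene: promoter methylation score}.
    """
    hits = []
    for gene, n in normal.items():
        c, t = cancer[gene], treated[gene]
        if abs(c - n) >= min_gap and abs(t - n) < abs(c - n):
            hits.append(gene)
    return hits

sfn_genes = {"GENE1", "GENE2", "GENE3"}  # hypothetical SFN-affected genes
dim_genes = {"GENE2", "GENE3", "GENE4"}  # hypothetical DIM-affected genes
print(jaccard(sfn_genes, dim_genes))
```

The "closer to normal" criterion is deliberately loose. A stricter analysis might also require the treated value to fall within some tolerance of the normal level.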

Green tea polyphenols (GTP), with (−)-epigallocatechin gallate (EGCG) as the major catechin, are among the best-characterized cancer chemopreventive agents. Tea and tea constituents exert a broad spectrum of anti-carcinogenic activities and were first reported to affect DNA methylation in 2003 (reviewed in [3, 5]). EGCG and GTP show convincing cancer preventive efficacy in animal models (reviewed in [75, 76]), including the "Transgenic Adenocarcinoma of the Mouse Prostate" (TRAMP) model [77–80]. In a study published in 2009, Morey-Kinney et al. analyzed the impact of green tea extract intervention on prostate carcinogenesis and genome-wide methylation changes in TRAMP mice [81]. Wild-type (WT) and TRAMP mice received water or water supplemented with 0.3 % GTP from 4 to 24 weeks of age. Unexpectedly, and in contrast to previous reports, GTP did not prevent the development of prostate tumors. Global methylation (5mC) levels were quantified by liquid chromatography coupled with mass spectrometry (LC-MS) in the gut, liver, and prostate of WT mice and in prostate tumors of TRAMP mice. About 3–4.5 % of all Cs were methylated, and 5mC levels did not differ between intervention groups, except for a significant 0.5 % reduction upon GTP treatment in the livers of WT animals at 12 weeks of age. Quantitative methylation analysis by EpiTyper MassArray of four candidate genes known to become hypermethylated in prostate tumors of TRAMP mice revealed age-specific changes but no influence of the GTP intervention. Global methylation levels in TRAMP mice were also unaffected by intervention at three doses (0.1, 0.3, and 0.6 % GTP) from 6 to 18 weeks of age. To determine methylation changes at a genome-wide level, the HELP (HpaII tiny fragment enrichment by ligation-mediated PCR) assay was applied to one selected TRAMP and one WT prostate per group (at 24 weeks of age), with and without 0.3 % GTP treatment.
This assay detects relative methylation changes at about 1 million loci in the mouse genome. Within each mouse strain, GTP induced both hyper- and hypomethylation; however, the changes were not concordant between WT and TRAMP samples. Unfortunately, the data were not confirmed by quantitative methods, and with only one sample analyzed per group, this study lacked the statistical power to identify significant methylation changes.
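The two readouts of this study can be sketched as follows. Global methylation measured by LC-MS is usually expressed as the percentage of methylated cytosines, 100 × 5mC / (5mC + C). In HELP, HpaII cuts CCGG sites only when the internal cytosine is unmethylated, whereas its isoschizomer MspI cuts regardless of methylation. A low HpaII/MspI signal ratio at a locus therefore indicates methylation. The pseudocount and the signals below are made-up assumptions. With one sample per group, only a single difference per locus is available and no variance can be estimated, so significance cannot be assessed.

```python
import math

def percent_5mc(mc, c):
    """Global methylation as % of all cytosines: 100 * 5mC / (5mC + C)."""
    return 100.0 * mc / (mc + c)

def help_log_ratio(hpaii, mspi, pseudocount=1.0):
    """log2(HpaII / MspI) signal for one locus; lower values = more methylated."""
    return math.log2((hpaii + pseudocount) / (mspi + pseudocount))

def help_change(control, treated):
    """Per-locus change in log ratio (treated minus control).

    control, treated: {locus: (HpaII signal, MspI signal)}.
    Negative values suggest a relative gain of methylation after treatment.
    """
    return {locus: help_log_ratio(*treated[locus]) - help_log_ratio(*control[locus])
            for locus in control if locus in treated}

print(percent_5mc(4.0, 96.0))  # hypothetical LC-MS amounts
print(help_change({"locus1": (1023, 1023)}, {"locus1": (127, 1023)}))
```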

Summary and Conclusions

The present perspective aims to give an overview of current methodology for assessing DNA methylation on a genome-wide scale and to summarize practical experience with methylome profiling in two breast cancer-related projects and in dietary intervention studies. The tool set for genome-wide methylation analysis ranges from enrichment-based methods with array- or sequencing-based detection to WGBS. WGBS allows species-independent detection of nearly all CpGs in a genome at single-CpG resolution, but at high cost.

Selection of the most suitable methodology depends strongly on the research question and the expected magnitude of methylation changes. Cancer development is associated with continuous changes in DNA methylation at CGIs and core promoter regions. Each of the described methods will accurately detect these extensive methylation differences, which can reach 60–80 % when tumor samples are compared with normal tissues. Regions outside core regulatory regions have recently been shown to be important for gene regulation, including intra- and intergenic enhancers and areas affected by larger-scale methylation changes such as partially methylated domains and DNA methylation valleys (see, for example, [82]). These regions are only partly covered by available array platforms. Sequencing-based methods, including WGBS, would therefore be preferable, but they are bioinformatically more demanding than array-based methods. Enrichment-based methods are sensitive but depend strongly on the protocol used to elute enriched DNA fragments, and the number of reads required is determined by the intended saturation and coverage. In addition, they can be biased by CpG density and copy number alterations.
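For sequencing-based methods, a back-of-the-envelope estimate of the required number of reads follows from four quantities: the target size G, the desired mean coverage C, the read length L, and the fraction f of reads that remain usable after alignment and deduplication. The estimate is N ≈ C·G / (L·f). It ignores the uneven coverage typical of enrichment-based and bisulfite-sequencing libraries, so it is only a rough starting point. Reaching a given coverage at most sites typically requires more reads. The example parameters are assumptions.

```python
import math

def reads_needed(target_bases, mean_coverage, read_length, usable_fraction=1.0):
    """Approximate read count for a mean coverage: N = C * G / (L * f)."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(mean_coverage * target_bases / (read_length * usable_fraction))

# Hypothetical WGBS design: ~3.1 Gb human genome, 30x mean coverage,
# 150 bp reads, 70 % of reads usable after mapping and deduplication.
print(reads_needed(3.1e9, 30, 150, 0.7))
```

Under these assumptions, the estimate comes to roughly 9 × 10^8 reads.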

From what we know so far, short-term dietary intervention in healthy subjects induces methylation changes of at most around 10 % (when mixed cell populations are analyzed). Unequivocal detection of such changes at a genome-wide scale can therefore be challenging. Bisulfite conversion-based methods provide a quantitative readout and have low sample requirements, allowing pre-selection of the most relevant cell populations to enhance detection sensitivity. For human intervention studies, Illumina 450k arrays offer a good compromise among cost, bioinformatic demands, and genomic coverage. They have been shown to provide reliable data that correlate highly with results obtained by alternative quantitative methods. Information on predefined CpG sites allows straightforward comparison between samples within one study, and potentially also between studies and research groups. In addition, large data sets for various normal tissues and cancer entities are publicly available for comparison, for example, from TCGA [57]. For rodent studies or cross-species comparison, RRBS provides data comparable to 450k analyses, with high coverage of genomic regions of intermediate to high CpG density. For efficient use of resources, one might consider preliminary WGBS analyses with a limited number of samples (minimum n = 3) to identify DMRs with the highest possible resolution and sensitivity. These DMRs can then be followed up in larger sample sets with quantitative methods such as EpiTyper MassArray or pyrosequencing. Evidently, however, such small-scale analyses have limited statistical power to detect methylation differences and are biased by the selected sample set.
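A standard normal-approximation sample-size formula for comparing two group means illustrates why a change of about 10 % is hard to detect genome-wide. The formula is n per group = 2 (z_a + z_b)² σ² / δ², where z_a and z_b are the standard normal quantiles for 1 − α/2 and for the desired power. Here α is Bonferroni-adjusted when hundreds of thousands of CpGs are tested. The standard deviation σ = 0.05 and the test count are hypothetical. Real designs (e.g., paired before/after sampling) and less conservative corrections will give different numbers.

```python
import math
from statistics import NormalDist

def samples_per_group(delta, sd, alpha=0.05, power=0.8, n_tests=1):
    """Per-group n for a two-sided, two-sample comparison of means.

    Uses the normal approximation and a Bonferroni-adjusted alpha.
    """
    alpha_adj = alpha / n_tests
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_adj / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# Hypothetical: 10 % methylation difference, SD 0.05 on the beta scale.
single_site = samples_per_group(0.10, 0.05)
genome_wide = samples_per_group(0.10, 0.05, n_tests=480_000)
print(single_site, genome_wide)
```

Under these assumptions, a single pre-specified CpG needs 4 subjects per group. The Bonferroni-corrected genome-wide test needs several times as many, and halving the expected difference multiplies the requirement by four.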

To obtain informative results from intervention studies, not only the choice of methodology but also the study design should be carefully considered. DNA methylation is intricately involved in developmental processes, and it has been speculated that dietary cancer preventive agents might act by inducing epigenetic reprogramming during development. Therefore, interventions that start early in life and cover developmentally critical time windows might provide more meaningful outcomes than interventions in adult animals or adult human subjects. When early intervention is not feasible, interventions over extended periods are likely to have a larger impact on the epigenome than short-term treatments lasting a few weeks.

With a series of complementary technologies for genome-wide methylation analysis now at hand, future research will have to focus on integrating effects on various epigenomic mechanisms with gene expression, and on linking them to disease outcome, in order to identify the best strategies for dietary intervention targeting the epigenome.