Introduction

DNA methylation plays a critical role in the regulation of gene expression during development and differentiation. Global hypomethylation and regional hypermethylation at specific gene promoters are observed in many cancers (Kelly et al. 2010). Aberrant DNA methylation has also been described in many other diseases such as neurological, autoimmune, and metabolic diseases (Wen et al. 2016; Zhang and Zhang 2015; de Mello et al. 2014). In mammals, DNA methylation primarily occurs at the carbon 5 position of the cytosine of the CpG dinucleotide and produces 5-methylcytosine (5-mC). Whole-genome and reduced representation bisulfite sequencing have up to now been the most common methods of genome-wide DNA methylation profiling (Stirzaker et al. 2014). These NGS-based methods work for any species for which genomic sequence information is available. Illumina HumanMethylation BeadChip is an alternative method of DNA methylation profiling for human samples. While this array-based method does not interrogate DNA methylation as comprehensively as whole-genome bisulfite sequencing, it has two key advantages over the NGS-based methods: cost-effectiveness and high sample throughput. These features have made HumanMethylation450 BeadChip and its successor HumanMethylationEPIC the first choice for large-scale epigenome-wide association studies in which hundreds or thousands of human samples are enrolled. This chapter outlines the basic principles and the experimental and analytical procedures of the HumanMethylation BeadChip and covers the strengths and limitations of this array-based platform that are widely recognized by the epigenetics research community.

Basic Principles Underlying the Infinium Methylation Assay

Illumina’s bead technology (Gunderson et al. 2004) enables highly multiplexed measurement of DNA methylation at individual CpG loci in the human genome. HumanMethylation BeadChip measures DNA methylation levels at the cytosine of CpG sites by quantitative genotyping of the C/T polymorphisms generated when genomic DNA is bisulfite-converted and then subjected to whole-genome amplification: unmethylated cytosines are converted to uracils (and amplified as thymines), whereas methylated cytosines are protected and detected as cytosines (Hayatsu 2008).
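The conversion rule described above can be illustrated in silico. The following is a minimal sketch, not part of the assay itself; the function name, the toy sequence, and the choice of methylated position are illustrative assumptions.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as it would read after bisulfite conversion
    and amplification: unmethylated C becomes T, methylated C stays C.

    seq                  -- DNA sequence (one strand), e.g. "ACGTCCGA"
    methylated_positions -- 0-based indices of methylated cytosines
                            (hypothetical input for illustration)
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")   # unmethylated C -> U, amplified as T
        else:
            converted.append(base)  # methylated C is protected; other bases unchanged
    return "".join(converted)


seq = "ACGTCCGA"
converted_methylated = bisulfite_convert(seq, methylated_positions={1})
converted_unmethylated = bisulfite_convert(seq, methylated_positions=set())
```

After conversion, the methylation state of the CpG at index 1 is readable as a C/T difference, which is what the array genotypes.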

HumanMethylation BeadChip uses two assay designs, Infinium I and Infinium II (Fig. 1). Infinium assays consist of the following three major steps: hybridization of the bisulfite-converted and amplified DNA fragments to 50-mer oligonucleotides attached to each bead, single-base extension at the 3′ terminus of the oligonucleotide using fluorescently labeled ddNTPs, and measurement of the fluorescent intensity for each bead. The Infinium I assay uses two bead types per CpG locus: one for measuring the methylated state and the other for the unmethylated state. The 3′ terminus of the 50-mer probes is designed to match either the cytosine or the thymine at the interrogated site. A single-base extension occurs at the base immediately adjacent to the interrogated CpG site. Probes for Infinium I assays are designed on the assumption that methylation is regionally correlated within a 50 base interval. This “comethylation” assumption adopted by Illumina is based on the results of a large-scale bisulfite sequencing study showing that >90% of CpG sites within 50 bases had the same methylation status (Eckhardt et al. 2006) and of another study demonstrating that the methylation status of adjacent CpG sites tends to be correlated (Shoemaker et al. 2010). The Infinium II assay requires only one bead type per CpG locus (Fig. 1). The 3′ terminus of the 50-mer probes is a cytosine complementary to the guanine of the query CpG site. A single-base extension with a labeled G or A base occurs, depending on the complementarity to either the methylated C or the unmethylated T. Illumina determined that Infinium II probes can contain up to 3 CpG sites within a 50-mer probe sequence and that the underlying CpG sites may be represented by “degenerate” R-bases (A or G base). This feature makes it possible to assess the methylation status of a query site independently of assumptions on the status of neighboring CpG sites and to increase the number of CpG sites interrogated. While the Infinium II design was applied whenever possible, the Infinium I design was required to measure the DNA methylation levels of CpG sites within CpG-rich regions (i.e., CpG islands).
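The difference between the two designs in how methylated and unmethylated signals are read out can be sketched as follows. This is an illustrative simplification; the function name and the dictionary layout (keys such as "M_bead" and "U_bead") are assumptions made for the example and do not correspond to any Illumina file format.

```python
def signal_pair(design, intensities, channel=None):
    """Return (methylated, unmethylated) intensities for one CpG locus.

    design "I":  two bead types, both read in the SAME colour channel.
                 intensities = {"M_bead": {"red": r, "green": g},
                                "U_bead": {"red": r, "green": g}}
                 channel     = "red" or "green" (fixed per probe)
    design "II": one bead type read in BOTH channels.
                 intensities = {"red": r, "green": g}
                 green = methylated, red = unmethylated
    """
    if design == "I":
        if channel not in ("red", "green"):
            raise ValueError("Infinium I probes need channel='red' or 'green'")
        return intensities["M_bead"][channel], intensities["U_bead"][channel]
    if design == "II":
        return intensities["green"], intensities["red"]
    raise ValueError("design must be 'I' or 'II'")


# Hypothetical intensities for illustration only
type1 = {"M_bead": {"red": 4200, "green": 150},
         "U_bead": {"red": 600, "green": 140}}
type2 = {"red": 300, "green": 2700}

m1, u1 = signal_pair("I", type1, channel="red")
m2, u2 = signal_pair("II", type2)
```

Because Infinium II signals come from two different dyes, they are subject to dye bias in a way that Infinium I signals for a single locus are not.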

Fig. 1 Infinium assay designs. A figure from an Illumina technical note (http://www.illumina.com/documents/products/technotes/technote_hm450_data_analysis_optimization.pdf) was reproduced with slight modifications. Bisulfite-converted DNA and interrogated CpG sites are indicated by green lines and by boxes outlined in red dots, respectively. DNP and biotin conjugated to ddNTPs are shown as red and green circles, respectively. One of the four types of ddNTPs (DNP-labeled ddATP/ddTTP and biotin-labeled ddCTP/ddGTP) is incorporated at the 3′ end of the probe oligonucleotides when the last two bases of the probe oligonucleotides are complementary to the interrogated CpG site in bisulfite-converted DNA. In the Infinium I assay design, single-base extension occurs at the base adjacent to the interrogated CpG site, and the same type of dideoxynucleotide is incorporated at both the unmethylated and methylated loci. In the Infinium II assay design, by contrast, the single-base extension occurs at the C/T nucleotide of the interrogated CpG site, and either DNP-labeled ddATP or biotin-labeled ddGTP is incorporated depending on the methylation state of the interrogated CpG site. Incorporated ddNTPs are immunofluorescently detected using Cy5-labeled anti-DNP and Cy3-labeled streptavidin followed by signal amplification. Therefore, whereas the same color channel (red or green) is used to measure the fluorescent intensities for the unmethylated and methylated status of the interrogated CpG site in the Infinium I assay design, two color channels are used in the Infinium II assay design: red for the unmethylated status and green for the methylated status. Dye bias between the two color channels can be adjusted using the signal intensities of built-in control probes.

Manufacture and Decoding of BeadChip Arrays

BeadChip technology is based on 3 micron silica beads that self-assemble in microwells with a uniform spacing of approximately 5.7 microns on planar silica slides (Fan et al. 2005). Each bead in the Infinium assay holds hundreds of thousands of copies of a specific oligonucleotide comprising an address code and a 50-mer probe sequence (Fig. 2). All types of beads are pooled and randomly assembled in the microwells, where they are held by Van der Waals forces and hydrostatic interactions with the walls of the well on the array slide. The arrays are subsequently subjected to the decoding procedure, which determines the positions of all types of beads in each array (Fan et al. 2005). Sequential hybridizations of fluorescently labeled decoder oligonucleotides (with four distinguishable labels) to the address code of the beads impart to each bead on the array a color signature that specifies its bead type. The bead location information for each array is provided as a DMAP file. In the case of HumanMethylation BeadChip, a median of 14 beads of each type is randomly distributed on each array. The presence of multiple beads of each type provides quantitative accuracy through multiple measurements with statistical processing. The random distribution of the beads increases assay robustness by minimizing the chance of any local failure affecting the overall result (Gunderson et al. 2004). In the randomly assembled arrays, the number of beads of each type is a random variable with a Poisson distribution. When the number of beads of a given type is fewer than three (i.e., 0, 1, or 2), the methylation beta value is not calculated for the corresponding CpG site by the GenomeStudio software (Illumina). In the case of HumanMethylation450 BeadChip, several hundred sites (out of >480,000 CpG sites) per sample inevitably lack the methylation beta value because of this bead number threshold.
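The bead number threshold described above can be sketched as a simple masking step. This is a minimal illustration of the rule as stated in the text, not GenomeStudio's implementation; the function name, the dictionary inputs, and the probe identifiers are hypothetical.

```python
MIN_BEADS = 3  # beta values are not reported for bead types with 0, 1, or 2 beads


def mask_low_bead_count(betas, bead_counts, min_beads=MIN_BEADS):
    """Replace beta values with None where too few beads were measured.

    betas       -- {probe_id: beta value}
    bead_counts -- {probe_id: number of beads of that type on the array}
                   (a probe missing from bead_counts is treated as 0 beads)
    """
    return {
        probe: (beta if bead_counts.get(probe, 0) >= min_beads else None)
        for probe, beta in betas.items()
    }


# Hypothetical probe identifiers and values for illustration only
betas = {"probe_1": 0.80, "probe_2": 0.10, "probe_3": 0.50}
bead_counts = {"probe_1": 14, "probe_2": 2}
masked = mask_low_bead_count(betas, bead_counts)
```

Because bead counts vary from array to array, the set of masked probes differs between samples, which is one source of missing beta values in a dataset.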

Fig. 2 Manufacture of BeadChip arrays. (a) Illumina’s BeadChip technology uses 3 micron silica beads, each attached to hundreds of thousands of copies of a specific oligonucleotide comprising an address code and a 50-mer probe sequence (depicted in orange and blue, respectively). (b) In the case of HumanMethylation450 BeadChip, over 620,000 types of beads are generated, pooled, and self-assembled onto the array. On average, 14 copies of each bead type are present on an array. Each array is subjected to the decoding procedure, which determines the positions of each type of bead on that array (Gunderson et al. 2004).

Experimental Procedures

Experimental procedures using HumanMethylation BeadChip include sodium bisulfite conversion of genomic DNA (day 1), whole-genome amplification (day 2) followed by DNA fragmentation, hybridization of bisulfite-converted and amplified DNA to BeadChip (day 3), and washing, staining, and scanning (day 4) (Carless 2015). The staining procedure includes a single-base extension using labeled ddNTPs (DNP-labeled ddATP and ddTTP and biotin-labeled ddCTP and ddGTP), fluorescent detection of DNP and biotin with Cy5-labeled anti-DNP and Cy3-labeled streptavidin, and signal amplification. The stained BeadChip is scanned using the iScan or HiScan SQ system (Illumina) to measure the intensities of methylated and unmethylated signals for each of the target CpG sites. In this scanning procedure, decoded data (DMAP) files containing bead-type information for each position on the BeadChip are required to obtain raw data files (IDAT files). The GenomeStudio software (methylation module) calculates the methylation level of individual CpG sites from IDAT files as a beta value ranging from 0 (completely unmethylated) to 1 (completely methylated).
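The beta value calculation can be sketched directly from the definition given in the Dictionary of Terms of this chapter (methylated intensity divided by the sum of the unmethylated intensity, the methylated intensity, and an offset of 100). The function name and the example intensities are illustrative assumptions; handling of negative background-corrected intensities is deliberately left out.

```python
def beta_value(methylated, unmethylated, offset=100):
    """Methylation beta value for one CpG site.

    beta = M / (U + M + offset), with offset = 100 as in the chapter's
    definition. The offset stabilises the ratio when both intensities are low.
    """
    return methylated / (unmethylated + methylated + offset)


# Hypothetical intensities for illustration only
beta_high = beta_value(methylated=900, unmethylated=0)
beta_low = beta_value(methylated=0, unmethylated=900)
beta_mid = beta_value(methylated=450, unmethylated=450)
```

Because of the offset, a fully methylated site yields a value slightly below 1 (0.9 in the first example), and the effect of the offset shrinks as the total intensity increases.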

HumanMethylation27, 450, and EPIC BeadChip Arrays

HumanMethylation27 (HM27) BeadChip, the first version of HumanMethylation BeadChip, was released in 2008 and contained Infinium I probes for 27,578 CpG sites located within the promoter regions of 14,475 consensus coding sequence (CCDS) genes and well-known cancer genes (Bibikova et al. 2009). HumanMethylation450 (HM450) BeadChip was released in 2011. HM450 was designed to include 485,577 assays (482,421 CpG sites, 3091 non-CpG sites, and 65 random SNPs), containing 25,978 (94%) of the CpG sites present on HM27. The new content covered a more diverse set of genomic features, and the following categories were introduced to the probe annotation: three CpG island (CGI)-related categories, namely, CGI, shore (0–2 kb from CGI), and shelf (2–4 kb from CGI), and six gene feature categories, namely, TSS200 (from the transcription start site (TSS) to 200 nt upstream of TSS), TSS1500 (200 to 1500 nt upstream of TSS), 5′UTR, first exon, gene body, and 3′UTR. HM450 provides coverage of a total of 21,231 out of 21,474 UCSC RefGenes (99%) and 26,658 (96%) CGIs (Bibikova et al. 2011). This improved coverage, along with the advantages of the HumanMethylation BeadChip platform (time and cost-effectiveness and high sample throughput), resulted in the use of HM450 in many studies of various types. Examples of HM450 studies include those of genomic imprinting (Court et al. 2014), X inactivation (Cotton et al. 2015), myoblast differentiation (Miyata et al. 2015), aging (Florath et al. 2014), and many types of cancers. Examples of epigenome-wide association studies (EWAS) in which the DNA methylation profiles of over a thousand subjects were obtained by HM450 are those for rheumatoid arthritis (Liu et al. 2013), metabolic traits (Dick et al. 2014; Petersen et al. 2014), cardiovascular diseases (Rask-Andersen et al. 2016), and diabetes (Florath et al. 2016). Other examples of large-scale projects that used HM450 are the ARIES study, a population-based methylome project that profiled 1000 mother and child pairs (Relton et al. 2015), and the Cancer Genome Atlas Consortium (TCGA, http://cancergenome.nih.gov/), which profiled about 8000 samples from over 200 cancer types.
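The annotation categories introduced with HM450 can be expressed as simple distance rules. The sketch below is a simplified illustration; the function names are hypothetical, strand handling is omitted, and the treatment of the exact boundary values (200 nt, 2 kb) is an assumption, since the text does not specify it.

```python
def tss_category(nt_upstream_of_tss):
    """Classify a probe located upstream of a transcription start site.

    nt_upstream_of_tss -- distance in nt upstream of the TSS (0 = at the TSS)
    Returns "TSS200", "TSS1500", or None if the probe lies further upstream.
    """
    if 0 <= nt_upstream_of_tss <= 200:
        return "TSS200"
    if 200 < nt_upstream_of_tss <= 1500:
        return "TSS1500"
    return None


def cgi_category(bp_from_cgi):
    """Classify a probe by its distance from the nearest CpG island (CGI).

    bp_from_cgi -- 0 if the probe lies inside a CGI, otherwise the distance
                   in bp to the nearest CGI edge
    Returns "CGI", "shore" (up to 2 kb), "shelf" (2-4 kb), or None.
    """
    if bp_from_cgi == 0:
        return "CGI"
    if bp_from_cgi <= 2000:
        return "shore"
    if bp_from_cgi <= 4000:
        return "shelf"
    return None


examples = [(tss_category(150), cgi_category(0)),
            (tss_category(800), cgi_category(1500)),
            (tss_category(2000), cgi_category(3000))]
```

In practice these labels are read from the probe annotation rather than recomputed; the sketch only makes the category definitions concrete.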

DNA methylation at distal regulatory elements, particularly enhancers and transcription factor binding sites, has been shown to be dynamically regulated among different tissues and cell types (Ziller et al. 2013; Gu et al. 2016). The latest version, HumanMethylationEPIC BeadChip, was released in 2015. EPIC covers >850,000 CpG sites including >90% of HM450 content and 413,743 additional CpG sites. The newly added CpG sites include 35,000 CpG sites located at potential enhancers identified by the FANTOM5 project (http://fantom.gsc.riken.jp/5/) and the ENCODE project (http://www.encodeproject.org/). Pidsley et al. (2016) reported that EPIC probes cover 58% of FANTOM5 enhancers and 7% of distal and 27% of proximal ENCODE regulatory elements, and that a single EPIC probe does not always represent the methylation level of distal enhancer elements, which tend to show variable methylation levels across a region. However, because of the substantial increase in coverage of regulatory regions along with the advantages it has inherited from HM450, EPIC is expected to maintain its popularity as an EWAS platform. The major features of HM27, HM450, and EPIC BeadChip arrays are summarized in Table 1. The genomic locations of the CpG probes contained in HM27, HM450, and EPIC BeadChip arrays are shown for a 36 kb genomic interval including the TAL1 gene (Fig. 3) to exemplify the difference in probe distribution and density among the three types of HumanMethylation BeadChip arrays.

Table 1 Comparison of probe content and usage of HM27, HM450, and EPIC

Fig. 3 Distribution of CpG probes included in three types of HumanMethylation BeadChip arrays within a 36 kb genomic interval. The genomic interval of chr1:47,674,001–47,710,000 (hg19) including the TAL1 gene is shown using the UCSC Genome Browser (http://genome.ucsc.edu/) (Kent et al. 2002). The locations of the CpG probes contained in HM27, HM450, and EPIC BeadChip arrays within this interval are shown as vertical black lines; they were displayed by uploading the genomic positions of the probes in the BED (Browser Extensible Data) format using the custom track tool of the Browser. The distribution of the CpG probes in this interval exemplifies the wider coverage of enhancer regions, which are positive for H3K4Me1 signals, by EPIC probes than by HM450 probes.
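Preparing probe positions as a BED custom track, as was done for Fig. 3, can be sketched as follows. The probe identifiers and coordinates are invented placeholders inside the interval shown in the figure, and the function name is hypothetical. The sketch assumes 1-based input positions and converts them to the 0-based, half-open coordinates that BED uses.

```python
def probes_to_bed(probes, track_name):
    """Build the text of a BED custom track from probe positions.

    probes     -- iterable of (chromosome, 1-based position, probe_id)
    track_name -- label shown in the genome browser
    """
    lines = ['track name="%s" visibility=dense' % track_name]
    for chrom, pos, probe_id in sorted(probes):
        # BED: chromStart is 0-based, chromEnd is exclusive,
        # so a single base at 1-based position p is (p - 1, p).
        lines.append("%s\t%d\t%d\t%s" % (chrom, pos - 1, pos, probe_id))
    return "\n".join(lines)


# Placeholder probes, not real array content
demo_probes = [("chr1", 47690000, "probe_b"),
               ("chr1", 47680000, "probe_a")]
bed_text = probes_to_bed(demo_probes, "demo_probes")
```

One track per array type (HM27, HM450, EPIC) can be uploaded to compare probe density over the same interval.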

Data Analysis

This section describes the chief data processing procedures generally recommended for an HM450/EPIC dataset: probe filtering, within-array normalization, and between-array normalization (batch effect correction). These procedures, if correctly conducted, minimize variance and improve statistical power for detecting small DNA methylation changes. Several Bioconductor or R packages that perform all or most of the procedures are available for free (Table 2), including methylumi, minfi, wateRmelon, ChAMP, and RnBeads, which have been reviewed by Morris and Beck (2015). Jllumina is an open-source Java library recently released for the handling and processing of HM450/EPIC raw data (Almeida et al. 2016) (Table 2) and forms the backbone of DiMmer (Almeida et al. 2017), a graphical user interface software that interactively guides EWAS data analysis procedures. Although Illumina’s GenomeStudio is useful for analyzing a small number of samples, its functions are limited compared to the software packages listed above.

Table 2 Features of data analysis packages for HumanMethylation BeadChip data

Step 1: Filtering Problematic Probes

Removal of probes with high detection p-values (probes that have failed to hybridize) is generally recommended. The detection p-value is defined as “1-R/N,” where R is the rank of the probe's signal intensity relative to the negative controls and N is the number of negative controls (N = 614 in the case of HM450). Illumina recommends the removal of probes with a detection p-value >0.05. In addition, removing probes with missing beta values, probes containing common SNPs, and cross-reactive probes should also be considered. The Infinium methylation assay is based on the quantitative genotyping of C/T polymorphisms generated at CpG sites after bisulfite treatment. When a C/T SNP exists at a target CpG site, the genomic TG allele behaves as an unmethylated CpG. Therefore, probes with target CpG sites overlapping a common SNP (minor allele frequency > 0.05) should be treated with caution in studies enrolling multiple human subjects. HumanMethylation BeadChip relies on the hybridization of bisulfite-treated DNA to 50-mer oligos attached to the beads. Bisulfite conversion of the majority of Cs to Ts generates “three-letter genome sequences,” increasing the probability of probe cross-reactivity, i.e., hybridization of a probe to loci other than its primary target. Depending on the criteria used, 8–25% of the HM450 probes were classified as cross-reactive (Price et al. 2013; Chen et al. 2013). Therefore, when differentially methylated probes are extracted by comparing HumanMethylation data between groups of interest without prior exclusion of SNP-containing and cross-reactive probes, it is important to consider the possibility that such problematic probes have been falsely identified as differentially methylated.
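The filtering step can be sketched from the “1-R/N” definition above. This is a minimal illustration rather than GenomeStudio's implementation: R is interpreted here as the number of negative controls whose intensity is below the probe's signal, which is an assumption, and the function names, probe identifiers, and exclusion list are hypothetical.

```python
def detection_p_value(signal, negative_controls):
    """Detection p-value as 1 - R/N.

    R is taken to be the number of negative-control intensities that are
    lower than the probe's signal (an assumed reading of "relative rank");
    N is the number of negative controls.
    """
    n = len(negative_controls)
    rank = sum(1 for control in negative_controls if control < signal)
    return 1 - rank / n


def filter_probes(signals, negative_controls, excluded=frozenset(), max_p=0.05):
    """Keep probes that pass the detection threshold and are not on an
    exclusion list (e.g., SNP-overlapping or cross-reactive probes)."""
    kept = {}
    for probe, signal in signals.items():
        if probe in excluded:
            continue
        if detection_p_value(signal, negative_controls) > max_p:
            continue
        kept[probe] = signal
    return kept


# Toy data: 100 negative controls with intensities 0..99
controls = list(range(100))
signals = {"probe_ok": 1000, "probe_dim": 50, "probe_snp": 1000}
kept = filter_probes(signals, controls, excluded={"probe_snp"})
```

The exclusion list would normally be taken from published probe annotations rather than assembled by hand.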

Step 2: Within-Array Normalization

Removing variations of a technical, rather than a biological, origin is crucial. Within-array normalization is performed for background correction, dye-bias adjustment, and type II bias adjustment (Yousefi et al. 2013; Dedeurwaerder et al. 2014). Type II bias is the foremost issue originating from the two different assay designs, Infinium I and Infinium II, used in HM450 and EPIC. The methylation values derived from these two designs were shown to exhibit different distributions. Infinium II probes showed a lower dynamic range compared with Infinium I probes (Dedeurwaerder et al. 2011; Bibikova et al. 2011). Furthermore, methylation values of the Infinium II probes were shown to be biased and less reproducible in comparison with bisulfite pyrosequencing data (Dedeurwaerder et al. 2011). This issue has led to the development of a variety of algorithms to adjust for type II bias (reviewed in Morris and Beck 2015).
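As one concrete illustration of a within-array adjustment, dye bias can be reduced by rescaling one color channel so that built-in control probes have the same average intensity in both channels. The sketch below is a deliberately crude single-factor scaling under that assumption; it is not the algorithm used by GenomeStudio or by any of the packages named in this chapter, and all names and values are hypothetical.

```python
from statistics import mean


def dye_bias_factor(control_green, control_red):
    """Scaling factor that brings the red channel onto the green scale,
    estimated from control probes measured in both channels."""
    return mean(control_green) / mean(control_red)


def adjust_red(red_intensities, factor):
    """Apply the scaling factor to red-channel intensities."""
    return [r * factor for r in red_intensities]


# Hypothetical control-probe intensities: red reads twice as bright as green
control_green = [1000, 1200]
control_red = [2000, 2400]

factor = dye_bias_factor(control_green, control_red)
adjusted = adjust_red([400, 800], factor)
```

Published methods combine such adjustments with background correction and with corrections for the Infinium I/II design difference, which a single scaling factor does not address.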

Step 3: Between-Array Normalization

Batch effects represent nonbiological, technical variations observed across different batches of experiments and are common among high-throughput data including HumanMethylation BeadChip data (Leek et al. 2010; Sun et al. 2011). Differences in experiment dates, reagent lots, laboratory conditions, and personnel can cause batch effects. In large-scale studies, in which BeadChip experiments need to be performed with many arrays at different times, batch effects may not always be avoidable. Batch effects are also expected when data from multiple institutes are analyzed together. Several algorithms are available for batch effect correction for HumanMethylation BeadChip data including ComBat, surrogate variable analysis (SVA), RUVm, functional normalization, and BEclear (see the references in Akulenko et al. 2016).
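To make the idea of batch effect correction concrete, the sketch below removes per-batch mean shifts for a single probe. It is a naive mean-centering illustration and is not ComBat, SVA, RUVm, functional normalization, or BEclear; the function name and the example values are hypothetical.

```python
from statistics import mean


def center_by_batch(values, batches):
    """Remove batch-specific mean shifts for one probe.

    values  -- beta values of one probe across samples
    batches -- batch label of each sample (same order as values)
    Each value is shifted so that every batch has the overall mean.
    """
    overall = mean(values)
    batch_means = {
        batch: mean([v for v, b in zip(values, batches) if b == batch])
        for batch in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]


# Hypothetical betas: batch B reads about 0.4 higher than batch A
values = [0.2, 0.3, 0.6, 0.7]
batches = ["A", "A", "B", "B"]
corrected = center_by_batch(values, batches)
```

If the groups being compared are unevenly distributed across batches, centering of this kind also removes genuine biological differences, and shifted values can fall outside the 0 to 1 range; both are reasons to prefer the dedicated algorithms and a balanced experimental design.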

Liu and Siegmund (2016) evaluated the effects of nine processing methods, including both within- and between-array normalization methods, and found that within-array normalization using Noob (Triche et al. 2013) and BMIQ (Teschendorff et al. 2013) consistently improved signal sensitivity and that RUVm (Maksimovic et al. 2015) outperformed the other batch effect correction algorithms when combined with Noob and BMIQ. Whether to include batch effect correction depends on the degree of methylation differences expected between the groups compared: correction is beneficial if variations stemming from a technical origin outnumber those stemming from biological signals, but in the opposite situation, some batch effect correction tools may remove biological signals (Liu and Siegmund 2016).

In addition to the options for probe filtering and within- and between-array normalization, many of the current software packages (Table 2) include options for correcting for cell heterogeneity, another potential confounding factor associated with methylome studies using human population samples. Whole blood is frequently chosen as a source of genomic DNA in EWAS. Because blood is a heterogeneous collection of different cell types, the associations detected in such studies may be confounded by cellular heterogeneity (Jaffe and Irizarry 2014).

After appropriate data preprocessing, differentially methylated probes (DMPs) or regions (DMRs) between groups of interest can be identified. Probe annotations such as those of Price et al. (2013) and Zhou et al. (2017) are useful for the biological interpretation of identified DMPs/DMRs. The functional annotation provided by Zhou et al. (2017) includes the positions of HM27/HM450/EPIC probes relative to imprinted differentially methylated regions (Court et al. 2014), transcription factor binding sites identified by the ENCODE project, and chromatin states identified by the Roadmap Epigenomics Project (http://www.roadmapepigenomics.org/).
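A search for differentially methylated probes can be sketched as a per-probe comparison of two groups. The example below ranks probes by a Welch t statistic computed on beta values; it is an illustrative sketch only, omits p-values, covariates, and multiple-testing correction, and does not reproduce the method of any package named in this chapter. All names and values are hypothetical.

```python
from math import copysign, inf, sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for two groups of beta values (unequal variances)."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        # Both groups are constant: no evidence if equal, infinite if different
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def rank_probes(betas_a, betas_b):
    """Return (probe, mean beta difference, t statistic) tuples,
    sorted by decreasing absolute t statistic."""
    results = []
    for probe in betas_a:
        a, b = betas_a[probe], betas_b[probe]
        results.append((probe, mean(a) - mean(b), welch_t(a, b)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical beta values for two probes in cases and controls
cases = {"probe_1": [0.80, 0.82, 0.78], "probe_2": [0.50, 0.52, 0.48]}
controls = {"probe_1": [0.20, 0.22, 0.18], "probe_2": [0.50, 0.49, 0.51]}
ranked = rank_probes(cases, controls)
```

With hundreds of thousands of probes tested, a real analysis needs multiple-testing correction and adjustment for confounders such as cell-type composition.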

Conclusions

The HumanMethylationEPIC array represents a significant improvement over the HM450 array in terms of genome coverage of regulatory regions while retaining the reproducibility and reliability of its predecessor. Therefore, this platform will serve as a valuable tool in large-scale human methylome studies aimed at identifying disease biomarkers and/or etiology and at assessing the impact of environmental exposures. Many integrated analysis packages have been developed as computational solutions for technical confounders originating in the underlying principle and design of HumanMethylation BeadChip as well as biological confounders inherent in human biology.

Dictionary of Terms

  • Methylome – The genome-wide profile of DNA methylation.

  • EWAS – Epigenome-wide association study for uncovering epigenetic variants underlying common diseases.

  • Bisulfite conversion – Sodium bisulfite conversion of genomic DNA to differentiate and detect unmethylated versus methylated cytosine.

  • Infinium assay – Illumina’s SNP genotyping technology based on on-bead single-base extension and fluorescent detection.

  • Methylation beta value – Beta = methylated intensity / (unmethylated intensity + methylated intensity + 100).

  • Detection p-value – A per-probe p-value, calculated relative to negative control probes, that is used with a threshold to filter out probes that failed to hybridize in HumanMethylation BeadChip experiments.

  • Batch effects – Nonbiological experimental variation observed across multiple batches of microarray experiments.

Key Facts of HumanMethylation BeadChip

  • HumanMethylation BeadChip is an array platform for highly multiplexed measurement of DNA methylation at individual CpG loci in the human genome based on Illumina’s bead technology.

  • The platform measures the DNA methylation level of individual CpG sites by quantitative genotyping of C/T polymorphisms generated in bisulfite-converted and amplified genomic DNA.

  • The current version, HumanMethylation EPIC, measures the DNA methylation level of >850,000 CpG sites, while previous versions, HumanMethylation450 and HumanMethylation27, measured that of >480,000 and >27,000 CpG sites, respectively.

  • HumanMethylation BeadChip requires only 4 days to produce methylome profiles of human samples using 250–500 ng of genomic DNA as a starting material.

  • Because of its time and cost efficiency, high sample throughput, and overall quantitative accuracy and reproducibility, HumanMethylation450 was the most widely used means of large-scale methylation profiling of human samples in recent years.

  • Considering the substantial increase of the coverage of regulatory regions along with the advantages inherited from HumanMethylation450, HumanMethylation EPIC is expected to maintain its popularity as a platform for epigenome-wide association studies for the foreseeable future.

Summary Points

  • Illumina HumanMethylation BeadChip is a popular platform for obtaining DNA methylome data from human samples.

  • The current version, HumanMethylationEPIC, measures the DNA methylation levels of over 850,000 CpG sites by highly multiplexed quantitative genotyping of C/T polymorphisms in bisulfite-converted genomic DNA.

  • HumanMethylation450 was very widely used in EWAS projects, and its successor, EPIC, is expected to remain the preferred tool for the foreseeable future in studies with large sample numbers due to its cost and time effectiveness and high sample throughput.

  • Considering potential confounders originating in the technical limitations of HumanMethylation BeadChip such as cross-reactive probes, SNP-affected probes, within-array bias (Infinium I and II bias), and between-array bias (batch effects) is of particular importance, especially when subtle methylation differences need to be detected by statistical tests between large numbers of cases and controls.

  • Many integrated analysis packages have been developed by the epigenetics research community as computational solutions for technical and biological confounders associated with HumanMethylation BeadChip data.