
In 1959, Waddington first coined the term “epigenetics” (Waddington 1959), which now refers to mechanisms for the stable maintenance of gene expression changes that involve physically “marking” DNA or its associated proteins rather than altering the DNA sequence.

A variety of epigenetic factors have been identified, such as DNA methylation, histone modification, and non-coding RNAs (e.g., microRNAs and long non-coding RNAs). These epigenetic factors coordinately regulate gene expression and provide heritable epigenetic information that is not encoded in the DNA sequence (Cedar and Bergman 2009; Esteller 2011). The epigenome refers to the entire constitution of epigenetic marks in a cell type at a given time point; it is cell-specific and tissue-specific (Varley et al. 2013). In each cell type, epigenetic factors regulate gene expression in different ways, for example by facilitating or restricting transcription factor access to the DNA sequence (Rivera and Ren 2013). The epigenome may change over the lifetime (Fraga et al. 2005a) and is prone to environmental influences, such as stress, social interactions, physical activity, exposure to toxins, and diet (Alegria-Torres et al. 2011). Aberrant epigenomic alterations have been implicated in a wide variety of human disorders, such as cancer and autoimmune diseases (Portela and Esteller 2010), and epigenetic drugs may revolutionize the treatment of many human diseases (Heerboth et al. 2014). In this chapter, we briefly review the molecular basis of two major epigenetic factors, DNA methylation and histone modification, and discuss some commonly used epigenome-wide analytic approaches for these two factors, as well as their involvement in some complex human disorders.

10.1 Molecular Basis of DNA Methylation and Histone Modification

10.1.1 DNA Methylation

DNA methylation, commonly called the ‘fifth base’ in the genome, is one of the most extensively studied epigenetic mechanisms. It is a direct chemical modification of the fifth carbon of a cytosine that adds a methyl (-CH3) group through a covalent bond resulting in 5-methylcytosine (5-mC).

In adult somatic tissues, DNA methylation typically occurs in a “CpG” (C-phosphate-G) dinucleotide context (Bird and Southern 1978; Cedar et al. 1979). An exception is seen in embryonic stem cells (Haines et al. 2001), where a substantial amount of 5-mC is also observed at non-CpG sites (mCHG, mCHH). The human genome contains about 28 million CpG sites, which are not evenly distributed throughout the genome (Lister et al. 2009) but tend to cluster in regions known as “CpG islands” (CGIs). CGIs usually occur near gene transcription start sites (TSS), and ~60 % of human gene promoters are associated with CGIs (Bird 1986; Gardiner-Garden and Frommer 1987). DNA methylation is catalyzed by a family of enzymes termed DNA methyltransferases (DNMTs) (Okano et al. 1998), including DNMT1, DNMT3A, DNMT3B, DNMT2, and DNMT3L, which cooperate in establishing and maintaining DNA methylation patterns (Kulis and Esteller 2010; Okano et al. 1999). Equally important, and opposite to DNA methylation, is DNA demethylation. DNA demethylation can be passive, active, or a combination of both. Passive DNA demethylation usually takes place when DNMT1 cannot effectively restore the DNA methylation patterns on newly synthesized DNA strands during successive rounds of replication (Wu and Zhang 2010), whereas active demethylation is usually mediated by the ten-eleven translocation (TET) family enzymes (TET1, TET2, and TET3), followed by restoration of unmodified cytosine through thymine DNA glycosylase (TDG)-mediated base excision repair (Kohli and Zhang 2013).

The importance of DNA methylation as a major epigenetic modification regulating gene expression has been widely recognized. Hypermethylation of CpGs in TSS-proximal regions, particularly in promoter CGIs, is largely associated with repressed gene transcription (Wagner et al. 2014), whereas methylation of CpGs located within gene bodies is usually associated with increased transcriptional activity (Ramsahoye et al. 2000; Hellman and Chess 2007). However, several recent studies have revealed that the relationship between inter-individual variation in DNA methylation and gene expression does not depend simply on the location of the methylated CpGs: both negative and positive inter-individual methylation-expression correlations were detected for CpGs located in gene bodies, in TSS-proximal regions, and in intergenic regions (Wagner et al. 2014; Bell et al. 2011).

10.1.2 Histone Modification

The basic unit of chromatin is the nucleosome, which is composed of an octamer of histone proteins (two copies each of histones H2A, H2B, H3, and H4) around which 147 bp of DNA is wrapped. The degree to which chromatin is condensed or packed is a critical determinant of the transcriptional activity of the associated DNA, and this is mediated in part by diverse post-translational covalent modifications of the N-terminal tails of histone proteins (Fig. 10.1). At least eight different types of histone modification have been identified: acetylation, methylation, phosphorylation, ubiquitination, sumoylation, ADP ribosylation, deimination, and proline isomerization. All of these modifications are reversible and dynamic, mediated by enzymes that add or remove the modification.

Fig. 10.1

Post-translational modifications of histones. The first 20 amino acids in the N-terminus of the human histone H4 are illustrated. Many sites in the N-terminus can be targets for epigenetic tagging such as acetylation (A), phosphorylation (P) and methylation (M). Acetylation is catalyzed by histone acetyltransferase (HAT) and removed by histone deacetylase (HDAC); Phosphorylation is catalyzed by protein kinases (PK) and removed by protein phosphatase (PP); Methylation is catalyzed by histone methyltransferases (HMT) and removed by histone demethylase (HDM). Some histone modification marks are associated with gene activation while others are associated with gene repression, and the integration of multiple marks leads to a finely tuned transcriptional response

Histone acetylation occurs via the enzymatic transfer of an acetyl group from acetyl-CoA to the ε-amino group (ε-NH3+) of lysine residues within a histone. This reaction is catalyzed by enzymes called histone acetyltransferases (HATs) and reversed by histone deacetylases (HDACs) (Hodawadekar and Marmorstein 2007). Histone acetylation is a hallmark of transcriptional activation (Sterner and Berger 2000), and histone acetylation patterns are tightly associated with many cellular processes including chromatin dynamics and transcription, gene silencing, cell cycle progression, apoptosis, differentiation, DNA replication, DNA repair, nuclear import, and neuronal repression (Cohen et al. 2011).

Histone methylation is another extensively studied histone modification. It is defined as the transfer of one, two, or three methyl groups from S-adenosyl-L-methionine to lysine or arginine residues of histone proteins by histone methyltransferases (HMTs). When histone methylation occurs, specific genes within the DNA complexed with the histone may be activated or silenced (Greer and Shi 2012). For instance, tri-methylation of histone H3 at lysine 4 (H3K4me3) is positively correlated with gene transcription and is commonly detected in a tight, localized area at the 5′-ends/promoter regions of active genes (Barski et al. 2007). H3K36me3 is strongly enriched across the gene body and at the 3′-end of active genes and may be linked to transcriptional elongation (Barski et al. 2007). In contrast, H3K27me3 is the classic repressive histone modification mark, which shows a broad peak at promoters and throughout the gene bodies of silent genes (Barski et al. 2007).

10.2 Epigenome-Wide Analyses of DNA Methylation and Histone Modification

10.2.1 Epigenome-Wide DNA Methylation Analysis

10.2.1.1 DNA Methylation Profiling Assays

DNA methylation analysis normally relies on three strategies (Fig. 10.2): (1) digestion of genomic DNA with methylation-sensitive restriction enzymes; (2) affinity-based enrichment of methylated DNA fragments; and (3) bisulfite conversion. Each of the three strategies can be combined with either microarray or next-generation sequencing techniques to interrogate epigenome-wide DNA methylation patterns, and each has unique advantages and drawbacks (Table 10.1).

Fig. 10.2

Strategies for pretreatment of DNA sample. (a) Digestion of genomic DNA with methylation-sensitive restriction enzymes. (b) Affinity-based enrichment of methylated DNA fragments. (c) Chemical treatment of DNA with sodium bisulfite results in the conversion of unmethylated cytosines to uracils. In contrast, methylated cytosines are protected. Subsequently, microarray or next-generation sequencing of these libraries reveals the methylation status

Table 10.1 Comparison of DNA methylation detection procedure
  • Digestion of genomic DNA with methylation-sensitive restriction enzymes: Some restriction enzymes (e.g., HpaII and SmaI) are methylation sensitive – their activity is affected by the presence of a methyl CpG within restriction sites. Therefore, when genomic DNA is digested with a methylation-sensitive restriction enzyme, difference in methylation status is converted into difference in sequence fragment size. For example, differential methylation hybridization (DMH) uses combinations of methylation-sensitive and methylation-insensitive restriction enzyme digestion, followed by ligation-mediated PCR to enrich for methylated or unmethylated fragments. PCR products are labeled and hybridized to arrays, or tested by next-generation sequencing (Fig. 10.2a). Methods based on this strategy were used in early epigenome-wide DNA methylation studies (Rakyan et al. 2011), but the genome-wide CpG coverage and resolution are limited by the cutting frequency and the fragment size of the chosen restriction enzymes.

  • Affinity-based enrichment of methylated DNA fragments: Affinity-based enrichment assays capture methylated DNA fragments with a methyl-CpG binding protein (e.g., MBD2) or a 5-mC specific antibody (Fig. 10.2b). For example, in methylated DNA immunoprecipitation (MeDIP)-chip/-seq, genomic DNA is first fragmented by sonication and then incubated with an anti-5mC antibody. The antibody-bound DNA fragments are isolated, deproteinized, and then hybridized onto microarrays (MeDIP-chip) or analyzed by next-generation sequencing (MeDIP-seq). The frequency of DNA fragments bound to specific probes or mapped to specific genomic regions provides the raw data from which DNA methylation levels can be inferred. Affinity-based DNA methylation assays allow rapid and efficient genome-wide assessment of DNA methylation. However, because the affinity-captured DNA fragments are generally hundreds of nucleotides in size, the major limitation of these methods is their inability to pinpoint methylation changes at single-CpG resolution (Robinson et al. 2010).

  • Bisulfite Conversion: Bisulfite conversion of DNA is the most commonly used method for DNA methylation studies. It uses bisulfite salt to deaminate cytosine residues on single-stranded DNA, converting them to uracil while leaving 5-methylcytosine intact. Once a difference in methylation status has been converted into a difference in DNA sequence, it can be detected by various techniques (Fig. 10.2c). Bisulfite sequencing applies routine DNA sequencing methods to bisulfite-converted genomic DNA. It can provide quantitative methylation measurements at single-nucleotide resolution and is widely accepted as the gold standard for DNA methylation analysis. The development of next-generation sequencing technology has made it feasible to perform whole genome bisulfite sequencing (WGBS) (Suzuki and Bird 2008). Though WGBS can provide comprehensive coverage of almost all CpGs in the human genome, its use is currently limited by its high cost. Thus, several more cost-effective bisulfite conversion-based approaches, such as reduced representation bisulfite sequencing (RRBS) and the Illumina 450k array, are widely employed in current epigenomics research. In RRBS, genomic DNA is digested by the methylation-insensitive restriction enzyme MspI (5′-C′CGG-3′) and separated by gel electrophoresis, and the size-selected DNA fragments are then bisulfite converted and analyzed on next-generation sequencing platforms (Meissner et al. 2005). MspI digestion and size selection enrich the sequencing library for CpG sites, which reduces the number of nucleotides that need to be sequenced.
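
The principle behind bisulfite conversion can be illustrated with a small in-silico sketch. It is a toy model under stated assumptions, not a simulation of any real protocol: the function name `bisulfite_convert` and the example sequence are hypothetical, conversion is assumed to be 100 % efficient, and only one strand is considered. Unmethylated cytosines are read as T after conversion and amplification, while methylated cytosines remain C.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the sequence as read after idealized bisulfite conversion.

    seq: DNA string for a single strand (5'->3').
    methylated_positions: set of 0-based indices that carry 5-mC.
    Unmethylated C is deaminated to U and read as T; 5-mC is protected.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")  # unmethylated cytosine is converted
        else:
            converted.append(base)  # 5-mC and all other bases are unchanged
    return "".join(converted)


# Hypothetical example: only the cytosine at index 1 is methylated.
original = "ACGTCCGGA"
converted = bisulfite_convert(original, methylated_positions={1})
print(original)
print(converted)
```

Comparing the two printed sequences position by position shows how a difference in methylation status becomes a difference in sequence that sequencing or array hybridization can detect.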

10.2.1.2 Data Processing and Analysis

In this section, we review the data processing approaches for the three most popular epigenome-wide DNA methylation profiling methods, namely the Illumina 450k array, RRBS, and MeDIP-seq. Approaches for identifying differential methylation and for data visualization are discussed in the following sections.

10.2.1.2.1 Data Processing for Illumina 450k Array

The Illumina 450k array adapted BeadArray technology to recognize the bisulfite-converted DNA for interrogation of DNA methylation. It offers a unique combination of comprehensive, expert-selected coverage, high sample throughput and an affordable price, making it the most widely used method for current epigenome-wide association studies (EWAS). The Illumina 450k array tests more than 485,000 CpGs at single-nucleotide resolution, which covers 99 % of RefSeq genes and 96 % of CGIs (Bibikova et al. 2011). The data processing procedures for Illumina 450k array contain several main steps, including quality control (QC), normalization, adjustment of batch effect, and calculation of DNA methylation levels.

  • QC: The aim of this step is to detect and filter out samples and probes that do not meet the experimental standard. The Illumina 450k arrays contain several control probes for determining data quality. Diagnostic plots of control probes in the Illumina GenomeStudio program can be used to detect poorly performing samples (Bibikova et al. 2011). Poor-quality samples can also be identified with functions provided by several R packages designed specifically for analyzing Illumina 450k arrays, such as HumMethQCReport (Mancuso et al. 2011), IMA (Wang et al. 2012), Minfi (Aryee et al. 2014), and MethyLumi (Davis et al. 2012).

    For QC of probes, some packages such as IMA (Wang et al. 2012) filter out probes for which a large proportion of samples (e.g., >25 %) have a detection P-value >0.05. LumiWCluster (Kuan et al. 2010) avoids discarding probes; instead, it incorporates all the data while accounting for the quality of individual observations. A particular issue for QC of the 450k array is that certain probes contain single nucleotide polymorphisms (SNPs) within the targeted sequences, so the methylation levels assessed by these probes may be influenced by the DNA genotype (Dedeurwaerder et al. 2011). Hence, several programs (e.g., IMA) have incorporated functions to filter out these SNP-associated CpG probes (Wang et al. 2012; Touleimat and Tost 2012).

  • Normalization: The normalization step is used to remove technical and systematic variation that could mask true biological differences. There are two types of normalization approach: (1) between-array normalization, which addresses the comparability of intensity distributions between multiple arrays; and (2) within-array normalization, which corrects for dye-, intensity-, and spatial-dependent bias within individual arrays (Siegmund 2011). Illumina GenomeStudio uses a basic normalization approach that treats the first sample in the array as the reference but allows the user to reselect the reference sample if the first sample shows poor quality. This approach is also implemented in the R packages MethyLumi (Davis et al. 2012) and Minfi (Aryee et al. 2014). Locally weighted scatterplot smoothing (LOESS) and quantile normalization assume similar total methylation signals across samples and may potentially discard true biological signals (Laird 2010). Several other approaches for normalizing the probe intensities also exist (Marabita et al. 2013), but there is currently no consensus regarding the optimal normalization algorithm.

  • Adjustment of Batch Effect: Batch effects are systematic differences in measurements across conditions that are unrelated to the biological or scientific questions of a study (e.g., experiment time, chip or instrument used, and laboratory conditions). Some of these factors can be controlled by careful study design, for example by splitting the cases and controls equally across batches through random sampling (Johnson et al. 2007). Other potential confounders may be corrected by computational methods. For example, ComBat, which is available in R, is a widely used adjustment method. It is based on an empirical Bayes procedure (Johnson et al. 2007) and is robust to outliers in small sample sizes (Sun et al. 2011).

  • Calculation of DNA Methylation Levels: DNA methylation levels are determined from the intensities of the fluorescence signals from the probes. The main outputs are the β-value and the M-value, which are ready for downstream statistical analysis. The β-value is calculated from the intensity of the signal from methylated alleles (Max(M,0)) and the intensity of the signal from unmethylated alleles (Max(U,0)) by the following formula:

    $$ \upbeta =\frac{\mathrm{Max}\left(\mathrm{M},0\right)}{\mathrm{Max}\left(\mathrm{M},0\right)+\mathrm{Max}\left(\mathrm{U},0\right)+100} $$

    The resulting β-value denotes the average methylation level for each CpG site. It ranges from 0 (unmethylated) to 1 (fully methylated) on a continuous scale. Alternatively, some researchers use the M-value to indicate the methylation level, which is calculated as

    $$ \mathrm{M}={ \log}_2\frac{\mathrm{Max}\;\left(\mathrm{M},0\right)+1}{\mathrm{Max}\left(\mathrm{U},0\right)+1} $$

    M-values range from negative infinity to positive infinity, which is consistent with the support of a normal distribution. However, the interpretation of M-values is not as intuitive as that of β-values. Ignoring the offset terms in the two formulas above, the relationship between the M-value and the β-value is:

    $$ \mathrm{M}={ \log}_2\frac{\upbeta}{1 - \upbeta} $$

    Thus, positive M-values correspond to a methylation rate greater than 50 %, while negative M-values indicate a methylation rate less than 50 %.
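
The β-value and M-value formulas above can be written out as a short sketch. The function names and the example intensities are hypothetical; the offsets (100 for the β-value and 1 for the M-value) are taken from the formulas in the text. The conversion `beta_to_m` is exact only when the offsets are ignored.

```python
import math


def beta_value(meth, unmeth, offset=100):
    """Beta = max(M,0) / (max(M,0) + max(U,0) + offset)."""
    m = max(meth, 0)
    u = max(unmeth, 0)
    return m / (m + u + offset)


def m_value(meth, unmeth, alpha=1):
    """M = log2((max(M,0) + alpha) / (max(U,0) + alpha))."""
    return math.log2((max(meth, 0) + alpha) / (max(unmeth, 0) + alpha))


def beta_to_m(beta):
    """Logit (base 2) transform linking the two scales, offsets ignored."""
    return math.log2(beta / (1 - beta))


# Hypothetical probe intensities for one CpG site.
meth_intensity, unmeth_intensity = 6000, 2000
print(beta_value(meth_intensity, unmeth_intensity))
print(m_value(meth_intensity, unmeth_intensity))
print(beta_to_m(0.5))  # a beta of 0.5 maps to M = 0
```

Because β-values are bounded between 0 and 1, `beta_to_m` is undefined at exactly 0 or 1; the offsets in the intensity-based formulas keep real data away from those boundaries.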

10.2.1.2.2 Data Processing for RRBS

Processing of RRBS data mainly involves three steps: QC, alignment of sequencing reads, and calculation of DNA methylation levels.

  • QC: The raw sequencing reads are normally generated in the fastq format, which records the sequence of nucleotides and their base call confidence levels. To obtain high-quality RRBS data, several technical details require careful attention. For example, incomplete bisulfite conversion leads to spuriously elevated DNA methylation levels. Spike-in control DNAs with known DNA methylation levels should be used to monitor the sensitivity and specificity of bisulfite conversion. Alternatively, elevated levels of observed CpC methylation can also indicate incomplete bisulfite conversion, because CpC dinucleotides are rarely methylated in mammalian cells (Bock 2012). Some of the QC steps for RRBS data can be performed with QC tools (e.g., NGS QC toolkit) (Patel and Jain 2012) that are generally applicable to next-generation sequencing reads, while other QC criteria, such as the efficiency of bisulfite conversion, require QC tools dedicated to bisulfite sequencing, such as BSeQC (Lin et al. 2013).

  • Alignment: Because of the reduced sequence complexity of bisulfite-converted reads, their alignment to the reference genome requires specific alignment tools. Generally, the alignment tools can be categorized into two groups: three-letter aligners and wild-card aligners. Bismark (Krueger and Andrews 2011) and BS-Seeker (Chen et al. 2010) are examples of three-letter aligners, which convert C to T in both the sequenced reads and the reference sequences prior to alignment. In contrast, wild-card aligners like BSMAP/RRBSMAP (Xi and Li 2009; Xi et al. 2012) treat Cs in the reference genome as the wild-card letter Y, which matches both Cs and Ts in the sequenced reads, so no three-letter conversion step is needed. Compared with whole-genome bisulfite alignment tools, such as an extensively validated MAQ-based pipeline, these specific aligners (e.g., RRBSMAP) can maintain high mapping accuracy and consistency between replicates, and also significantly improve runtime performance and memory efficiency (Xi et al. 2012).

  • Calculation of DNA Methylation Signals: Because unmethylated cytosines are converted to Ts by the bisulfite treatment while methylated cytosines remain Cs, the absolute DNA methylation level can be calculated by counting the number of Cs and Ts observed at each reference cytosine and dividing the number of Cs by the total number of Cs and Ts.
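
The counting rule described in the last step can be sketched as follows. This is a minimal illustration that assumes the aligner output has already been reduced to (position, observed base) pairs at reference cytosines; the function name `methylation_levels`, the `min_coverage` parameter, and the example calls are hypothetical and do not correspond to the output format of any specific tool.

```python
def methylation_levels(calls, min_coverage=1):
    """Estimate methylation as C / (C + T) at each reference cytosine.

    calls: iterable of (position, base) pairs, one per read base that
           overlaps a reference cytosine; base is the read base observed.
    min_coverage: positions with fewer informative bases are dropped.
    """
    counts = {}  # position -> [number of C, number of T]
    for pos, base in calls:
        if base not in ("C", "T"):
            continue  # sequencing errors or SNPs are not informative
        c_t = counts.setdefault(pos, [0, 0])
        if base == "C":
            c_t[0] += 1  # read retains C: cytosine was methylated
        else:
            c_t[1] += 1  # read shows T: cytosine was unmethylated
    return {
        pos: c / (c + t)
        for pos, (c, t) in counts.items()
        if c + t >= min_coverage
    }


# Hypothetical calls at two CpG cytosines.
example_calls = [(100, "C"), (100, "C"), (100, "T"), (100, "C"),
                 (250, "T"), (250, "T")]
print(methylation_levels(example_calls))
```

In practice a minimum coverage threshold is usually applied, since a ratio based on one or two reads is a very noisy estimate.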

10.2.1.2.3 Data Processing for MeDIP-Seq

In MeDIP-seq, the enrichment or depletion of extended sequencing reads is used to estimate the methylation level of specific regions in the genome; the read sequence itself does not provide methylation information. As a result, specific data processing approaches are needed to estimate DNA methylation levels from MeDIP-seq data.

  • QC and alignment: Similar to other sequencing-based methods, the first step in the analysis of MeDIP-/MBD-seq data is QC and alignment of sequencing reads to the reference genome, which can be conducted using standard quality control programs and aligners such as Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and BWA (Li and Durbin 2009).

  • Estimation of DNA methylation levels: After alignment, the uniquely mapped reads are extended to the MeDIP-enriched DNA fragment size, the DNA sequence of each chromosome is divided into a series of intervals of a fixed size (e.g., 50 bp), and the extended reads in each interval are counted as the methylation signal of that region. This estimated DNA methylation signal can be confounded by the varying density of methylated CpG sites. That is, regions with high CpG density can give rise to high enrichment scores even with low absolute DNA methylation levels, and regions with low CpG density can produce low enrichment scores even with high levels of DNA methylation. Down et al. developed the tool BATMAN, which applies a Bayesian method to estimate absolute methylation values from MeDIP-chip or MeDIP-seq data (Down et al. 2008). It provides accurate estimates of methylation values; however, it is not especially user-friendly and is computationally demanding. Another tool is the R package MEDIPS, a comprehensive approach for normalizing and analyzing MeDIP-seq data (Chavez et al. 2010). This method is based on the concept of coupling factors introduced by BATMAN (Down et al. 2008). MEDIPS incorporates a statistical framework developed for count data, which models the read number with an overdispersed Poisson model. This method can significantly reduce the run time for processing MeDIP-seq data and is easy to use.
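
The read extension and binning step described above can be sketched as follows. This illustrates only the raw signal calculation; it does not implement the CpG-density correction (coupling factors) used by BATMAN or MEDIPS. The function name, the read representation as (5′ position, strand) pairs, and the default sizes are assumptions made for the example.

```python
def bin_extended_reads(reads, chrom_length, fragment_size=300, bin_size=50):
    """Count extended reads overlapping each fixed-size bin of a chromosome.

    reads: list of (pos, strand) for uniquely mapped reads, where pos is
           the 0-based coordinate of the read's 5' end and strand is
           '+' or '-'.
    Each read is extended in its 3' direction to fragment_size, and every
    bin that the extended fragment overlaps is incremented by one.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos, strand in reads:
        if strand == "+":
            start = pos
            end = min(pos + fragment_size, chrom_length)  # end is exclusive
        else:
            start = max(pos - fragment_size + 1, 0)
            end = pos + 1
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            counts[b] += 1
    return counts


# Hypothetical reads on a 500 bp chromosome, 100 bp fragments, 50 bp bins.
example_reads = [(0, "+"), (120, "+"), (249, "-")]
signal = bin_extended_reads(example_reads, chrom_length=500,
                            fragment_size=100, bin_size=50)
print(signal)
```

The resulting per-bin counts are the enrichment signal that the tools discussed above then normalize against local CpG density.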

10.2.1.3 Identifying Differentially Methylated Regions (DMRs)

In clinical studies (e.g., case-control studies), it is crucial to identify DMRs between experimental conditions. There are several different types of DMR, such as tissue-specific DMRs and aging-specific DMRs (Rakyan et al. 2011). Depending on the DNA methylation profiling method used, a DMR can be a single CpG site or a region of interest (e.g., a promoter or CGI). The Student’s t-test and the Wilcoxon rank sum test can be used to identify DMRs by comparing the normalized methylation signal between two groups. Bock (2012) summarized several more advanced methods that aim to improve DMR detection, e.g., mixture models (Wang 2011) and stratification of the t-test (Chen et al. 2012), and pointed out that it is difficult to predict which methods will work best for real-world DNA methylation data sets. Several tools are available for the identification of DMRs. For the Illumina 450k array, the most commonly used tools include the R packages IMA (Wang et al. 2012) and Minfi (Aryee et al. 2014). IMA applies Student’s t-test and empirical Bayes statistics, and it allows identification of DMRs at both single CpG sites and regions of interest. For region-level differential methylation analysis, IMA computes the mean, median, or Tukey’s biweight robust average of the loci within the region to create an index. The R package limma uses an empirical Bayes moderated t-test to improve power in small sample sizes. M-values should be used with these tests because they rely heavily on the assumption of normality. Minfi uses an F-test or linear regression to test each genomic position for association between methylation and a categorical or continuous phenotype, respectively. The R package methylKit (Akalin et al. 2012) is one of the most commonly used tools for RRBS data analysis. It applies Fisher’s exact test or logistic regression to calculate p-values, which are adjusted to q-values for multiple testing correction.
For MeDIP-seq data, the R package MEDIPS is sufficiently fast to be practical for routine processing (Bock 2012). Importantly, correction for multiple hypothesis testing is required, because the tests for differential DNA methylation are performed simultaneously at a large number of genomic loci.
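
To make the multiple testing issue concrete, the sketch below adjusts a list of per-locus p-values with the Benjamini-Hochberg false discovery rate procedure. This is one common choice named here for illustration; it is not necessarily the adjustment used by the packages cited above, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    For the p-value of rank r (1 = smallest) among n tests, the adjusted
    value is the minimum of p * n / k over all ranks k >= r, capped at 1.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest to enforce monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four CpG sites.
example_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(example_p)
significant = [i for i, q in enumerate(adjusted_p) if q <= 0.05]
print(adjusted_p)
print(significant)
```

With hundreds of thousands of CpG sites tested at once, unadjusted thresholds such as p < 0.05 would yield thousands of false positives, which is why an adjustment of this kind is applied before reporting DMRs.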

10.2.2 Epigenome-Wide Histone Modification Analysis

10.2.2.1 Histone Modification Profiling Assays

Methods for epigenome-wide analysis of histone modification marks rely heavily on a procedure called chromatin immunoprecipitation (ChIP). The basic steps of ChIP include: (1) crosslinking DNA and associated proteins on chromatin in cells; (2) sonicating the DNA-protein complexes into ~500 bp fragments; (3) immunoprecipitating DNA fragments using a specific antibody against the histone mark of interest; and (4) purifying the immunoprecipitated DNA fragments and analyzing them by microarrays (ChIP-chip) or sequencing (ChIP-seq) (Fig. 10.3). To control for the effects of non-specific binding, nonspecific immunoglobulin G (IgG) antibodies and input chromatin are commonly used as controls (Kidder et al. 2011). Regions showing enrichment of ChIP products over controls represent DNA sequences with which the specific histone modification marks are associated in vivo. In addition to histone modification marks, the ChIP-chip/-seq methods can also be used to map global binding sites for specific transcription factors, RNA polymerases, or in principle any DNA-associated protein.

Fig. 10.3

Workflow for ChIP-chip and ChIP-seq. DNA and associated proteins are crosslinked and sheared into ~500 bp DNA fragments by sonication or nuclease digestion. DNA fragments associated with the histone mark of interest are selectively immunoprecipitated using an antibody specific to that mark. Purified DNA can be analyzed by microarrays (ChIP-chip) or sequencing (ChIP-seq)

10.2.2.2 Data Processing and Analysis

Using standard QC and alignment programs, the high-quality sequencing reads from ChIP-seq data can be selected and mapped to the reference genome. The aligned reads are then used to identify regions of increased read tag density relative to the background estimated from the IgG/input controls. One straightforward approach is simply to apply a minimum fold enrichment threshold of ChIP tags over normalized control tags in candidate regions or tiling windows. However, any threshold is arbitrary and prone to error, and this approach does little to help the user assess the significance of peaks (Wilbanks and Facciotti 2010). More sophisticated statistical approaches have been incorporated to identify and assess the significance of putative peaks (Pepke et al. 2009). So far, over 40 different ‘peak calling’ programs have been developed under a variety of statistical models, such as Poisson, local Poisson, t-distribution, conditional binomial, and hidden Markov models. Though a few studies have attempted to compare the performance of some of these peak calling programs (Wilbanks and Facciotti 2010; Micsinai et al. 2012), there does not appear to be a clear winner, and many programs have multiple parameters that can be adjusted by the user. Because different programs or different parameter settings can significantly affect the final peak lists, data sets that are to be compared must be analyzed using the same methods and settings.
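
The difference between a plain fold-enrichment threshold and a statistical test can be seen in a small window-based sketch. It is a simplified illustration that combines a fold threshold with a Poisson tail probability; it is not the algorithm of any particular peak caller, and the function names, thresholds, pseudocount, and tag counts are all assumptions made for the example.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by direct summation of the pmf."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_windows(chip, control, min_fold=2.0, max_p=1e-3, pseudocount=1.0):
    """Flag windows enriched in ChIP tags relative to scaled control tags.

    chip, control: per-window tag counts of equal length.
    The control is scaled by the ratio of total tag counts, and the scaled
    control count (plus a pseudocount) is used as the Poisson expectation.
    Returns a list of (window_index, fold_enrichment, p_value).
    """
    scale = sum(chip) / sum(control)
    peaks = []
    for idx, (c, b) in enumerate(zip(chip, control)):
        expected = (b + pseudocount) * scale
        fold = c / expected
        p = poisson_sf(c, expected)
        if fold >= min_fold and p <= max_p:
            peaks.append((idx, fold, p))
    return peaks


# Hypothetical tag counts in five tiling windows.
chip_counts = [5, 6, 40, 4, 5]
control_counts = [5, 5, 5, 5, 5]
peaks = call_windows(chip_counts, control_counts)
print(peaks)
```

Changing `min_fold`, `max_p`, or the pseudocount changes which windows are reported, which illustrates why data sets should be compared only when they were analyzed with the same method and settings.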

10.2.3 Epigenomic Data Visualization and Interpretation

Visualization of DNA methylation data (e.g., MeDIP-seq, RRBS) and histone modification (ChIP-seq) data is an important part of the analysis. It allows the data to be inspected directly and may suggest new ideas about how to analyze them. Such data can be visualized with popular genome browsers, such as the UCSC Genome Browser (Kent et al. 2002) and the Integrative Genomics Viewer (IGV) (Robinson et al. 2011). The UCSC Genome Browser includes data from many published studies and from ENCODE, which makes it useful for data integration and visualization. However, data files need to be uploaded to this web-based genome browser, which makes it more difficult to work with large custom data sets. IGV is a Java-based genome browser that runs locally on the user's own computer. It does not offer the same amount of publicly available data as the UCSC Genome Browser, but it tends to be somewhat faster for browsing across the genome and is better suited for looking at individual reads. Several file formats are in common use, such as BED, Wiggle, and bedGraph. BED files are very basic, as they simply describe regions in the genome. They are usually used to describe MeDIP-seq and ChIP-seq peaks, and nearly every genome browser supports visualization of BED files. Wiggle files are used to display quantitative information across genomic regions. The Wiggle format is compact and displays data at regular intervals. The bedGraph format is similar but uses variable-length intervals instead of the constant intervals found in Wiggle files, and bedGraph files are usually somewhat larger. Several specialized programs are available for creating genome browser files, such as bedToBigBed (https://www.encodeproject.org/software/bedToBigBed/) and igvtools (https://www.broadinstitute.org/igv/igvtools).
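
The BED and bedGraph formats mentioned above are simple tab-delimited text, as the following sketch shows. It writes a four-column BED record per peak and collapses fixed-size bins with equal values into variable-length bedGraph intervals. Both formats use 0-based start and exclusive end coordinates. The function names and example data are hypothetical, and optional track header lines are omitted.

```python
def to_bed(peaks):
    """Format peaks as BED lines: chrom, start, end, name.

    peaks: list of (chrom, start, end, name) with 0-based start and
           exclusive end.
    """
    return "\n".join(f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in peaks)


def to_bedgraph(chrom, values, bin_size):
    """Format per-bin values as bedGraph lines: chrom, start, end, value.

    Adjacent bins with the same value are merged into one interval, which
    is what gives bedGraph its variable-length intervals.
    """
    lines = []
    start = 0  # index of the first bin in the current run
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            lines.append(
                f"{chrom}\t{start * bin_size}\t{i * bin_size}\t{values[start]}"
            )
            start = i
    return "\n".join(lines)


# Hypothetical peak list and binned signal on chr1.
bed_text = to_bed([("chr1", 100, 250, "peak_1")])
bedgraph_text = to_bedgraph("chr1", [0, 0, 3, 3, 3, 1], bin_size=50)
print(bed_text)
print(bedgraph_text)
```

Text files of this kind can be loaded directly into a genome browser or converted to indexed binary formats with the tools named above when the data sets are large.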

Several bioinformatics tools can be used to interpret the biological meaning of epigenomic results. For example, EpiExplorer (Halachev et al. 2012) enables biologists to explore large epigenome datasets in real time over the Internet. It facilitates interactive hypothesis generation and identification of candidates for experimental follow-up. Cytoscape (Shannon et al. 2003) is an open source software platform for visualizing molecular interaction networks and biological pathways and for integrating these networks with annotations, gene expression profiles, and other state data. DAVID (da Huang et al. 2009) provides a comprehensive set of functional annotation tools that help investigators understand the biological meaning behind large lists of genes.

10.3 Epigenomics of Human Diseases

With the advent of new technologies we are starting to unravel the epigenomic mechanisms underlying a diverse range of human disorders, such as cancer and autoimmune diseases. A comprehensive understanding of epigenetic mechanisms, their interactions and alterations in human disease, has become a priority in clinical research (Portela and Esteller 2010).

10.3.1 Epigenomics of Cancer

Diverse altered DNA methylation patterns have been implicated in the pathogenesis and metastasis of various cancers. Genome-wide hypomethylation has been revealed in several common cancer types, such as stomach, liver, and lung cancers (Kulis and Esteller 2010). DNA hypomethylation occurs mostly at repetitive DNA regions, resulting in activation of genes with growth- and tumor-promoting functions and in loss of genome stability and imprinting (Esteller 2008). A clear case is the long interspersed nuclear element (LINE) family member LINE-1. Many studies have supported correlations between LINE-1 hypomethylation and increased risk of cancer (Barchitta et al. 2014). For example, hypomethylation of a specific LINE-1 promoter was found to induce an alternate transcript of the MET oncogene in bladder tumors and across the entire urothelium of tumor-bearing bladders (Wolff et al. 2010). A high degree of LINE-1 hypomethylation is a unique feature of early-onset colorectal cancer (Antelo et al. 2012; Ogino et al. 2013). Hypomethylation of LINE-1 in the primary tumor has been associated with poor prognosis and survival in young breast cancer patients (van Hoesel et al. 2012), and prominent hypomethylation of Alu and LINE-1 in the HER2-enriched subtype may be related to chromosomal instability (Park et al. 2014). In addition to the effects on repetitive elements, promoter hypomethylation can activate the aberrant expression of oncogenes and result in loss of imprinting at some loci (Portela and Esteller 2010). For instance, loss of imprinting of the IGF2 gene has been associated with an increased risk of different types of cancer (Lim and Maher 2010). A recent study also showed that hypomethylation in the TP73 and TERT gene bodies alters the transcriptional landscape of glioblastoma through the activation of a limited number of normally silenced promoters within gene bodies, resulting in aberrant expression of an oncogenic protein (Nagarajan et al. 2014).
Hypermethylation at the CGIs of certain promoters, which causes transcriptional silencing of tumor suppressor genes, has also been observed. The transcriptional silencing caused by promoter hypermethylation affects genes involved in multiple cellular pathways (Portela and Esteller 2010), such as DNA repair (e.g., MGMT, MLH1, MSH2, GSTP1) and Ras signaling (e.g., DAPK, NORE1A, RASSF1A, RECK) (Esteller 2007). For example, hypermethylation at the CGI of the MLH1 gene has been reported in the majority of sporadic primary colorectal cancers with microsatellite instability, and this methylation was often associated with loss of MLH1 protein expression (Herman et al. 1998).

Another epigenomic hallmark of cancer is an aberrant pattern of histone modifications. Epigenome-wide studies have characterized the overall profiles of various histone modification marks in cancer cells. For example, there is a global loss of H4K16ac in nearly all human cancer cell lines (Fraga et al. 2005b). Loss of acetylation is mediated by HDACs, which have been found to be overexpressed (Zhu et al. 2004) or mutated (Ropero et al. 2006) in different tumor types. Two different studies reported that global levels of H4K12ac and H3K18ac were increased in adenocarcinomas relative to normal tissue or adenoma (Ashktorab et al. 2009; Nakazawa et al. 2012). Cancer cells also bear global alterations of several histone methylation marks, such as a global loss of the active mark H3K4me3 (Hamamoto et al. 2004) and of the repressive mark H4K20me3 (Fraga et al. 2005b), as well as a gain in the repressive marks H3K9me (Kondo et al. 2007) and H3K27me3 (Vire et al. 2006; Muller-Tidow et al. 2010).

Alterations of histone methylation marks in cancer cells are mainly due to the aberrant expression of both HMTs and histone demethylases (Chi et al. 2010). Dalgliesh et al. reported inactivating mutations in two genes encoding enzymes involved in histone modification, the SETD2 gene (H3K36 methyltransferase) and the JARID1C gene (H3K4 demethylase), in renal carcinomas (Dalgliesh et al. 2010). The EZH2 gene (H3K27 methyltransferase) has been reported to be overexpressed in several cancer types and to enhance proliferation and neoplastic transformation (Kleer et al. 2003; Raman et al. 2005; Rhodes et al. 2004). NSD1, another HMT (H3K36 and H4K20), has been reported to undergo promoter DNA methylation-dependent silencing in neuroblastomas (Berdasco et al. 2009). The H3K79 methyltransferase DOT1L is essential for the development and maintenance of mixed lineage leukemia. The presence of DOT1L results in H3K79 hypermethylation, which induces aberrant gene expression and contributes to leukemic transformation (Okada et al. 2006).

10.3.2 Epigenomics of Autoimmune Diseases

Altered DNA methylation has been increasingly associated with several autoimmune diseases in recent years; most studies have focused on systemic autoimmune rheumatic diseases such as systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA). SLE is characterized by an autoantibody response to nuclear and/or cytoplasmic antigens. Several studies have shown global hypomethylation of the promoter regions of genes that are overexpressed in the disease, such as PRF1, CD70, CD154, IFNGR2, MMP14, LCN2, CSF3R and AIM2, and also of the ribosomal RNA gene promoter (18S and 28S) (Portela and Esteller 2010; Ballestar 2011). This global loss of methylation has been proposed to activate endogenous retroviruses, which erase imprinting signals, deregulate gene expression and consequently break immune tolerance, leading to active flares of the disease (Okada et al. 2002). The hypomethylation in SLE may be partially mediated by miR-21 and miR-148a, which directly and indirectly target DNMT1 (Pan et al. 2010; Zhu et al. 2011). RA is a chronic inflammatory disease that largely affects peripheral joints through invasive synovial fibroblasts. Global DNA methylation analysis of fibroblast-like synoviocytes revealed distinct methylation profiles in RA patients, particularly in genes with key roles in inflammation, immune responses and matrix deconvolution. Hypomethylated loci were identified in key genes relevant to RA, such as CHI3L1, CASP1, STAT3, MAP3K5, MEFV and WISP3. Hypermethylation was also observed at some RA-related genes, including TGFBR2 and FOXO1. Differentially methylated genes could alter gene expression in fibroblast-like synoviocytes and contribute to the pathogenesis of RA.

Histone modification studies in human autoimmune diseases have found that during apoptosis, histones can be modified in ways that make them immunogenic. Hypoacetylation of histones H3 and H4 and H3K9 hypomethylation in CD4+ T cells were found to be characteristic features of SLE patients (Hu et al. 2008). In RA, the reduced activity of HDACs plays a key role in regulating NF-κB–mediated gene expression (Huber et al. 2007).

10.4 Discussion and Perspectives

Technological advances have enabled epigenomic analysis on a large scale. Remarkably, several international projects and consortia (Table 10.2) have been formed to comprehensively characterize epigenome-wide DNA methylation, histone modification, and other epigenetic profiles in healthy and diseased tissues, such as the Encyclopedia of DNA Elements (ENCODE) Project (ENCODE Project Consortium 2012), The Cancer Genome Atlas (TCGA; http://cancergenome.nih.gov/) and the NIH Roadmap Epigenomics Project (http://www.epigenomebrowser.org/).

Table 10.2 Large-scale national and international epigenomic consortia

Although the number of epigenomic studies has grown exponentially in recent years, several issues need to be carefully considered when planning and interpreting such studies. First, disease-associated epigenetic variation is likely to be cell- or tissue-specific. For studies using heterogeneous cell or tissue samples (e.g., blood, tumor), detection of differential DNA methylation or histone modification profiles raises a validity problem, because variation in molecular profiles is confounded with changes in cell type proportions between tissue samples (Jacobsen et al. 2006; Jaffe and Irizarry 2014). If the disease-associated variation is restricted to a cell type that represents only a small proportion of the tissue sampled, then the variation may not be detected in the whole tissue (Jaffe and Irizarry 2014). Purified samples consisting of a single cell type are therefore preferable to mixed cell samples. Second, the human body is a complex system that is studied through many research areas, including genomics, epigenomics, transcriptomics, proteomics and metabolomics. Each area provides insight into the system, but the full set of “omics” approaches offers a more comprehensive view. As the cost of analyzing a human genome has plummeted, data integration has become a widely used approach. Integration among different epigenetic mechanisms, and with other omics disciplines, is becoming both easier and necessary for clinical research. For clinicians with access to omics data, being able to understand and appropriately interpret the data will become a key requirement for patient care. Along with the recent advances in epigenetic drugs, there is great potential for personalized epigenetic treatment of many human diseases in the near future.
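
The cell-type confounding discussed above can be illustrated with a small numerical sketch. It assumes, as a simplification, that the methylation level measured in a mixed sample is the proportion-weighted average of the levels in its constituent cell types. The cell-type names, proportions and methylation values are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
# Hypothetical illustration: every number below is invented for this sketch.

def bulk_methylation(proportions, cell_type_beta):
    """Bulk methylation at one CpG, modeled as the proportion-weighted
    mean of the methylation levels of the constituent cell types."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("cell-type proportions must sum to 1")
    return sum(proportions[c] * cell_type_beta[c] for c in proportions)

# Scenario 1: methylation within each cell type is identical in cases and
# controls; only the cell-type composition of the sample differs.
beta = {"lymphocyte": 0.80, "neutrophil": 0.20}
control_mix = {"lymphocyte": 0.40, "neutrophil": 0.60}
case_mix = {"lymphocyte": 0.25, "neutrophil": 0.75}
composition_only_diff = (
    bulk_methylation(case_mix, beta) - bulk_methylation(control_mix, beta)
)

# Scenario 2: a large change confined to a cell type that makes up 5% of
# the sample; composition is the same in cases and controls.
rare_mix = {"rare": 0.05, "other": 0.95}
control_beta = {"rare": 0.20, "other": 0.50}
case_beta = {"rare": 0.60, "other": 0.50}
diluted_diff = (
    bulk_methylation(rare_mix, case_beta)
    - bulk_methylation(rare_mix, control_beta)
)

print(f"apparent difference from composition alone: {composition_only_diff:+.2f}")
print(f"bulk difference from a rare-cell change:    {diluted_diff:+.2f}")
```

In the first scenario the bulk samples differ even though no cell type has changed its methylation, and in the second a change of 0.40 within the rare cell type shrinks to a small bulk difference. This is one way to see why purified cell populations, or an explicit adjustment for cell-type proportions, are preferred when interpreting such data.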