1 Introduction

Conrad Waddington coined the term ‘epigenetic landscape’ and defined epigenetics as the synergistic interaction of genes with their products that brings the phenotype into being. Epigenetics is the study of mitotically and meiotically heritable changes in gene function that do not involve a change in DNA sequence.

It has long been held that DNA, expressed through genes, helps us to survive, reproduce and develop, and that this behaviour is set in stone, unaffected by the environment. Traditional genetics, however, fails to answer many questions about differential gene expression in multicellular organisms.

Classical genetics and genomics broadly account for the vertical flow of information stored in DNA, but they fail to explain the control of the horizontal flow of information within a cell, which has given rise to the division of labour in multicellular organisms throughout the evolution of life. Epigenetics, as the name suggests, goes beyond classical genetics and asks which genetic element is to be expressed, and where and when. These questions are answered by the structural features of the genetic element rather than by the DNA sequence, since the latter determines only what is to be expressed. In short, the epigenome is the real boss of DNA, deciding its functionality.

Identical twins illustrate the concept. Characteristics such as eye colour and hair colour are encoded by DNA sequences in their genomes, and this is what we call genetics, but it is not the whole story. Genetically identical twins can be very different; one may be healthy while the other suffers from a disease. Genetics alone cannot explain this, because the DNA is identical in both twins. Epigenetics looks beyond the sequence to understand the regulation of gene expression that is not directly encoded in DNA (Bock and Lengauer 2007). In other words, it concerns heritable chemical modifications of a gene that influence the activity of DNA and alter gene expression, yet lie outside the genomic sequence and leave it unchanged. It blurs the line between nature and nurture.

2 Epigenetic Machinery

Epigenetic mechanisms involve a number of processes that are critical for the diversity of cell types during embryogenesis and gametogenesis and are also crucial for tissue-specific gene expression and global gene silencing. Crucial developmental processes, from fertilization to the end of gastrulation, can be altered, which may result in structural abnormalities (Piplani et al. 2016). These mechanisms act on DNA and nucleosomes, which are regarded as the carriers of epigenetic information (Espada and Esteller 2007; Rivera and Ren 2013). The three most common chemical and biological modifications that alter gene expression are chromatin structure and its modification, DNA methylation and miRNA.

2.1 DNA Methylation

DNA methylation is a crucial mechanism for cell differentiation and organism development. It alters gene expression through methylation of the DNA strand itself. In this biochemical process, cytosine bases of eukaryotic DNA are converted into 5-methylcytosine, which represses transcription. It typically occurs at CpG dinucleotides, which are concentrated in CpG islands (Jones 2012; Virani et al. 2012).
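As a rough illustration of where CpG methylation can occur, the sketch below lists the CpG positions in a DNA string and computes two summary statistics often used to flag CpG-rich regions: GC content and the observed/expected CpG ratio. The function names are hypothetical, and the observed/expected ratio is a commonly used heuristic rather than a method described in this chapter.

```python
def cpg_positions(seq):
    """Return 0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string.

    Observed/expected = (number of CpG * length) / (number of C * number of G).
    Both values are 0.0 for an empty sequence or one lacking C or G.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected = (c * g) / n
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp
```

A window with high GC content and a high observed/expected ratio is a candidate CpG island; the thresholds to apply are a design choice and are deliberately left out of this sketch.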

The methylation pattern is passed on from one cell generation to the next during cell division. After zygote formation, DNA methylation is erased and then re-established during subsequent cell divisions. The DNA methyltransferase (DNMT) enzymes are responsible for establishing and maintaining this unique pattern on CpG islands (Dodge et al. 2002). To date, five DNMTs have been reported to be involved in catalysing this process (Roberts et al. 2003; Serman et al. 2006; Putiri and Robertson 2011), as summarized in Table 5.1.

Table 5.1 DNA methylating enzymes

2.2 Histone Modifications

Histone modifications are another common and complex epigenetic phenomenon; they include acetylation, phosphorylation, sumoylation, methylation and ubiquitination (Bannister and Kouzarides 2011). Histones are conserved proteins that can be modified at amino acid residues in their N- and C-terminal regions. The C-terminal region forms the globular domain packed in the nucleosome core, while the N-terminal tail is comparatively flexible, interacts directly with DNA and other proteins within the nucleus and is important for maintaining chromatin stability. The effect on transcription depends on the modification and the residue: acetylation of lysine residues is generally associated with active transcription, whereas methylation of lysine and arginine residues can either activate or repress it. These modifications are carried out by different histone-modifying enzymes, as summarized in Table 5.2. Histone modifications alter chromatin structure by changing its electrostatic properties or by creating binding sites for protein recognition modules (Iizuka and Smith 2003; Brooks and Shi 2014).

Table 5.2 Histone modification enzymes

2.3 miRNA

miRNAs are non-coding RNAs, ~22 nucleotides long, that negatively regulate gene expression post-transcriptionally. They are highly conserved in plant and animal species and play critical roles in a variety of biological processes, including pattern formation, developmental timing, cell signalling and carcinogenesis (Sato et al. 2011). About 50% of miRNA genes are located in fragile regions of the genome, which are highly susceptible to deletion, translocation or duplication. miRNAs are already being considered as new therapeutic agents and/or targets for different diseases (Chuang and Jones 2007).

2.4 Binding of Nonhistone Proteins

Proteins that remain in chromatin after the removal of histones are commonly known as nonhistone proteins. These proteins interact with and remodel chromatin structure, thus regulating the silencing of genes. Two groups of nonhistone proteins, polycomb and trithorax, are epigenetic regulators that affect gene expression. Polycomb proteins induce gene silencing, while trithorax proteins induce gene activation (Pullirsch et al. 2010).

2.5 Environmental Factors

Various maternal dietary factors and environmental factors have been linked to epigenetic modifications. Conditions such as stress, nutrition and environment influence how our DNA behaves in the present and in the next generation(s). Epigenetic regulation operates throughout life. The nutrition of the mother affects the foetus, and stress hormones also pass from mother to foetus. Social interactions, physical activity, exposure to toxins and diet are also major factors shaping the epigenome (Kubota et al. 2012). Place of residence, alcohol consumption and exposure to various drugs can also alter epigenetic status (Aguilera et al. 2010; Choi and Friso 2010). The external factors influencing epigenetic traits are depicted in Fig. 5.1.

Fig. 5.1 External factors influencing epigenetic traits

3 Analysis of Epigenetic Modifications

3.1 In Vitro Methods

Various techniques are in extensive use for generating and interpreting epigenetic data. Bisulphite sequencing is used to determine DNA methylation. Treating DNA with bisulphite converts unmethylated cytosine to uracil while leaving 5-methylcytosine unchanged, so methylated positions can be read directly from the sequence (Jones and Takai 2001). Another commonly used method is ‘chromatin immunoprecipitation’, which determines the binding sites of a particular protein on the genome (Collas 2010). It captures DNA-protein interactions as they occur inside the nucleus of the living cell. The last decade witnessed rapid development of microarray-based technology, which analyses methylation sites in bisulphite-treated DNA (Schumacher et al. 2006) using pairs of oligonucleotide hybridization probes that target CpG sites.
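The logic of bisulphite sequencing can be sketched in a few lines. The example below simulates the chemistry on a single strand (unmethylated C is read as T after conversion and amplification, while methylated C still reads as C) and then recovers the methylation state by comparing the converted read with the reference. The function names are hypothetical, and the sketch assumes a perfectly aligned, error-free read of the same length as the reference; real pipelines must also handle both strands, sequencing errors and incomplete conversion.

```python
def bisulphite_convert(seq, methylated_positions):
    """Simulate bisulphite treatment of one DNA strand.

    Unmethylated C becomes U, which is read as T after amplification;
    C at any position listed in methylated_positions is protected.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Map each reference C position to True (methylated) or False.

    Positions where the read shows neither C nor T are left uncalled.
    """
    calls = {}
    pairs = zip(reference.upper(), converted_read.upper())
    for i, (ref_base, read_base) in enumerate(pairs):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
    return calls
```

For instance, with the reference `ACGTCGA` and a methyl group only on the first cytosine (position 1), the converted read is `ACGTTGA`, and comparing it with the reference recovers position 1 as methylated and position 4 as unmethylated.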

3.2 In Silico Methods

With advances in technology and in computational biology, several techniques are now used to identify epigenetic modifications, such as the Support Vector Machine (SVM), artificial neural networks (ANNs), the hidden Markov model (HMM) and clustering analysis. SVM is used to predict cytosine methylation at CpG dinucleotides (Bhasin et al. 2005; Robinson et al. 2014). It is a successful machine learning technique for pattern recognition, but it is limited by the lack of publicly available experimental data. ANNs are used to predict human-specific methylation sites (de Pretis and Pelizzola 2014). This method is designed to mimic the architecture of the brain: the data to be evaluated are passed through processing units called neurons, which process them using a variety of mathematical operations (Marchevsky et al. 2004). Another widely used computational technique is the HMM, which is used to detect CpG islands (Robinson and Pelizzola 2015) and is widely applied in sequence analysis (Wu et al. 2010). Clustering analysis helps to reflect the true distribution of gene space (Fazzari and Greally 2004). Many successful attempts to analyse epigenetic modifications using in silico methods have been reported (Fig. 5.2) (Gitan et al. 2002; Collins et al. 2003; Laird 2003; Yan et al. 2004; Meissner et al. 2005; Pfister et al. 2007; Petrossian and Clarke 2009).
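To make the HMM approach to CpG island detection more concrete, the following is a minimal sketch of Viterbi decoding with a two-state model: ‘I’ (island, C/G-rich) and ‘B’ (background, A/T-rich). All probabilities are illustrative assumptions chosen for the example and are not taken from any of the cited studies; practical CpG island HMMs use richer state spaces and parameters estimated from annotated data. The sketch also assumes the input contains only A, C, G and T.

```python
import math

# Illustrative (assumed) parameters for a two-state HMM.
STATES = ("I", "B")  # I = CpG island, B = background
START = {"I": 0.5, "B": 0.5}
TRANS = {
    "I": {"I": 0.9, "B": 0.1},
    "B": {"I": 0.1, "B": 0.9},
}
EMIT = {
    "I": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "B": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def viterbi(seq):
    """Return the most probable state path for seq as a string of I/B."""
    seq = seq.upper()
    if not seq:
        return ""
    # Log-probability of the best path ending in each state.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score = {}
        pointers = {}
        for s in STATES:
            # Best predecessor state for arriving in state s.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev]
                            + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

Working in log space avoids numerical underflow on long sequences. With these parameters, a C/G-rich stretch embedded in A/T-rich sequence is labelled ‘I’ and its flanks ‘B’, provided the stretch is long enough to outweigh the cost of two state switches.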

Fig. 5.2 Integration of technology to generate epigenome database

4 Epigenomics: The Computational View of Epigenetics

The post-‘Human Genome Project’ era has witnessed a tsunami of epigenetic data owing to the rapid development of technological applications in the biological sciences. Bioinformatics has evolved at a parallel rate to support, store, manage, manipulate and exploit this flood of biological data. As a result, a new term has been added to the bioinformatics vocabulary, namely ‘epigenomics’. It deals with the computational analysis of experimentally determined epigenetic data and is emerging as a separate and prominent frontier of the biological sciences. It involves the in silico collection of data, the maintenance of databases and the computationally intensive analysis of the stored data on alterations in gene expression and gene activity that are independent of gene sequence. Epigenetics deals with alterations in gene expression and gene activity that are independent of gene sequence: both heritable changes and stable, long-term alterations in the transcriptional potential of a cell that are not necessarily heritable. Epigenomics differs from epigenetics in emphasizing global analyses of sequence-independent changes throughout the genome, whereas epigenetics studies the same for a gene or a set of genes (Tammen et al. 2013). Many medium- and large-scale epigenomic efforts are being carried out worldwide; a few major ones are listed in Table 5.3.

Table 5.3 Large- and medium-scale epigenomic efforts

Various bioinformatics methods help to characterize gene expression by studying the contribution of DNA methylation, identifying CpG islands and modelling the dynamics of epigenetic processes in silico. Above all, the collection of epigenomic data is the most important step towards solving the mystery. Algorithms like BLAST (Altschul et al. 1990), BLAT (Kent 2002) and Clustal W (Thompson et al. 1994) allow sequence analysis for the inference of functional, structural and evolutionary relationships. Literature databases like PubMed index the scientific literature, while molecular databases like DDBJ, EMBL and GenBank serve as repositories for nucleotide and protein sequences of different species. Databases like GEO, GENSAT and StemBase allow the identification of dynamic changes in gene expression in different cell types. Epigenomic data are being enriched every day; a few important resources are summarized in Table 5.4.

Table 5.4 Information resources for epigenomics

5 Recent Insights into Disease Epigenomics

Epigenomics supersedes epigenetic studies in that it takes a comprehensive, genome-wide approach, whereas epigenetics associates single genes with a disease (Feinberg 2010). Candidate gene-disease association studies have now largely been replaced by whole-genome disease association studies, which have gained wider scientific acceptance (Lieberman-Aiden et al. 2009). Methylation of cytosine residues in DNA is regarded as the central player in epigenomic modification and plays a pivotal role in cellular processes including gene regulation, organism development and disease (Lister et al. 2009). Recent studies on different cancer-associated genes and their epigenetic machinery have shown the need to search for epigenomic correlates of unexplained diseases (Frigola et al. 2006; Irizarry et al. 2009; Fraga et al. 2005a, b; Lister et al. 2009; Doi et al. 2009).

Gene-silencing events were observed to span large regions of the genome in colorectal cancer. DNA-methylated genes and adjoining unmethylated genes were also observed to be coordinately suppressed across an entire chromosomal region (Frigola et al. 2006). Irizarry et al. (2009) showed that in colon cancer most methylation alterations occur not in promoters or in CpG islands themselves but in sequences up to 2 kb distant, termed ‘CpG island shores’. Alongside CpG island hypermethylation, global hypomethylation across the genome is also a common epigenomic characteristic of cancerous cells (Irizarry et al. 2009). A hallmark of human tumour cells is the global loss of mono-acetylation and tri-methylation of histone H4 (Fraga et al. 2005a). Widespread epigenomic differences have been held responsible for the differential susceptibility to disease and other anthropomorphic variations that arise over the lifetime of monozygotic twins (Fraga et al. 2005b).
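For genome-wide analyses, the ‘CpG island shores’ mentioned above, that is, the sequences within about 2 kb of a CpG island, can be represented as intervals flanking each annotated island. The sketch below is a hypothetical helper, not code from the cited study; it assumes islands are given as 0-based, half-open `(start, end)` coordinates on one chromosome and, for simplicity, does not trim shores that run into a neighbouring island.

```python
def island_shores(islands, chrom_length, flank=2000):
    """Return the flanking 'shore' intervals for each CpG island.

    islands: list of (start, end) tuples, 0-based and half-open.
    chrom_length: length of the chromosome, used to clip the shores.
    flank: shore width in base pairs (2 kb by default).
    """
    shores = []
    for start, end in islands:
        upstream = (max(0, start - flank), start)
        downstream = (end, min(chrom_length, end + flank))
        # Skip empty intervals at the chromosome ends.
        if upstream[0] < upstream[1]:
            shores.append(upstream)
        if downstream[0] < downstream[1]:
            shores.append(downstream)
    return shores
```

Methylation calls can then be grouped by whether they fall inside an island, inside a shore or elsewhere, which is the kind of comparison that distinguishes island from shore alterations.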

When the whole genome was mapped at single-base resolution, significant differences were observed in the pattern and composition of cytosine methylation between embryonic and differentiated cells (Lister et al. 2009). About one-quarter of all methylation identified in embryonic stem cells was in a non-CG context, suggesting that these cells may use a different methylation mechanism to regulate genes. Non-CG methylation was enriched in gene-coding regions and depleted in protein-binding sites and regulatory regions (Lister et al. 2009). It has been suggested that different mechanisms of epigenetic reprogramming result in differential methylation of tissue- and disease-specific CpG islands in differentiated cells (fibroblasts), pluripotent stem cells and embryonic stem cells (Doi et al. 2009).

The role of epigenetic mechanisms in altered embryonic development and its manifestation in adult diseases, such as cardiovascular disease, is still unclear, but a growing body of evidence supports epigenetic regulation (Martinez et al. 2015). Since the early 1980s, several studies have linked adverse intrauterine conditions to adult diseases such as cardiovascular disease (Barker and Osmond 1988), type 2 diabetes, metabolic syndrome, ischemic heart disease and hypertension (Chen and Zhang 2011). These studies formed the basis of the ‘thrifty phenotype hypothesis’ (Hales and Baker 2001), which states that abnormal intrauterine conditions, such as toxins, hypoxia, undernutrition and chemicals, push the developing embryo to undergo irreversible changes in order to adapt to and survive the suboptimal environment, resulting in increased susceptibility of the neonate or adult to disease.

Since the completion of the ‘Human Genome Project’ in the early twenty-first century and the arrival of next-generation sequencing technology, there has been a flood of genomic and epigenomic data. These developments have resulted in a paradigm shift in our concepts of the cell and cellular function. The idea of reprogramming a cell, in the form of stem cell development, has triggered a race to understand cellular function beyond genomics, i.e. epigenomics. Several diseases have been, and will be, associated with an epigenetic and epigenomic basis, which needs to be explored further to develop therapeutic applications (Kanherkar et al. 2014). Since the development of induced pluripotent stem cells (Park et al. 2012), there have been several successful attempts to modulate them epigenetically to generate different potentials. These have helped to devise novel therapies and develop novel disease models for neurodegenerative disorders, cardiovascular diseases, metabolic disorders (PCOS), etc. (Huang and Wu 2013).

6 Conclusion

Epigenetic modifications provide a link between nature and nurture. Epigenomics paves the way for new research that is valuable for predicting and diagnosing various diseases. It has the potential to counter disease-causing genes and to identify altered gene expression. Recent research reported by the American Association for Cancer Research (AACR) points to epigenetic therapy that kills cancer cells by targeting methylation without disrupting their pathways (Jones 2012). Epigenetic tags and their responsiveness make this a valuable technology. Various other candidates are in the queue for targeting different tissues affected by disease. Epigenetic data add flavour to both computational and wet-lab work, and we can say ‘epigenomics: an era beckoning’.

7 Future Directions

The initial phase of computational epigenetic research gained impetus from ever-progressing experimental methods of data generation. The high-throughput epigenomic data thus generated need proper computational analysis, starting with low-level data processing and quality control. This has led to epigenome prediction as a way to understand the epigenetic information distributed throughout the genome. In conclusion, exciting times are ahead for research in epigenetics, which will demand substantial computational input.