
1 The History and Discovery of the Basic Mechanics of Epigenetics

1.1 Experimental Observations: Setting the Stage for the Discovery of Epigenetic Mechanisms

To set the stage for our later discussion of the epigenetic underpinnings of disease pathogenesis, we will first review the history of the discovery of epigenetic control mechanisms and of the field of epigenetics more generally. Epigenetics in its contemporary form, the focus of this book, has emerged only since the 1990s. In fact, the consensus modern definition, “stably heritable phenotype resulting from changes in a chromosome without alterations in the DNA sequence,” was formulated at a Cold Spring Harbor meeting in 2008 (Berger et al. 2009). Before this, the term “epigenetic” was used with widely disparate meanings. Its root, epigenesis, dates to the seventeenth century. Harvey used it in Exercitationes (1651) and in the English translation, Anatomical Exercitations (1653), with the definition “the additament of parts budding one out of another.” The subsequent history of epigenetics as we presently understand it is inevitably linked with the history of genetics. Genetics itself emerged out of historical analyses of evolution, development, and inheritance.

No discussion of the history of epigenetics is complete without reference to the theories of the French zoologist Jean-Baptiste Lamarck (Fig. 2.1). At half-past noon on May 17, 1802, he gave the first lecture of his course on invertebrate zoology at the Museum of Natural History in Paris. There he set forth an unprecedented (and later debunked) theory explaining the incremental development and differentiation of all life forms on earth. His theories, now known simply as “Lamarckian evolution”, were based on two major themes. First, he posited that the environment gives rise to changes in animals; he offered as examples the presence of teeth in mammals, the absence of teeth in birds, and the loss of sight in moles. Second, he argued that life was structured in an orderly manner, and that the different parts of all bodies made possible the organic movement of animals (Eb 1895). Lamarck’s ideas on evolution are regarded as the classic example of theories of the inheritance of acquired characteristics, also known as “soft inheritance”. These ideas were later supplanted by the famous descriptions of inheritance and natural selection put forth by Gregor Mendel and Charles Darwin, respectively, in the mid-1800s. Lamarck's work was discredited as a viable theory of evolution throughout the twentieth century. Even so, his underlying ideas regarding environmental contributions to phenotypic characteristics (and the heritability thereof) have seen a resurgence in modern times with recent descriptions of transgenerational epigenetic inheritance.

Fig. 2.1 Jean-Baptiste Lamarck. Credit: Wellcome Library, London, Wellcome Images

The first concepts we would recognize as related to modern epigenetics were formulated in the late nineteenth century, beginning with Flemming’s discovery of chromosomes and their behavior during animal cell division in 1879. Subsequent work by a variety of investigators cemented the notion of chromosomes as the quintessential substrate of inherited information. Most convincingly, in 1911 Thomas Hunt Morgan demonstrated sex-chromosome linkage of several Drosophila genes (Morgan 1911). This was quickly followed by the first maps showing individual genes arranged linearly at specific locations on each chromosome (Sturtevant 1913). Questions remained, however, about how this “hard” genetic information directed the developmental programs that lead to cellular differentiation and to the diversity of phenotypes seen in multicellular life forms. It quickly became apparent that chromosomal nucleic acid codes alone could not account for all developmental information. Similar amounts and organization of this material were present in widely disparate cell types.

As an example, in 1930 Hermann Joseph Muller described a class of Drosophila mutations in which large portions of genetic material were displaced from their normal location to another chromosome (a phenomenon we would later understand as balanced translocations) (Muller 1930). These mutations produced unexpected phenotypic variations, particularly mottled eyes. He initially attributed this to “genetic diversity of the different eye-forming cells.” Later research, including that of Hannah in the early 1950s, demonstrated that this variegation arose when rearrangements placed particular genes into chromosomal regions with variable staining density. These regions would come to be known as heterochromatin. We now understand heterochromatin to be distinguished from evenly staining chromosomal regions (euchromatin) by the density of DNA-packaging elements, directed by epigenetic modification (Hannah 1951). Earlier, in 1939, Conrad Waddington, later the Buchanan Professor of Genetics at the University of Edinburgh, adapted the Greek-derived term epigenesis into epigenetics. Epigenesis had been defined as a theory of development in which the early embryo exists in an undifferentiated state; Waddington defined epigenetics as the implementation of cellular genetic programming for development (Waddington 1939). This view was reflected in his later book The Epigenetics of Birds, which provides an account of the embryological development of chicks (Waddington 1952). He is also credited with the term epigenotype, which he defined as “the total developmental system consisting of interrelated developmental pathways through which the adult form of an organism is realized” (Waddington 1939).
Although admittedly broad, this definition strikes a familiar chord when applied to our modern understanding of epigenetics. At least at the cellular level, epigenetic controls may indeed be defined as the set of interrelated transcriptional control programs that dictate and ensure the proper terminal differentiation of cells in a particular tissue.

At the same time as Waddington, and along similar lines, seminal works were published establishing and refining the ideas behind phenotypic divergence at the cellular level during organismal development. It became evident, for example, that phenotypically distinguishing features, once established, were clonally inherited by daughter cells. The mechanisms underlying these observations were unclear. An early proposal, put forward by the physicist Max Delbrück in 1949, was that these phenotypes were maintained by the presence or accumulation of particular biochemical reactions. In this model, self-stabilizing stimulatory and inhibitory mechanisms could theoretically lead to substantial phenotypic divergence that could be passed from parent to daughter cell. This theory ran into a problem, however, with the discovery that protein-free DNA could carry hereditary information. This was shown first by Avery et al. in 1944 (the “Transforming Principle”) (Avery et al. 1944) and then by Hershey and Chase in 1952 (Hershey and Chase 1952). The conclusion was further reinforced by the famous description of the structure of DNA by Watson and Crick in 1953 (Watson and Crick 1953). At the same time, Robert Briggs and Thomas King published a pivotal study in northern leopard frogs (Briggs and King 1952). They demonstrated that transplanting a nucleus from a frog blastula into an enucleated frog egg results in normal embryonic development. This argued that the “essential material” for complex organismal development was indeed contained in the nucleus. Discoveries since that time have cemented the concept that a fully differentiated cell has the same genetic makeup as an embryonic cell, but this was not self-evident until as late as 1970.
It was in that year that a pivotal study by Laskey and Gurdon demonstrated that nuclei from several adult somatic Xenopus frog cell sources could be transferred into enucleated eggs to produce embryos that developed into feeding larval stages (Laskey and Gurdon 1970). This occurred with around 1–2% of transplanted nuclei, which likely represented endogenous tissue pluripotent cells. Other work revealed that cellular fates and phenotypic characteristics were heritable throughout cellular differentiation. For example, Ernst Hadorn in 1965 studied imaginal disc cells from Drosophila. These are regions of embryonic tissue present in fly larvae that subsequently develop into well-defined adult structures, such as the paired wing discs that form the wings. Hadorn showed that these cells were still able to develop into their requisite structures upon transplantation into fly larvae, even after being passaged through multiple generations of adult flies (Hadorn 1965).

1.2 Getting to the Substance of the Matter: Discoveries of Physical Mechanisms of Epigenetic Control

Building on prior observations, a number of scientists made key discoveries in the mid-twentieth century that would help explain how these cellularly heritable gene expression patterns were made possible. First were investigations into the mechanisms underlying X-chromosome inactivation. In this process, one of the two X chromosomes in female cells is randomly selected and almost entirely transcriptionally “shut down”. In 1959, Ohno demonstrated that the “sex chromatin” was in fact a densely packaged X chromosome (Ohno et al. 1959). This female-specific nuclear body had first been described by Barr and Bertram (1949) and is still commonly known by the eponym Barr body. In 1961, Lyon went on to propose that the selection of the paternal or maternal X chromosome for condensation into a Barr body is random (Lyon 1961). This provided further evidence that the mechanism of inactivation, rather than any preexisting difference in the underlying genetic code, was the critical component.

DNA-binding proteins, histones chief among them, had long been recognized as associated with nucleic acids. They were first described by Kossel (1884), work that contributed to his 1910 Nobel Prize in Physiology or Medicine. In 1950, the husband-and-wife research team of Stedman and Stedman described variations in histone proteins isolated from different tissues (Stedman and Stedman 1950). They initially attributed these differences, incorrectly, to variation in the proteins’ constituent amino acids. Nonetheless, they correctly surmised the importance of what would later be defined as histone posttranslational modifications in regulating gene transcription. In their 1950 letter to Nature, they stated:

It has always been a puzzle to us, how the physiological functions of the cell nuclei in the same organism can differ from one cell-type to another when they all contain identical chromosomes and hence identical genes. …The demonstration…that some of the basic proteins present in cell nuclei are certainly cell-specific leads to the hypothesis that one of their physiological functions is to act as gene suppressors. (Stedman and Stedman 1950)

Soon, however, the field of gene transcription shifted its focus toward cis-acting regulatory elements and transcription factors (see below). It was thereafter generally assumed that histones were passive packaging proteins that served only to suppress gene transcription, even though evidence existed that extensive genomic regions with an open chromatin conformation did not, in fact, exist (Clark and Felsenfeld 1971). This view was further challenged by the description of histone posttranslational modifications by Allfrey, Mirsky, and colleagues in 1964. Importantly, they demonstrated that acetylated histones were associated with more active gene transcription than their non-acetylated counterparts (Allfrey et al. 1964). This suggested that gene activation involved specific modifications of histones rather than a straightforward removal of histone proteins. Several additional histone posttranslational modifications, including methylation and phosphorylation, were discovered in the ensuing years, although their cellular functions remained elusive. It was not until the mid-1990s that these modifications were elevated to the status of bona fide epigenetic marks, through the binding of transcriptional regulators to posttranslationally modified histones (Hecht et al. 1995). Among the seminal works of this period, Taunton and colleagues in 1996 demonstrated that a mammalian histone deacetylase was closely related to a protein previously described as a transcriptional repressor in yeast (Taunton et al. 1996). Furthermore, Brownell et al. in 1996 showed that a histone acetyltransferase from the ciliate protozoan Tetrahymena was homologous to the yeast regulatory protein Gcn5 (Brownell et al. 1996). This provided strong evidence that histone-modifying enzymatic activity was directly linked to the regulation of gene transcription.

The regulatory potential of modifications of the DNA molecule itself was put forth in 1975 by Holliday and Pugh (1975) and Riggs (1975). They proposed (correctly, in retrospect) that methylation of DNA nucleotides could at least partially account for Barr body chromosomal inactivation. Methylation of cytosine residues in DNA had been reported more than 20 years earlier, in 1951, by Wyatt, although the function of the modification was unknown at the time (Wyatt 1951). In a remarkably prescient description of what would later be identified as an epigenetic control mechanism, Holliday and Pugh wrote:

Since the ultimate control of development reside in the genetic material, the actual program must be written in base sequences in the DNA. It is also clear that cytoplasmic components can have a powerful or overriding influence on genomic activity in particular cells, yet these cytoplasmic components are, of course, usually derived from the activity of genes at some earlier stage of development. A continual interaction between cytoplasmic enzymes and DNA sequences is an essential part of the model to be presented. (Holliday and Pugh 1975)

They further correctly hypothesized that methylation of cytosine residues in cytosine–guanine (CpG) dinucleotides is added by two kinds of enzyme. The first is a set of de novo enzymes (the DNA methyltransferases DNMT3a and DNMT3b). The second is a separate maintenance enzyme required to copy epigenetic marks to daughter cells during division (the DNA methyltransferase DNMT1). De novo and maintenance methylation activities were first described less than a decade later, in 1982 and 1983, respectively (Jähner et al. 1982; Bestor and Ingram 1983). Holliday and Pugh also correctly pointed out that these modifications are reversible, a necessary feature for any organism to react bidirectionally to environmental manipulation. The enzymes responsible for DNA demethylation, however, were not described and validated until more than three decades later (Tahiliani et al. 2009; Ito et al. 2010; Ko et al. 2010). These observations provided the first depictions of sequence-specific epigenetic mechanisms by which transcriptional patterns could be set and transmitted through cellular generations.
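The division of labor between de novo and maintenance methylation can be illustrated with a toy simulation. The sketch below is a minimal Python model built on simplifying assumptions that do not come from the source:

- each CpG is either methylated or unmethylated;
- at each division, a methylated CpG is restored on the new strand with a hypothetical per-site probability `fidelity`;
- de novo methylation by DNMT3a/b is deliberately omitted.

The function names `divide` and `lineage` are illustrative only.

```python
import random
from typing import List


def divide(marks: List[bool], fidelity: float, rng: random.Random) -> List[bool]:
    """Return the CpG methylation state of one daughter cell.

    Toy model (assumption): after semiconservative replication each methylated
    CpG is hemimethylated, and a DNMT1-like maintenance enzyme restores the mark
    on the new strand with probability `fidelity`. Unmethylated CpGs stay
    unmethylated because de novo methylation is not modeled.
    """
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be between 0 and 1")
    return [m and rng.random() < fidelity for m in marks]


def lineage(marks: List[bool], divisions: int, fidelity: float,
            seed: int = 0) -> List[List[bool]]:
    """Follow one cell lineage; return the pattern before and after each division."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    history = [list(marks)]
    for _ in range(divisions):
        history.append(divide(history[-1], fidelity, rng))
    return history


if __name__ == "__main__":
    start = [True, False, True, True, False, True, True, True]
    for n, pattern in enumerate(lineage(start, divisions=10, fidelity=0.95)):
        # "M" = methylated CpG, "-" = unmethylated CpG
        print(n, "".join("M" if m else "-" for m in pattern))
```

With `fidelity=1.0` the starting pattern is transmitted unchanged indefinitely. With any lower value, marks can only be lost in this toy model, so the pattern drifts further from the original as divisions accumulate. A fuller model would add de novo methylation and active demethylation, both of which the text describes as part of the system.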

It is here that we first encounter evidence of the nascent field of epigenetics intersecting with studies of human disease pathogenesis. In his 1979 paper entitled “A New Theory of Carcinogenesis”, Holliday argued that repair of DNA damage induced by known carcinogens could lead to loss of epigenetic information (Holliday 1979). Specifically, “replacement” DNA would lose any preexisting methylation marks and, if this occurred in the right gene, might partially explain tumorigenesis. This model was not entirely accurate. Even so, an epigenetic basis for malignancy is now well studied, including epigenetic aberrations within known oncogenes and near susceptibility alleles for chronic diseases. Two lines of evidence emerging around this time firmly linked methylation of cytosine DNA residues to gene expression and confirmed earlier hunches about DNA methylation and X-chromosome inactivation.

The first was the discovery of bacterial restriction enzyme pairs (isoschizomers) in which one enzyme is methylation-sensitive. One enzyme cuts double-stranded DNA at a particular recognition sequence regardless of the methylation status of an included cytosine, whereas the other cuts the same sequence only when that cytosine is unmethylated. An example is the pair MspI and HpaII. Both recognize CCGG motifs, but HpaII cannot cut when the internal cytosine is methylated, whereas MspI cuts regardless. Several studies in the late 1970s and early 1980s soon leveraged this technique to demonstrate that genes whose promoter regions carried DNA methylation were inactive, whereas unmethylated promoter regions were associated with active gene expression (Doerfler 1981).

The second line of evidence came soon thereafter, with the discovery that the nucleoside analogue 5-azacytidine inhibits the activity of DNA methyltransferases when incorporated into cellular DNA (Jones 1985). This leads to global reductions in DNA methylation and subsequent increases in gene expression, including reactivation of genes located on the inactive X chromosome.
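The logic of the MspI/HpaII comparison can be sketched in a few lines of Python. This is a minimal, single-strand simplification for illustration, not a laboratory protocol, and it rests on several assumptions:

- the sequence and the methylated position are hypothetical;
- only methylation of the internal cytosine of CCGG is modeled;
- real assays read out fragment sizes rather than cut coordinates.

The helper names are illustrative.

```python
from typing import List, Set

SITE = "CCGG"  # recognition sequence shared by MspI and HpaII (both cleave C^CGG)


def find_sites(seq: str) -> List[int]:
    """Return 0-based start positions of every CCGG site on the given strand."""
    return [i for i in range(len(seq) - len(SITE) + 1) if seq[i:i + len(SITE)] == SITE]


def cut_positions(seq: str, methylated: Set[int], enzyme: str) -> List[int]:
    """Return positions where `enzyme` cuts (a cut at p falls between seq[p-1] and seq[p]).

    `methylated` holds 0-based indices of methylated cytosines. Simplification:
    only methylation of the internal (CpG) cytosine of CCGG is considered;
    it blocks HpaII but not MspI.
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError(f"unsupported enzyme: {enzyme}")
    cuts = []
    for start in find_sites(seq):
        internal_c = start + 1  # the C of the CpG within CCGG
        if enzyme == "HpaII" and internal_c in methylated:
            continue  # HpaII cannot cleave C(mC)GG
        cuts.append(start + 1)  # both enzymes cleave after the first C
    return cuts


def methylated_sites(seq: str, methylated: Set[int]) -> List[int]:
    """Sites cut by MspI but not by HpaII report methylated internal cytosines."""
    msp = set(cut_positions(seq, methylated, "MspI"))
    hpa = set(cut_positions(seq, methylated, "HpaII"))
    return sorted(msp - hpa)


if __name__ == "__main__":
    seq = "AACCGGTTTTCCGGAA"
    meth = {11}  # hypothetical: only the second site's internal C is methylated
    print("MspI :", cut_positions(seq, meth, "MspI"))
    print("HpaII:", cut_positions(seq, meth, "HpaII"))
    print("methylated:", methylated_sites(seq, meth))
```

In this model, the cut position coincides with the index of the internal cytosine. The difference between the two digests therefore directly reports the methylated CpGs. For the example above, MspI cuts at [3, 11] and HpaII only at [3], so position 11 is inferred to be methylated.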

An obvious question then became the focus of the epigenetics field: how are DNA methylation marks (or histone posttranslational modifications) targeted to the right genomic locations within a cell at the right time? This final fundamental question in epigenetic regulation began to be answered in the late 1980s and 1990s. It was then that researchers described intermediate factors possessing both DNA localization domains and binding sites for chromatin- or DNA methylation-altering enzymes. Of particular historical importance in this context was the description of the myogenic differentiation factor MyoD (Davis et al. 1987; Weintraub et al. 1989). MyoD is a transcription factor that binds to the enhancer regions of hundreds of myogenic genes and recruits various histone-modifying enzymes, including p300, LSD1, G9a, and HDAC1. In this way, it functions as a “bridge” between short-term gene activation or repression and long-term epigenetic modification, producing a particular transcriptional phenotype that drives myogenic differentiation. In 1994, Pazin and colleagues induced gene activation in Drosophila embryo chromatin extracts by adding a fusion of the yeast activator GAL4 with the transcriptional activation region of the herpes simplex virus protein VP16 (Pazin et al. 1994). They demonstrated that gene activation occurred via an ATP-dependent mechanism and was accompanied by reorganization of nucleosomes within the chromatin. These findings were later clarified by the characterization of nucleosome remodeling complexes, including SWI/SNF (Peterson and Herskowitz 1992) and NURF (Mizuguchi et al. 1997). These complexes have since been shown to associate with other proteins, such as the ARID-domain protein Dri, that confer sequence specificity on their gene transcription modulatory activity (Sun et al. 2015).

These discoveries have become particularly important over the past decade or two as investigators have begun identifying alterations in various epigenetic effectors in association with human disease. Take, for example, the description by Sawalha and colleagues in 2008 of genetic variants in the methyl-CpG-binding protein 2 (MECP2) gene that increase the risk of systemic lupus erythematosus (Sawalha et al. 2008). MECP2 binds methylated DNA and directs further histone posttranslational modification to “turn off” genes. It also recruits the maintenance DNA methyltransferase DNMT1 during cell division, helping to maintain the fidelity of epigenetic marks. Koelsch, Sawalha, and colleagues went on to show that overexpression of MECP2 in a mouse model altered all three major domains of epigenetic control (Koelsch et al. 2013). These are DNA methylation, noncoding RNA (discussed in detail below), and histone posttranslational modification.

1.3 A New Epigenetic Role for RNA

We have until this point focused on DNA and protein as the central players in epigenetic regulation. We must not ignore the role of RNAs, however, which got a late start but have more recently risen to prominence in epigenetics. Coincident with the discoveries about DNA in the mid-twentieth century, the “central dogma” of molecular biology emerged: the concept that DNA encodes RNA, which in turn is “read” by ribosomes to produce proteins. The first glimpse of a potential regulatory role for RNA came in 1966 (Warner et al. 1966). Warner and colleagues identified a complex group of nuclear RNAs distinct from what was then termed “ribosomal precursor” RNA, and called them “heterogeneous nuclear RNA” or hnRNA. At around the same time, researchers identified large amounts of repeated sequence elements within the DNA of species across several domains of life; many of these are now known to be retrotransposons. Integrating this information, Roy Britten and Eric Davidson published “Gene Regulation for Higher Cells: A Theory” in Science (Britten and Davidson 1969). It outlined their theory that higher organisms contained extensive RNA networks that regulated gene transcription. The next big discovery came in 1977 with the demonstration that genes contain noncoding regions interspersed among their coding regions, later termed “introns” (Berget et al. 1977; Chow et al. 1977). Unfortunately, it was erroneously assumed that these noncoding portions of transcripts represented “junk” RNA, and little additional work was done on them in the ensuing decade.

This thinking changed radically, however, with the landmark discovery in 1993 by Ambros and colleagues of the regulatory role of small RNAs (~22 nucleotides in length). Working in C. elegans, they found that the gene lin-4 encoded two small RNAs, one of 22 and the other of 61 nucleotides. These RNAs bound to a repeated sequence element in the 3′ untranslated region of lin-14 mRNA, which encodes the LIN-14 protein critical for C. elegans development. The researchers further demonstrated that this antisense RNA–RNA interaction was tightly inversely correlated with LIN-14 expression, suggesting that the small RNA was indeed regulating the gene (Lee et al. 1993). The world of noncoding RNA grew much larger with the discovery in 1998 of the RNA interference pathway in C. elegans (Fire et al. 1998) and in plants (Waterhouse et al. 1998). The C. elegans work earned Andrew Fire and Craig Mello the 2006 Nobel Prize in Physiology or Medicine. RNA interference had been foreshadowed in plants by transgene silencing, a phenomenon described in the early 1990s (van der Krol et al. 1990; Wassenegger et al. 1994). Some forms of transgene silencing have since been identified as RNA-directed DNA methylation (see the previous discussion of this epigenetic control). Scientists then found that double-stranded RNA (dsRNA) is processed into short fragments (small interfering RNAs, or siRNAs), for example upon viral infection of plant cells. In the early 2000s, endogenous dsRNA precursors were identified, as were the cellular proteins responsible for their processing, including Argonaute, Drosha, and Dicer (Bernstein et al. 2001; Doi et al. 2003; Lee et al. 2003; Basyuk et al. 2003).
These noncoding RNAs were initially thought to function more or less exclusively by binding messenger RNA and targeting it for degradation, thereby downregulating gene expression. Later studies, however, demonstrated that certain components of the siRNA machinery, particularly Argonaute, are present not only in the cytoplasm but also in the nucleus (Ahlenstiel et al. 2012; Ameyar-Zazoua et al. 2012). This observation fits with demonstrations that these short RNAs can direct other forms of epigenetic modulation within the nucleus. For instance, they can recruit a variety of histone-modifying enzymes, including the Polycomb complex (Kim et al. 2006).

Over the past decade, a wide range of noncoding RNAs has been identified, each with distinct regulatory pathways, many of which involve direct epigenetic modulation of a target gene. For example, long noncoding RNAs (lncRNAs) are intricately linked with genomic imprinting (Sleutels et al. 2002; Thakur et al. 2004). Imprinting is the process by which one parental copy of certain genes, either the maternal or the paternal allele, is silenced. Interestingly, X chromosome inactivation is also partially directed by an RNA. The future inactive X chromosome is coated with a long noncoding RNA known as the X-inactive specific transcript (Xist), which in turn recruits Polycomb repressive complex 2 (PRC2). PRC2 trimethylates lysine 27 of histone H3 (H3K27me3), a repressive histone mark. The expression of Xist itself is epigenetically regulated by a variety of other lncRNAs (reviewed in Lee and Bartolomei 2013). Several novel functions of miRNAs have been proposed. Additional classes of noncoding RNA with important transcriptional significance have also recently been described, including Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and others.

2 Epigenetics Moves Mainstream and Meets the Clinic

Now that we have outlined a historical perspective on the discovery of epigenetics, we will next describe how our knowledge of these fundamental mechanisms has expanded. In some cases, it has radically changed our understanding of both normal cellular processes and disease pathogenesis. As an example, we will first explore the epigenetic changes underlying the normal aging process, then introduce the concept of an epigenetic “clock”, a topic that has received much attention in the recent literature. We will then discuss the epigenetic effects of antiaging interventions. Epigenetics has not been studied only in the context of disease pathogenesis, however. Especially from a clinical standpoint, epigenetic biomarkers hold great promise and will also be discussed. Finally, although this chapter focuses heavily on a historical view of epigenetics, we will close with examples of modern epigenetic, transcriptomic, and proteomic tools being combined with computational approaches. These are helping to make sense of the massive quantities of data generated by increasingly detailed looks at each of these “-omics” domains.

2.1 The Epigenetics of Aging

We will start with the epigenetics of aging, a normal process that nonetheless contributes to the development of a large number of chronic diseases. As we have previously detailed, a robust and relatively error-free epigenetic regulatory system is required for the normal development of a multicellular organism. Tissue types arise from embryonic multipotent cells, and particular transcription patterns must be set epigenetically to allow appropriate tissue differentiation during development and cellular replenishment and maintenance throughout life. One could imagine, then, that loss or dysregulation of epigenetic control could reduce cellular “plasticity” and substantially affect many fundamental biological processes. There is now a large body of literature supporting the notion that epigenetic aberrations occur as part of normal aging and that these may predispose to a variety of diseases. In this section, we will discuss a few of the general epigenetic modifications that have been associated with aging. We will also discuss a relatively new epigenetic paradigm, the epigenetic clock, that may have important implications for our understanding of epigenetic aging and disease susceptibility.

Many of the epigenetic changes observed during aging are outlined in Table 2.1, and an overview of epigenetic changes in aging is presented in Fig. 2.2. A global example is generalized loss of histones, a feature noted in organisms from yeast to humans (Dang et al. 2009; O’Sullivan et al. 2010). This histone loss is associated with the total number of divisions a particular cell has undergone, similar to telomere shortening. The loss can be dramatic; in aged yeast, for example, a ~50% loss of core histones can be seen (Hu et al. 2014). Unsurprisingly, this histone loss renders DNA globally more accessible to transcription and, in this example, led to the induction of all known yeast genes. Conversely, artificially increasing histone protein production extends lifespan in yeast (Feser et al. 2010). Interestingly, histone genes undergo unique alternative splicing during senescence to produce senescence-specific histone proteins (Rai et al. 2014). Senescence is a potentially pathogenic state in which cells cease to divide but remain metabolically active and produce a variety of harmful inflammatory cytokines.

Table 2.1 Epigenetic changes during aging. Adapted from Benayoun et al. (2015)
Fig. 2.2 A model for epigenetic changes observed in aging. Adapted from Benayoun et al. (2015)

Another epigenetically mediated mechanism underlying aging and age-related diseases is reduced chromatin stability, which can be pathogenic in two fundamental ways. First, age-dependent decreases in chromatin stability are linked with reductions in histone proteins and in appropriate chromatin packaging. These decreases can make the genome more susceptible to mutations, and less stable chromatin can also reduce the precision of transcription. Second, the epigenetic state of chromatin can itself become unstable, a process known as epimutation (Chong et al. 2007). Much like mutations in the underlying genomic material, epimutations are passed on to daughter cells, and their accumulation is linked to the number of cellular divisions. Epimutations therefore increase throughout life, as first described by Holliday (1987). They are driven by a number of factors, the first being errors in DNA and epigenome repair (Burgess et al. 2012). Furthermore, chronic DNA damage signaling can behave much like the chronic inflammatory signaling seen in autoimmune diseases (Burgess et al. 2012). It can cause pathogenic recruitment of a variety of histone- and DNA-methylation modifiers and induce inappropriate chromatin remodeling. This creates a feedback loop that results in ever-increasing local chromatin instability. This is well modeled by Werner syndrome, a genetic disease caused by deficiency of a DNA helicase (a DNA “unwinding” enzyme). In mesenchymal stem cells, this deficiency leads to permanent aberrations in chromatin arrangement, including global decreases in histone 3 lysine 9 trimethylation (H3K9me3) and histone 3 lysine 27 trimethylation (H3K27me3), among other changes (Zhang et al. 2015). Age-associated epigenetic changes can also lead to increased DNA transposition, which has potentially pathogenic effects.
In some species, up to 80% of the genome consists of transposable elements. Most of these elements do not encode functional proteins, but they can regulate nearby genes. Transposable elements have been shown to induce instability of the surrounding genome when actively transcribed. This transcription is normally limited by suppressive epigenetic marks, especially histone 3 lysine 9 trimethylation. However, the expression of transposable elements increases with age and cellular senescence, and it has been linked with human disease, particularly diseases of the central nervous system (De Cecco et al. 2013; Reilly et al. 2013; Wood and Helfand 2013).

At a more granular level, global reductions in DNA methylation are seen with aging, a feature noted decades ago by Wilson and Jones (Wilson and Jones 1983). They demonstrated that primary human, mouse, and hamster cells cultured in vitro over time had substantially decreased global methylation levels compared with early passages and with immortalized cell lines. Global hypomethylation induces genomic instability and also increases the expression of a variety of genes thought to promote tumorigenesis (Cruickshanks et al. 2013). Age-associated changes in DNA methylation are not uniform, however (Maegawa et al. 2010; Day et al. 2013). Aging tends to decrease methylation at CpG sites outside CpG islands. In contrast, CpG sites within promoter CpG islands tend to become hypermethylated with age, particularly in genes associated with development and differentiation.

Importantly, the rate of age-associated epigenetic change can vary among individuals. This concept was perhaps best illustrated by Fraga and colleagues (2005), who examined the epigenomes of lymphocytes (including DNA methylation and histone acetylation patterns) in a large cohort of monozygotic twins in Spain. They found that the epigenetic patterns of twins diverged with age: 50-year-old twins had substantial differences, whereas 3-year-old twins were nearly indistinguishable. These differences were, as one would expect, strongly correlated with lifestyle; twins who were younger, had similar lifestyles, and had spent most of their lifetimes together showed the fewest DNA methylation differences across their genomes, whereas twins who were older, had divergent lifestyles, and had spent less of their lifetimes together had the most divergent DNA methylation patterns. Importantly, these epigenetic differences were correlated with differences in gene expression, and they persisted across a variety of tissue types. Although the full consequences of age-associated DNA methylation changes have not yet been worked out, there are suggestions that these alterations are associated with specific transcription factor binding sites and that this may contribute to gene misregulation during aging, as highlighted in hematopoietic stem cells by Sun et al. (2014).

This can have direct implications for the development of age-related diseases such as cancer. Teschendorff et al. (2010) examined polycomb group protein targets (PCGTs), genes that are epigenetically repressed in multipotent cells, including human embryonic and adult stem cells (Lee et al. 2006). During carcinogenesis, DNA hypermethylation occurs at PCGTs at a higher rate (~12-fold) than at other genomic locations. Teschendorff and colleagues first identified significant linear correlations between age and DNA methylation of PCGTs (methylation increasing with age), irrespective of cancer status, in multiple tissues including whole blood, lung, cervix, and mesenchymal stem cells, among others. Interestingly, though, DNA methylation levels at these age-related PCGTs were even higher in cancer tissue, suggesting that cancer exacerbates the underlying DNA methylation changes caused by aging. This interpretation was bolstered by the ability of this age-related PCGT hypermethylation signature to discriminate preneoplastic from normal cells.

2.2 The Concept of an Epigenetic Clock

In recent years, many additional investigators have sought to define the pattern of epigenetic (and particularly DNA methylation) changes that occur during aging, and how these may be associated with or predictive of pathogenic disease states. Chief among them is Steve Horvath of the University of California, Los Angeles. His seminal 2013 article defined a multi-tissue predictor of biological age based on a calculated “methylation age”, derived from ~8000 samples spanning healthy human tissues and cell types (Horvath 2013) (Fig. 2.3). Horvath's aim was an algorithm that could accurately predict a patient’s biologic age from DNA methylation quantified using Illumina’s genome-wide DNA methylation arrays (see subsequent chapters on epigenetic measurement methods), an important point to remember when considering later works seeking to define differences in this DNA methylation “clock” associated with chronic diseases. Using a machine learning approach (penalized elastic net regression), his final algorithm leveraged the DNA methylation values of 353 CpG sites to predict an epigenetic age, which was highly accurate, with an average Pearson correlation coefficient of 0.97 and a median absolute error of 2.9 years; importantly, the same algorithm produced accurate predictions across a variety of tissue types, including whole blood, a variety of brain samples, normal breast tissue, buccal cells, cartilage, dermal fibroblasts, epidermis, head and neck tissue, kidney, lung, mesenchymal stromal cells, stomach, thyroid, and others. Several points in his study deserve particular note. First and most interestingly, his algorithm performed well in mixed peripheral blood samples. This is surprising given that blood is composed of a variety of cell types with remarkably different lifespans; for example, monocytes live several weeks, whereas CD4+ memory T cells can live decades. 
DNA methylation age values were concordant even when these cell types were examined individually. The second observation is that his algorithm is applicable to nonhuman primates; interestingly, it correlated best with chronological age in chimpanzees and somewhat less well in gorillas, likely reflecting a larger evolutionary distance. Finally, as one would predict, the DNA methylation age of induced pluripotent and embryonic stem cells is very close to zero, indicating that they display no substantial epigenetic aging.
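The arithmetic at the core of a clock of this kind can be sketched in a few lines. The sketch assumes the form reported by Horvath (2013): a penalized linear model predicts a transformed age F(age) from CpG beta values, and the prediction is back-transformed to years, with F logarithmic below an assumed adult age of 20 and linear above it. The CpG names, coefficients, and intercept below are hypothetical placeholders, not the published 353-site model.

```python
import math
from typing import Mapping

ADULT_AGE = 20.0  # assumed switch point between the log and linear scales


def transform_age(age: float, adult_age: float = ADULT_AGE) -> float:
    """Map chronological age onto the modeling scale F(age).

    Log-scaled up to adult_age (rapid change in childhood), linear above it.
    """
    if age <= adult_age:
        return math.log(age + 1) - math.log(adult_age + 1)
    return (age - adult_age) / (adult_age + 1)


def inverse_transform_age(y: float, adult_age: float = ADULT_AGE) -> float:
    """Invert F(age) to recover an age in years."""
    if y <= 0:
        return (adult_age + 1) * math.exp(y) - 1
    return y * (adult_age + 1) + adult_age


def methylation_age(
    betas: Mapping[str, float],
    coefs: Mapping[str, float],
    intercept: float,
) -> float:
    """Predict DNA methylation age from CpG beta values (0 to 1).

    The model is linear on the transformed scale:
    y = intercept + sum(weight * beta) over the model's CpG sites.
    """
    y = intercept + sum(w * betas[cpg] for cpg, w in coefs.items())
    return inverse_transform_age(y)


# Hypothetical three-CpG model, for illustration only.
toy_coefs = {"cpg_A": 2.0, "cpg_B": -1.5, "cpg_C": 0.5}
sample = {"cpg_A": 0.6, "cpg_B": 0.3, "cpg_C": 0.8}
print(round(methylation_age(sample, toy_coefs, intercept=0.1), 2))  # 46.25
```

The log scale below adulthood lets a single linear model accommodate the rapid methylation changes of childhood alongside the slower, roughly linear changes of adult life.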

Fig. 2.3

Adapted from Chen et al. (2016)

The DNA methylation-based epigenetic ‘clock’ is a continuous readout of cellular, tissue, and organismal aging.

This work has subsequently been extended by Horvath and others to the study of alterations of the epigenetic clock in a plethora of human diseases (Table 2.2). The aforementioned Werner syndrome, whose characteristics include many of the clinical signs of accelerated aging, is indeed associated with accelerated epigenetic aging of peripheral blood cells (Maierhofer et al. 2017). A number of central nervous system disorders also exhibit accelerated epigenetic aging, among them Parkinson’s disease (Horvath and Ritz 2015) and Alzheimer’s disease (Levine et al. 2015b). In the latter case, accelerated epigenetic aging was found in the pathogenically affected organ itself (measured in dorsolateral prefrontal cortex samples) and was individually associated with a variety of neuropathological and cognitive measures, including diffuse plaques, neuritic plaques, amyloid load, cognitive functioning, and memory. In another fascinating study, Levine, Horvath, and colleagues examined methylation aging in baseline blood samples from roughly 2000 participants in the Women’s Health Initiative, 43 of whom went on to develop lung cancer during an almost 20-year follow-up period; their hypothesis was that individuals with a naturally “slower” epigenetic clock would be protected against the oncogenic effects of carcinogens such as cigarette smoke. They found that, indeed, baseline acceleration of the epigenetic clock was associated with the development of lung cancer and, surprisingly, that this held in all groups, including former and never-smokers (Levine et al. 2015a). This effect was even stronger in chronologically older individuals (above 70 years of age).

Table 2.2 Conditions linked to alterations in the epigenetic clock. Adapted from Chen et al. (2016)

These studies raise the intriguing possibility of a generally increased susceptibility to a variety of chronic diseases based on epigenetic mechanisms, above and beyond those associated with specific diseases, genes, and pathways, which will be discussed in subsequent chapters. Horvath and colleagues, among others, have gone on to demonstrate premature epigenetic aging in a variety of chronic diseases and in association with factors previously linked to chronic disease, including perinatally acquired HIV infection (Horvath et al. 2018), coronary heart disease (Horvath et al. 2016), menopause (Levine et al. 2016), osteoarthritis (in cartilage from affected patients; Vidal-Bralo et al. 2016), and increases in body mass index, waist circumference, and fasting glucose (Grant et al. 2017). Furthermore, and of particular importance to the theme of this book, Irvin et al. have identified associations between accelerated epigenetic aging and a variety of inflammatory markers, including interleukin-6, C-reactive protein, and tumor necrosis factor-alpha (Irvin et al. 2018). In essentially all of the epigenetic aging literature to date, accelerations in preexisting aging-associated DNA methylation changes appear to be associated with the onset of disease, suggesting that future therapies directed at slowing or reducing epigenetic aging may help prevent chronic disease. Finally, the concept of an epigenetic clock (and of accelerated or decelerated epigenetic aging) has been suggested as the most promising molecular estimator of biologic age in a recent review that compared epigenetic clocks against transcriptomic-, proteomic-, and metabolomic-based and composite biomarkers, as well as telomere length (Jylhävä et al. 2017).
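“Accelerated” epigenetic aging in studies like these is commonly expressed as a residual: DNA methylation age is regressed on chronological age across a cohort, and each person's deviation from the fitted line is their age acceleration. The following minimal sketch implements that residual definition with ordinary least squares; the function name and toy cohort are hypothetical, and published analyses often adjust for further covariates such as blood cell composition.

```python
from statistics import mean
from typing import List, Sequence


def age_acceleration(chron_age: Sequence[float], dnam_age: Sequence[float]) -> List[float]:
    """Residuals from an OLS fit of DNA methylation age on chronological age.

    Positive values mean a sample looks epigenetically "older" than
    expected for its chronological age within this cohort.
    """
    if len(chron_age) != len(dnam_age) or len(chron_age) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    x_bar, y_bar = mean(chron_age), mean(dnam_age)
    sxx = sum((x - x_bar) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(chron_age, dnam_age))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(chron_age, dnam_age)]


# Hypothetical cohort: the 40-year-old reads "young", the 50-year-old "old".
chron = [30, 40, 50, 60]
dnam = [32, 40, 50, 58]
print([round(r, 2) for r in age_acceleration(chron, dnam)])  # [0.2, -0.6, 0.6, -0.2]
```

Because the residuals are defined relative to the cohort's own regression line, they sum to zero within the cohort, so “acceleration” is always relative to peers rather than an absolute quantity.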

2.3 Epigenetic Effects of Antiaging Interventions

If aging, one of the strongest risk factors for most chronic diseases, is mediated to a large extent by widespread epigenetic changes, it stands to reason that interventions that slow or reverse the biological effects of aging might act through epigenetic mechanisms. A few recently published studies indicate that this is indeed the case and reinforce the notion that epigenetic states are generally plastic and potentially reversible, a concept that will no doubt become increasingly important as we continue developing the ability to modify epigenetic states in a site-specific manner.

As an example of antiaging epigenetic modification, we will highlight an article by Maegawa and colleagues (2017). In this study, they examined genome-wide whole-blood DNA methylation patterns in young and old individuals from three species with widely disparate maximum lifespans: mice and rhesus monkeys fed either a “normal” or a calorie-restricted diet, and humans (Fig. 2.4). They describe several interesting findings. First, they quantified the rate of epigenetic “drift”, that is, the average change in percent methylation per year of life. There was a very strong inverse correlation between the rate of epigenetic drift and species maximum longevity, suggesting that age-related drift in DNA methylation may play a role in limiting the lifespan of a particular species. Next, they confirmed that these changes in DNA methylation were correlated with changes in gene expression, suggesting that they are functional rather than innocent bystanders. They then examined in great detail what happens to DNA methylation patterns when the caloric intake of mice and monkeys is reduced by 40% and 30%, respectively. Caloric restriction (CR) has been shown in many previous studies to have antiaging (or, at least, life-extending) properties (reviewed in Fernández-Ruiz 2017). In mice, CR was started in young adulthood (0.3 years of age) and continued throughout their 2–3-year lifespan, whereas in monkeys, CR began in middle age and data were analyzed at 22–30 years of age. DNA methylation analysis of whole-blood samples showed that old CR animals were much more similar to young animals, whereas regular-diet animals formed a separate cluster. This effect was most pronounced at CpG sites that were unmethylated in young animals, and was stronger in mice, in which CR began earlier relative to total lifespan than in monkeys. 
These epigenetic effects were best explained by a substantial reduction in epigenetic drift. Finally, they examined the tissue specificity of these changes using samples from a variety of organs, including spleen, bone marrow, liver, kidney, small intestine, and large intestine, and found that this reduction in epigenetic drift with calorie restriction was present in most tissues, the exceptions being kidney and liver. This study has several direct implications for the study of epigenetics in chronic disease and, perhaps most importantly, offers evidence that at least some of the disease-reducing benefits of lifespan-extending interventions may occur through epigenetic modulation.
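The drift metric described above (average change in percent methylation per year of life) can be illustrated with a simplified sketch: fit a per-site least-squares slope of percent methylation against age, then average the absolute slopes across sites. This is an illustrative assumption, not Maegawa and colleagues' exact pipeline; the function names and toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def drift_rate(ages: Sequence[float], meth_pct: Dict[str, Sequence[float]]) -> float:
    """Mean absolute change in % methylation per year across CpG sites.

    Absolute values are used so that sites gaining and sites losing
    methylation both count as drift instead of cancelling each other out.
    """
    return mean(abs(slope(ages, values)) for values in meth_pct.values())


# Hypothetical data: one site gains 1%/year, another loses 0.5%/year.
ages = [1, 2, 3, 4]
sites = {"site_up": [10, 11, 12, 13], "site_down": [80, 79.5, 79, 78.5]}
print(drift_rate(ages, sites))  # 0.75
```

A species-level comparison like the one in Fig. 2.4 would then relate such per-species drift rates to maximum lifespan.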

Fig. 2.4

Adapted from Maegawa et al. (2017)

Cross-species comparison of DNA methylation drift and aging. a Correlations of methylation % and age across the lifespan of mouse, monkey, and human. b Association of normalized (% per year) DNA methylation drift and maximum species longevity.

3 Epigenetic Modifications as Biomarkers of Disease

So far in this chapter, we have focused on epigenetic modifications and their association with disease from a pathogenesis standpoint. Another application of epigenetic study that has garnered increasing attention in the past few years is the potential of epigenetic modifications as biomarkers of the presence or progression of a particular disease. In this section, we offer several examples of recently described epigenetic biomarkers of important human diseases and, when appropriate, discuss the advantages of epigenetic biomarkers over traditional methods of diagnosing and monitoring disease progression.

The field of oncology has seen the most interest in epigenetic biomarkers, which fall into three major categories: early or initial diagnosis, risk stratification, and prediction of treatment response. Take, for example, colorectal cancer, the second leading cause of cancer-related deaths in the US after lung and bronchial cancer (Siegel et al. 2011). The gold standard for colon cancer screening at present is colonoscopy, during which precancerous polyps can be identified and removed. Colonoscopy has indeed significantly reduced colorectal cancer incidence and mortality (Schoen et al. 2012); however, only an estimated 6% of colonoscopies detect advanced adenomas, and around 1% detect frank adenocarcinoma (Ferlitsch et al. 2011), so most patients undergo the procedure without a significant finding. Furthermore, colonoscopy itself is associated with nontrivial morbidity and is both expensive and burdensome for patients, who must undergo bowel preparation the day before the procedure and receive conscious sedation during it. Clearly, a less invasive, more accurate, and earlier diagnostic biomarker for colon cancer would be desirable. Colon cancer occupies a unique position from an epigenetic standpoint in that much epigenetic analysis has already been done on cancerous and precancerous colon lesions (polyps, which are removed during screening colonoscopies), and the distinguishing epigenetic features have been well described (Lao and Grady 2011). Building on these data, several studies have proposed epigenetic biomarkers for both diagnosis and evaluation of future colon cancer risk. For example, hypermethylation of the MGMT gene has been strongly associated with the future likelihood of colon cancer development, even in grossly normal colon tissue, and has been suggested as a key early factor in carcinogenesis (Menigatti et al. 2009; Lang 2011). 
More recently, Luo and colleagues reported that the DNA methylation pattern of a panel of six genes (AOX-1, RARB2, RERG, ADAMTS9, IRF4, and FOXE1) in a much more accessible biomarker source, peripheral blood cells, was highly associated with colorectal cancer (Luo et al. 2016). Methylation-based tests have also reached the clinic: Epi proColon (methylated SEPT9 in plasma) and Cologuard (a multitarget stool DNA test that includes methylated NDRG4 and BMP3) have received FDA approval for colorectal cancer screening, and laboratory-developed tests such as ColoVantage (SEPT9 methylation in blood) and ColoSure (vimentin methylation in stool) have also been offered.

Another example of recent epigenetic biomarker development can be found in prostate cancer, a disease that has seen much controversy in recent years. This controversy mostly stems from the prostate-specific antigen (PSA) test, a previously widely used biomarker for the presence of prostate cancer, which led to what many experts consider an overly aggressive biopsy and treatment paradigm in a disease that is frequently indolent. The PSA test is a poor marker for a number of reasons, including the lack of a definitive cutoff for positivity, a nontrivial rate of PSA elevation without detectable prostate cancer, and a nontrivial false-negative rate (Castle 2015). As with colon cancer, much has already been described regarding the specific epigenetic changes associated with the transition from precancerous to malignant prostate tissue (Ruggero et al. 2018). A number of noninvasive prostate cancer epigenetic screening methods have been published, including APC gene methylation in urine (Jatkoe et al. 2015), CDH13 methylation in serum (Wang et al. 2014), and ERβ methylation in serum (Brait et al. 2017). Unlike colorectal cancer, however, there are to date no FDA-approved epigenetic biomarkers for the diagnosis of prostate cancer.

Germane to this book on rheumatic disease are recent studies examining epigenetic biomarkers in systemic lupus erythematosus (SLE). Because autoimmune diseases are driven to a substantial degree by alterations in circulating inflammatory cells, one would expect peripheral blood-based epigenetic biomarkers to be both related to the underlying biological pathogenesis and easily accessible for clinical application; indeed, the search for epigenetic biomarkers in peripheral blood cells has been quite fruitful in several autoimmune diseases. Lupus is an interesting and important target for novel biomarker development: it is a highly heterogeneous disease characterized by autoantibody production against a variety of nuclear targets, and it can affect almost every organ system. The diagnosis of lupus is notoriously difficult but very important, as delayed diagnosis can lead to substantial irreversible organ damage (Fortin et al. 1998). Currently available laboratory markers for lupus have substantial limitations, notably a mismatch between sensitivity and specificity: antinuclear antibodies, for example, are highly sensitive but poorly specific, whereas anti-double-stranded DNA antibodies are quite specific but comparatively insensitive.

An article by Zhao and colleagues in 2016 offers a good example of diagnostic biomarker development in lupus (Zhao et al. 2016). First, they screened peripheral blood mononuclear cell DNA methylation patterns for potential disease-associated biomarkers using an Illumina genome-wide DNA methylation array (see subsequent chapter on methods for epigenetic quantitation) in lupus patients, healthy controls, and patients with other autoimmune diseases (rheumatoid arthritis and Sjögren’s syndrome), as well as in an independent validation cohort of lupus patients and healthy controls. They identified differentially methylated sites within the IFI44L gene as highly associated with the presence of lupus; they then examined two specific CpG sites within this region in a much larger discovery cohort using a different method (bisulfite pyrosequencing). They went on to confirm the sensitivity and specificity of the DNA methylation values at these two CpG sites in multiple validation cohorts, both within the same ethnic group (Chinese) and in a different ethnic group (Europeans). Remarkably, these methylation sites had high sensitivity and specificity for the diagnosis of lupus: in the 90% range for both among Chinese patients, and in the 70–80% range for both among Europeans. Furthermore, they accurately differentiated lupus from both rheumatoid arthritis and Sjögren’s syndrome. Another study, by Coit et al. (2015), identified a single CpG site within the CHST12 gene in naive T cells from lupus patients as highly associated with the presence of lupus nephritis, a manifestation of SLE that is difficult to detect without an invasive biopsy, with a sensitivity of 86% and a specificity of 71%.
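Sensitivity and specificity for a methylation-based test depend on the cutoff chosen. A minimal sketch, assuming (as a hypothetical choice for illustration) that methylation below the cutoff is scored as test-positive; the beta values and cutoff are invented and are not values from Zhao et al. or Coit et al.

```python
from typing import Sequence, Tuple


def sens_spec(
    case_values: Sequence[float],
    control_values: Sequence[float],
    cutoff: float,
) -> Tuple[float, float]:
    """Sensitivity and specificity when 'methylation < cutoff' is test-positive."""
    true_pos = sum(v < cutoff for v in case_values)      # patients correctly flagged
    true_neg = sum(v >= cutoff for v in control_values)  # controls correctly cleared
    return true_pos / len(case_values), true_neg / len(control_values)


# Invented beta values for illustration only.
cases = [0.12, 0.18, 0.22, 0.35, 0.15]
controls = [0.55, 0.62, 0.30, 0.70, 0.66]
sens, spec = sens_spec(cases, controls, cutoff=0.32)
print(sens, spec)  # 0.8 0.8
```

Sweeping the cutoff across all observed values and plotting sensitivity against 1 - specificity gives the ROC curve commonly used to choose a threshold and to compare candidate CpG sites.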

Other studies have similarly noted strong associations between alterations in DNA methylation in easily accessible tissues and particular clinical manifestations and disease indices. These include correlations of IL10 and IL12 hypomethylation with lupus disease activity scores (Lin et al. 2012); of IL6 methylation with lupus disease activity, prediction of flare, and serum complement levels (Mi and Zeng 2008; Tang et al. 2014); of FOXP3 methylation with disease activity (Horwitz 2008); and of methylation of the retroviral elements HERV-E and HERV-K with both disease activity and the presence of a variety of autoantibody specificities (Okada et al. 2002; Piotrowski et al. 2005).

Another area of biomarker research where epigenetics can play a strong role is in predicting the response to particular therapies. In rheumatoid arthritis (RA), for example, we are fortunate to have dozens of effective “traditional” and biologic medications with more being approved on an almost yearly basis; unfortunately, however, experience has shown that these are not universally effective. Additionally, most of these new drugs take some time to reach peak effectiveness; consequently, when a patient has a suboptimal response within the first weeks of treatment, it is nearly impossible to determine whether this represents a failure of the therapy or simply a delayed response. Given the substantial cost of these treatments, a need clearly exists for biomarkers that can help stratify RA patients and predict whether or not they will respond to a particular drug or class of drugs.

Traditional non-biologic synthetic disease-modifying antirheumatic drugs (DMARDs) may actually work partially by inducing epigenetic changes, as highlighted by de Andres and colleagues, who described substantial DNA methylation changes induced in RA patients by treatment with one of these drugs, methotrexate (de Andres et al. 2015). Although this is a relatively new field, Plant and colleagues demonstrated in 2016 the usefulness of peripheral blood epigenetic patterns for predicting clinical responses in RA patients subsequently treated with the anti-tumor necrosis factor-alpha drug etanercept (Plant et al. 2016). In this epigenome-wide association study (EWAS), the discovery cohort included 36 pretreatment whole-blood samples from RA patients who later had a poor response to etanercept, as judged by a well-validated clinical disease activity index after three months of treatment; these were compared with 36 pretreatment whole-blood samples from patients who later had a very good response (disease activity in remission after three months of therapy). They identified five CpG sites with significant differential methylation between responders and nonresponders; intriguingly, all five showed reduced methylation among good responders compared with poor responders. Two of the top five, and four of the top 15, most differentially methylated CpG sites were located within exon 7 of the LRPAP1 gene; the authors went on to show that three single nucleotide polymorphisms (SNPs) were highly correlated with the DNA methylation levels of two of these CpGs, suggesting that genetic and epigenetic mechanisms in LRPAP1 interact to produce a good or poor TNF inhibitor response in RA patients. 
This sort of interaction, in which genotype influences DNA methylation and which is described in terms of methylation quantitative trait loci (meth-QTL), has in recent years been shown to be quite important in a variety of clinical phenotypes and disease states, and will likely be studied even more intensely as the field moves toward whole-genome and whole-epigenome studies, collectively known as “big data”, which are discussed in the final section of this chapter.
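At its simplest, a meth-QTL test asks whether methylation at a CpG site tracks the number of copies of an allele a person carries. The sketch below regresses beta values on genotype dosage (0, 1, or 2 copies of the allele) and reports the slope and Pearson correlation; the function name and data are hypothetical, and real meth-QTL scans add covariates and correct for testing very many SNP-CpG pairs.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def meqtl_association(dosage: Sequence[int], beta: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, Pearson r) of methylation beta on allele dosage (0/1/2).

    The slope estimates the change in beta per additional copy of the allele.
    """
    d_bar, b_bar = mean(dosage), mean(beta)
    sdd = sum((d - d_bar) ** 2 for d in dosage)
    sbb = sum((b - b_bar) ** 2 for b in beta)
    sdb = sum((d - d_bar) * (b - b_bar) for d, b in zip(dosage, beta))
    return sdb / sdd, sdb / math.sqrt(sdd * sbb)


# Hypothetical samples: each extra allele copy lowers methylation by about 0.2.
genotypes = [0, 0, 1, 1, 2, 2]
betas = [0.81, 0.79, 0.61, 0.59, 0.41, 0.39]
effect, r = meqtl_association(genotypes, betas)
print(round(effect, 3), round(r, 3))  # -0.2 -0.998
```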

4 Epigenetics in the Modern World: Using Big Data to Understand the Pathogenesis of Complex Diseases in the Present Day

In the final section of this chapter, we move from a historical perspective to discuss the state of the art in epigenetics research and offer some glimpses into its future as it relates to chronic disease generally and disorders of the immune system specifically. As technology has advanced, allowing researchers to generate increasingly intricate maps of both genetic risk and epigenetic associations in chronic disease states, new techniques have been developed to make sense of this massive amount of data. Several recently published studies, including some in rheumatology, seek to computationally synthesize large genetic, epigenetic, proteomic, and transcriptomic datasets in key cell types and draw conclusions about how these omics domains interact to contribute to the development of disease. Somewhat counterintuitively, large, detailed datasets and complex computational analyses have also allowed researchers to take a step back and draw more general conclusions about the genes, organ systems, and pathways that are altered in chronic diseases generally. As the costs of complete genomic sequencing and base-specific whole-genome epigenetic analyses continue to fall, we will no doubt see more of these integrated analyses, hopefully yielding, through machine-learning-driven insights, a more complete understanding of the fundamental processes underlying disease pathogenesis and perhaps novel treatment strategies.

A nice example of this computationally driven view of how underlying genetic sequence and epigenetic modification together shape the development of autoimmune diseases can be found in an article by Farh et al. (2015). In this study, they collected genome-wide association study (GWAS) data from 39 independent, well-powered published GWAS representing a variety of complex diseases. They then clustered the diseases based on shared genetic susceptibility loci to produce a map relating the diseases to one another at a genetic level; remarkably, many disparate disorders shared a number of similar features. For example, 69% of the single nucleotide polymorphisms (SNPs) collected in their meta-analysis were shared by at least two autoimmune diseases. Next, they developed a novel computational approach, termed Probabilistic Identification of Causal SNPs (PICS), to estimate the probability that SNPs associated with multiple autoimmune diseases represented a causal SNP as opposed to an “innocent bystander”. They then generated immune cell subtype-specific “maps” (including, for example, CD4+ and CD8+ T cell subsets, B cells, and monocytes) based on both data generated within their laboratory and data previously published through the NIH Epigenomics Project and the Encyclopedia of DNA Elements (ENCODE, a publicly accessible research project aiming to identify and catalogue all functional elements within the human genome), covering 56 cell types in total. They computed a genome-wide map of regulatory elements marked by histone posttranslational modifications and then clustered the individual cell types based on these patterns.

Finally, they inferred the cell types most likely driving particular autoimmune diseases by overlaying these two datasets; that is, they looked for cell types in which disease-associated genetic variants were located in regions known to be epigenetically “active” (Fig. 2.5). This allowed them to infer which cell types are most likely involved in specific autoimmune diseases. Several of the associations were predictable; for example, candidate causal SNPs were enriched in epigenetically active enhancers of both stimulated T cells and B cells. Interestingly, these SNPs were concentrated in stimulus-dependent enhancers; epigenetically active enhancers from unstimulated T cells did not overlap with them. Further, and as a good example of the complexity of epigenetic regulation, they found that many disease- and cell-type-associated SNPs were located within regions that regulate the transcription of noncoding RNAs, another epigenetic control mechanism. In the end, they estimated that variants in enhancer regions, producing mixed genetic–epigenetic effects, account for roughly 60% of all disease-associated genetic variants (Houseman et al. 2012; Martino et al. 2012).
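The overlay step can be pictured as an interval-overlap count: for each cell type, what fraction of candidate causal SNPs fall inside that cell type's active enhancers? The sketch below is a deliberately simplified stand-in for the enrichment statistics Farh and colleagues actually used; the coordinates, cell-type labels, and SNP positions are hypothetical, everything sits on a single chromosome, and a real analysis would compare each fraction against a background of non-disease SNPs.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def fraction_in_enhancers(snps: List[int], enhancers: List[Interval]) -> float:
    """Fraction of SNP positions falling inside any enhancer interval.

    Assumes enhancer intervals do not overlap; they are sorted so each
    SNP lookup is a binary search.
    """
    if not snps:
        return 0.0
    ivs = sorted(enhancers)
    starts = [s for s, _ in ivs]
    hits = 0
    for pos in snps:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < ivs[i][1]:
            hits += 1
    return hits / len(snps)


def rank_cell_types(snps: List[int], maps: Dict[str, List[Interval]]) -> List[Tuple[str, float]]:
    """Cell types ordered by the share of disease SNPs in their active enhancers."""
    scores = {cell: fraction_in_enhancers(snps, ivs) for cell, ivs in maps.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical single-chromosome example.
disease_snps = [150, 420, 905, 1210]
enhancer_maps = {
    "B cell": [(100, 200), (400, 450), (900, 950)],
    "stimulated T cell": [(400, 500), (1200, 1300)],
    "hepatocyte": [(600, 700)],
}
print(rank_cell_types(disease_snps, enhancer_maps))
# [('B cell', 0.75), ('stimulated T cell', 0.5), ('hepatocyte', 0.0)]
```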

Fig. 2.5

Adapted from Farh et al. (2015)

Cell-type specificity of human diseases, inferred from epigenetic mapping.

The next problem, of course, is identifying how these genetic/epigenetic interactions within enhancer regions in specific immune cell subsets actually contribute to disease risk or pathogenesis. The most obvious potential explanation is that these changes alter gene regulation, inducing overexpression or underexpression or, intriguingly, disrupting the appropriate regulation of the gene in question so that it is turned on or off at inappropriate times. Farh and colleagues investigated this possibility by modeling how these variations (be they genetic or epigenetic) alter the binding of transcription factors. Although some PICS SNPs fell within or near transcription factor binding sites, these represented only a minority (around 7%), roughly the proportion one would expect by chance for non-disease-related SNPs. The authors were therefore unable to identify a definitive pathophysiologic mechanism, although they speculate that future research better defining the structure and function of gene enhancers, within which the majority of these sites fell, may allow a better understanding.

As outlined above, one would expect this sort of analysis to yield another important piece of information: the cell types most likely to be involved in a particular disease based on this genetic/epigenetic inference. The authors compared their list of likely “causal” SNPs for a given disease with the histone “fingerprint” of a wide variety of cell types included in both the NIH Epigenomics Project and ENCODE datasets mentioned above. Several of these associations were expected; for example, migraine and Alzheimer’s disease were both predicted to involve brain tissue, just as inflammatory bowel diseases such as ulcerative colitis mapped to gastrointestinal tissue. Some were more surprising, however. For example, systemic lupus erythematosus, Kawasaki disease (a medium-vessel vasculitis), and primary biliary cirrhosis all mapped mostly to B cells. One can imagine large genetic/epigenetic screens such as this one being used in the future to help researchers narrow down the increasingly large field of potential players in a particular condition before more in-depth, large-scale omics analyses; similar studies may also allow researchers to predict which cell types are most worth studying for potential therapeutics.

Multi-omics, integrated computational analyses can also be quite informative when applied to a single pathogenic tissue. As an example, in 2017 Steinberg and colleagues published an integrative multi-omics analysis of cartilage from osteoarthritis (OA) patients, which shows how such state-of-the-art studies can distill enormous amounts of data into actionable targets. They collected osteochondral samples from patients undergoing total knee joint replacement, taking both eroded and intact sections from within the same joint, as well as control cartilage samples from non-OA individuals, in a discovery cohort and two replication cohorts. They then examined proteins differentially abundant between intact and eroded cartilage using liquid chromatography-mass spectrometry (LC-MS), mRNA gene expression using RNA-Seq, and epigenetic patterns (specifically, DNA methylation) using Illumina’s genome-wide DNA methylation microarray system. As expected, they found a large number of differences in each domain: 209 proteins were differentially abundant, 349 genes were differentially expressed at the mRNA level, and 271 differentially methylated regions were identified (Fig. 2.6). The important next step, which has only become possible with technological and computational advances over the past few years, was the integration of these data across omics domains. They found 49 genes that differed in at least two domains, and three genes that exhibited significant evidence for OA involvement across all three domains: aquaporin 1 (AQP1), the collagen type I alpha 1 gene (COL1A1), and CLEC3B, which encodes tetranectin, a regulator of fibrinolysis. All three of these genes were upregulated at both the protein and mRNA level and also exhibited reduced DNA methylation at all associated CpG probes included on the Illumina arrays. 
Interestingly, of the 49 genes with differential regulation across at least 2 domains, fully one third had not been previously implicated in OA pathogenesis.
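At its simplest, the cross-domain step described above amounts to counting, for each gene, how many omics layers flagged it. The following is a minimal Python sketch, assuming each domain has already been reduced to a set of gene symbols. The input sets are hypothetical toy data (the three genes reported in all three domains plus placeholder names such as `GENE_A`), not the study’s results, and the authors’ actual integration was likely more involved.

```python
from collections import Counter


def genes_by_domain_count(domains: dict[str, set[str]]) -> Counter:
    """Count how many omics domains flag each gene.

    domains maps a domain name (e.g. "protein") to the set of genes
    found significant in that domain.
    """
    counts: Counter = Counter()
    for genes in domains.values():
        counts.update(set(genes))  # each domain adds at most 1 per gene
    return counts


def genes_in_at_least(domains: dict[str, set[str]], k: int) -> set[str]:
    """Return the genes flagged in at least k domains."""
    counts = genes_by_domain_count(domains)
    return {gene for gene, n in counts.items() if n >= k}


# Hypothetical toy sets; placeholder names are not real study genes.
domains = {
    "protein": {"AQP1", "COL1A1", "CLEC3B", "GENE_A", "GENE_B"},
    "mRNA": {"AQP1", "COL1A1", "CLEC3B", "GENE_A", "GENE_C"},
    "methylation": {"AQP1", "COL1A1", "CLEC3B", "GENE_D"},
}

print(sorted(genes_in_at_least(domains, 3)))  # ['AQP1', 'CLEC3B', 'COL1A1']
print(sorted(genes_in_at_least(domains, 2)))  # ['AQP1', 'CLEC3B', 'COL1A1', 'GENE_A']
```

This sketch ignores the direction of each change and how methylation regions are assigned to genes; a real integration would have to handle both, for example by checking that higher protein and mRNA levels coincide with reduced methylation.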

Fig. 2.6

Adapted from Steinberg et al. (2017)

Overview of genes identified as associated with osteoarthritis in human cartilage from a multi-omics perspective.

An important consequence of this sort of multi-omics analysis is that it allows researchers to home in on potentially important, but previously unrecognized, avenues for treatment. In this study, the authors queried DrugBank (Law et al. 2014), a comprehensive database of drugs and their targets, with their list of 49 genes showing cross-domain association with OA, and identified ten drugs acting on nine of the 49 dysregulated proteins; all ten already had US Food and Drug Administration (FDA) marketing authorization for use in human patients. Some of these were expected, including non-steroidal anti-inflammatory drugs, but several were novel, including vitamin K1 (phylloquinone), the antipsychotic/antiemetic trifluoperazine, and the cholesterol-lowering drug ezetimibe, among others. As technology continues to develop and the cost of performing this sort of global analysis drops, future epigenetic studies will increasingly adopt just this sort of large, multi-omics approach to data analysis, and will likely benefit patients substantially by identifying previously unrecognized druggable targets in various epigenetically driven diseases.

5 Summary

In this chapter, we have highlighted the history of the discovery of epigenetic control mechanisms, which in many ways paralleled the discovery of the DNA code itself. We then outlined a few ways in which epigenetics has changed how we think about both basic biological processes and the pathogenesis of complex human disease. We began with a discussion of the epigenetics of “normal” aging and went on to discuss the recent development of an epigenetic “clock”. Alterations in the rate of epigenetic aging have been shown to be associated with a variety of chronic human diseases; furthermore, interventions that slow aging also appear to slow the rate of epigenetic aging, suggesting that antiaging interventions may have widespread epigenetic effects. We then discussed a few examples of nonpathogenic roles for epigenetics in clinical medicine: epigenetic biomarkers that indicate the presence of disease and that predict response to particular therapies. Finally, we described two examples of complex, modern epigenetic studies that leverage massive amounts of data from multiple biological domains to infer how epigenetic modifications affect gene expression at the transcriptomic, proteomic, and whole-organism levels. The contributions of epigenetic mechanisms to our understanding of disease pathogenesis have been nothing short of remarkable; given the field’s meteoric rise in importance over more than a century, the study of epigenetics in human disease surely has a bright future indeed.