Introduction

Ensuring the integrity of the genome while maintaining the flexibility to allow for adaptive change through mutation is essential for the long-term survival of a species. Genomes are assaulted continuously by various forces impacting their integrity, including both internal and external factors (Tubbs and Nussenzweig 2017). Examples of external forces include DNA damaging ionizing or UV radiation as well as various chemicals such as oxidizing agents, DNA intercalators, and base analogs (Chatterjee and Walker 2017; Mehta and Haber 2014). Viruses that can integrate into host genomes, thus potentially disrupting vital gene functions, are another external factor impacting genome stability (Weitzman and Fradet-Turcotte 2018). Internal forces affecting genome integrity include, among others, the error rate of the DNA replication machinery as well as the efficiency of DNA repair pathways, reactive oxidant byproducts of metabolism, and active transposable elements (TEs) (Aguilera and Garcia-Muse 2013). Thus, to keep their genome intact and ensure genome stability, organisms have to manage these diverse challenges while tolerating the low level of mutation that is the basis of evolutionary change.

The importance of genome stability is illustrated further by the consequences of its breakdown. Genome instability occurs when the genome accumulates mutations at an increased rate (Aguilera and Garcia-Muse 2013). It is a hallmark of many cancerous tissues, which tend to exhibit numerous genetic alterations compared to the genomes of matched non-cancerous tissues (Tubbs and Nussenzweig 2017). The genetic alterations observed in cancers are diverse and can include single-base mutations, copy number changes, chromosomal rearrangements, and abnormal chromosome numbers (Sansregret et al. 2018). Genome instability also has been associated with the aging process, as mutation rates tend to increase in senescent cells and with increased organismal age (Vijg and Suh 2013). In addition, rates of TE transposition have been reported to increase with aging, contributing significantly to genome instability (Li et al. 2013; Pal and Tyler 2016). These examples demonstrate that genome stability is of vital importance to ensure organismal health, in addition to being key for the long-term survival of a species.

Given the variety of factors able to damage DNA and destabilize genomes, maintaining genome integrity is a complex problem, and numerous intersecting pathways contribute to genome stability. In humans, one example of a pathway protecting the genome from external factors is the production of melanin in the skin (Brenner and Hearing 2008). Melanin protects the genome of skin cells from UV radiation-induced damage, thus preventing mutations and leading to a lower skin cancer rate in individuals with higher levels of melanin (Brenner and Hearing 2008). Another example of pathways contributing to genome stability are the DNA repair pathways that repair DNA double-strand breaks, including the non-homologous end joining (NHEJ) pathway and the homologous recombination (HR) pathway (Chang et al. 2017; Wright et al. 2018). DNA double-strand breaks can be caused by both internal and external factors, but the NHEJ repair pathway, for example, is used for their repair irrespective of the cause of the DNA double-strand break. The use of the NHEJ repair pathway is triggered when a DNA double-strand break is detected and the DNA end is suitable for repair via this pathway, which uses ligation to repair the DNA break without referring to a homologous DNA template (Scully et al. 2019). These examples illustrate that there are diverse mechanisms contributing to genome integrity, which collectively ensure genome stability.

Several of the mechanisms and pathways ensuring genome stability are linked to epigenetics. Epigenetics encompasses a variety of phenomena and molecular systems that are connected by the fact that they involve heritable changes in phenotypes or gene expression that are independent of changes in DNA sequence (Felsenfeld 2014). Thus, epigenetics includes the study of DNA methylation, histone modifications, chromatin structure, and non-coding RNAs, both large and small (Felsenfeld 2014). Interestingly, many of these molecular mediators of epigenetic inheritance have been linked to genome stability, and they contribute to varying degrees to the maintenance of genome integrity (Felsenfeld 2014; Fischer and Riddle 2018). Conversely, genome instability often is associated with the breakdown of epigenetic mechanisms, an observation which further supports the link between epigenetics and genome stability. In this article, we review how epigenetic mechanisms impact genome stability (Fig. 1). We survey the role of epigenetic mechanisms in ensuring centromere and telomere function, their impacts on TE activity, and their interaction with DNA repair pathways. We discuss histone variants and modifications, DNA methylation, and non-coding RNAs and demonstrate the importance of these epigenetic systems in ensuring genome stability. Finally, we provide an assessment of the current gaps in knowledge and the opportunities to address them.

Fig. 1

Complex interactions between epigenetics and genome stability. Genome stability is of central importance to organismal survival (middle). Here, we highlight four major facets of genome stability (telomeres, centromeres, TE regulation, and DNA repair; purple) and the epigenetic mechanisms that contribute to these pathways (brown)

Epigenetic mediators are essential for centromere function and genome stability

One strong link between epigenetics and genome stability is provided by centromeres, which have an essential role in maintaining genome stability and are specified by epigenetic mechanisms. Centromeres are the portions of eukaryotic chromosomes where the kinetochore assembles and microtubules attach during meiosis and mitosis (McKinley and Cheeseman 2016). Thus, centromeres ensure proper segregation of chromosomes to daughter cells. Dysfunction of centromeres has severe consequences: it can lead to chromosome breakage and fusion, as well as chromosome loss and aneuploidy (Barra and Fachinetti 2018). In severe cases, cytokinesis failure due to lagging chromosomes can lead to polyploidy, which has the potential to further destabilize the genome (Lens and Medema 2019). Another consequence of centromere dysfunction and lagging chromosomes can be the formation of micronuclei, which make the enclosed DNA susceptible to increased rates of DNA damage, replication errors, and again, further enhancement in genome instability (Chunduri and Storchova 2019). Ultimately, centromere dysfunction leads to genome instability by several mechanisms and, if the appropriate cell cycle checkpoints are triggered, cell death [for an example, see (Caneus et al. 2018)]. Thus, centromeres perform vital functions in ensuring genome stability.

Despite their importance for cell viability and genome stability, centromere organization varies widely among species. Based on their centromere organization, chromosomes can be classified as monocentric, which includes both point centromeres and regional centromeres, or as holocentric (McKinley and Cheeseman 2016; Steiner and Henikoff 2015). Monocentric chromosomes have one centromere, which coincides with the primary constriction visible in a mitotic chromosome; examples of species with monocentric chromosomes include humans as well as several model species such as Drosophila melanogaster, Arabidopsis thaliana, Schizosaccharomyces pombe, and Saccharomyces cerevisiae (point centromeres). In contrast, holocentromeres are not restricted to one region of the chromosome, and there is no constriction visible in the mitotic chromosome (Steiner and Henikoff 2015). Instead of the microtubules attaching at just one specific region of the chromosome, they attach at sites along the entire length of the chromosome (Steiner and Henikoff 2015). Examples of species with holocentromeres include the model organism Caenorhabditis elegans and a variety of plant species (Cuacos et al. 2015). These examples illustrate that there is considerable variation among species in how centromeres are organized.

Despite the essential function of centromeres in the regulation of chromosome segregation during cell division, centromere location is not encoded in the DNA sequence (the point centromeres of S. cerevisiae might be considered an exception to this rule) (McKinley and Cheeseman 2016). This fact is best illustrated by the study of neocentromeres, which are centromeres that form at a new location on the chromosome that has not acted previously as a centromere (Fukagawa and Earnshaw 2014b; Scott and Sullivan 2014). Many regional centromeres, such as the ones found in humans, are associated with specific repeated sequences. In humans, for example, centromeres are formed typically over regions containing alpha-satellite repeats. Similarly, regional centromeres in the fruit fly model Drosophila melanogaster contain various AT-rich repeats such as the 359 bp repeat, and centromeres in the plant model system Arabidopsis thaliana are composed of arrays of a 180-bp repeat (Plohl et al. 2014). When centromere location shifts along the chromosome leading to the formation of a neocentromere, the sequences underlying the neocentromeres are diverse, and the repeats typically associated with the centromere can still be detected at its original location (Amor et al. 2004; Schneider et al. 2016). Neocentromeres have been detected in humans on various chromosomes (Fukagawa and Earnshaw 2014a; Scott and Sullivan 2014; Voullaire et al. 1993), and they have been experimentally induced in several species including Drosophila melanogaster (Maggert and Karpen 2001) and chicken (Shang et al. 2013). Thus, neocentromere studies support the conclusion that DNA sequence does not specify centromere location, but that the centromere is specified instead epigenetically.

Research over several decades has led to a model which suggests that the centromere location is determined by the presence of the centromere-specific histone variant CenH3, also known as CENP-A (Palmer et al. 1991; Scott and Sullivan 2014). In addition to the core histones, H2A, H2B, H3, and H4, which typically make up nucleosomes, variants derived from these histone genes exist in many species (Henikoff and Smith 2015; Talbert and Henikoff 2017). Very few variants exist for H2B and H4, whereas H3 has two major variants, CenH3 and H3.3, both discussed below. CenH3 is a variant of histone H3 that, under normal conditions, occurs exclusively at centromeres (Gambogi and Black 2019; Palmer et al. 1991). CenH3 is unusual in that, in contrast to the canonical histones, it is undergoing rapid evolutionary change, with sequence changes quickly accumulating, especially in its amino-terminal tail (Malik and Henikoff 2003; Rosin and Mellone 2017). CenH3-containing nucleosomes are found at the functional centromeres of monocentric and holocentric chromosomes, and the S. cerevisiae point centromeres consist of a single CenH3(Cse4)-containing nucleosome (Choy et al. 2012; Furuyama and Biggins 2007; McKinley and Cheeseman 2016). Neocentromeres also contain CenH3 nucleosomes (Henikoff and Furuyama 2012; Warburton et al. 1997), illustrating the importance of this centromeric histone variant, which epigenetically specifies the functional centromere and is thus essential for genome stability.

As more genome sequences have become available, a number of species have been identified that lack a recognizable copy of CenH3 in their genome (Drinnenberg et al. 2016). These species include a group of unicellular flagellated eukaryotes related to trypanosomes (Akiyoshi and Gull 2014; Berriman et al. 2005) and also a variety of insect species, including the Lepidoptera (butterflies and moths) and Odonata (dragonflies) (Drinnenberg et al. 2014). In the insect species lacking CenH3, most other kinetochore proteins are conserved (Drinnenberg et al. 2014), while in the trypanosomes, the conventional kinetochore proteins are absent as well, with novel proteins taking over their functions (Akiyoshi and Gull 2014). As CenH3 typically is required to specify the centromere location, these findings raise the question of how centromeres form in the species lacking CenH3, and how a protein essential for genome stability can be lost over evolutionary time.

Interestingly, a second variant of H3 also contributes to centromere function. H3.3 differs from canonical H3 by only four to five amino acids, and it is typically found in regions of the genome that are actively transcribed (Dion et al. 2007; Franklin and Zweidler 1977; Henikoff and Smith 2015; Schwartz and Ahmad 2005). Remarkably, genetic studies have shown that it is crucial for maintaining genome stability during mammalian development (Bush et al. 2013; Jang et al. 2015). In mice, complete loss of H3.3 leads to embryonic lethality. Loss of H3.3 in embryonic stem cells leads to mitotic defects, including chromosome bridges and lagging chromosomes in anaphase. These problems during mitosis result in abnormal chromosome number, genome instability, and cell death, which are likely the cause of the embryonic lethality. In addition, H3.3 loss also negatively impacts the chromatin structure at the telomeres, centromeres, and pericentric regions, which contributes to the observed genome instability (Jang et al. 2015). Thus, histone variants are heavily involved in regulating the function of the centromere and contribute significantly to genomic stability.

Studies in several species have shown that while the position of the centromere is determined epigenetically by the presence of CenH3 in most species, the strength of a centromere and its ability to ensure faithful chromosome segregation are determined by genetic as well as epigenetic factors. While centromeres are formed at a specific region along the chromosome in regional monocentric chromosomes, the exact position of where CenH3 localizes is variable among individuals, and the CenH3-containing chromatin block can shift. These regional shifts in CenH3 location have been documented, for example, in the genus Equus by Nergadze and colleagues, who find that centromeric epialleles with slightly different localization of the CenH3 chromatin are quite common (Nergadze et al. 2018). In maize, Gent and colleagues find that the exact location of the centromere position can shift between generations (Gent et al. 2017). In humans, centromeric epialleles have been described for chromosome 17, which differ in the composition of the underlying alpha-satellite sequence, leading to variability in centromere function between the two epialleles (Aldrup-MacDonald et al. 2016; Maloney et al. 2012). Specifically, one of the centromeric epialleles, D17Z1, showed decreased CenH3 recruitment and increased levels of aneuploidy for this chromosome (Aldrup-MacDonald et al. 2016). These studies demonstrate that the centromeric epialleles impact centromere function and can differ significantly in their contribution to genome stability.

Another aspect of the epigenetic state of the centromere that impacts its ability to ensure genome stability is the post-translational modification (PTM) of the histones found at the centromere. PTMs of both CenH3 and the canonical histones present at the centromere are important for its function (Bowman and Poirier 2015; Johnson et al. 2004; Loyola et al. 2006; McKittrick et al. 2004; Waterborg 1990). Canonical histone H3 present at the centromere and interspersed with CenH3 nucleosomes (Blower et al. 2002) is methylated at H3K4 (H3K4me2) and lacks H3K9 methylation (me2 and me3), which is found in the pericentromeric regions (Sullivan and Karpen 2004). Using a human artificial chromosome (HAC) system, Molina and colleagues removed H3K4me2 from the centromere and found that segregation errors for this chromosome increased (Molina et al. 2016). If H3K9me2/me3 chromatin spreads from the pericentric regions into CenH3 chromatin, centromere function is perturbed as well, leading to chromosome segregation defects and genome instability (Bergmann et al. 2012; Ohzeki et al. 2016). In addition, CenH3 itself is subject to PTM (Srivastava and Foltz 2018), and some of these PTMs have been linked to centromere function and genome stability. For example, on CENP-A, the human CenH3 homolog, serine 18 (S18) hyperphosphorylation leads to increased genome instability (Takada et al. 2017), while CENP-A serine 7 (S7) phosphorylation is not essential for centromere function (Barra et al. 2019). Together, these findings demonstrate that epigenetic marks in the form of histone PTMs at the centromere are essential for centromere function and the maintenance of genome stability.

Non-coding RNAs are another way epigenetic mechanisms contribute to centromere function and genome stability (Talbert and Henikoff 2018). Volpe and colleagues demonstrated that small RNAs derived from the outer centromeric repeats in S. pombe are required for centromere function (Volpe et al. 2002, 2003). Small RNAs since have been shown to be derived from centromeres in other species as well (Cohen et al. 1973; Hall et al. 2002; Maison et al. 2002; Perea-Resa and Blower 2018), and the data suggest that they are important for centromere function and genome stability. In addition to these small RNA classes, there are also centromere-derived RNAs that appear to be an integral part of the centromere (Ling and Yuen 2019; Talbert and Henikoff 2018). These RNAs were observed first in maize in 2004 (Topp et al. 2004) and since have been the focus of detailed studies. In Drosophila, loss of the long non-coding RNAs from the centromeric 359 bp repeat leads to chromosome segregation defects (Rošić et al. 2014), and Ling and colleagues discovered that in S. cerevisiae both over- and underproduction of the long non-coding RNAs from the centromere lead to increased levels of aneuploidy (Ling and Yuen 2019). Together, the data available suggest that non-coding RNAs, both large and small, are required for the optimal function of the centromere to ensure genome stability.

In summary, centromeres are essential for genome stability as they ensure proper chromosome segregation to daughter cells. Centromere position is epigenetically determined by the location of the CenH3 histone variant, and various epigenetic mechanisms including non-coding RNAs, histone modifications, and chromatin structure components are essential for normal centromere function. Thus, studies of centromere biology demonstrate unequivocally that epigenetic mechanisms contribute essential functions to genome stability and that disruptions of these mechanisms lead to genome instability.

Telomeres rely on epigenetic mechanisms to ensure genome stability

In addition to centromeres, telomeres are a second type of chromosomal structure that is essential for maintaining genome stability and that employs epigenetic mechanisms for optimal function. Telomeres are structures at the end of linear chromosomes that protect chromosome ends from fusion by recombination and from degradation by nucleases (O'Sullivan and Karlseder 2010). Typically, telomeres exhibit a specialized chromatin structure that is similar to heterochromatin and imparts transcriptional silencing on sequences located in close proximity (Baur et al. 2001; Karpen and Spradling 1992; Palladino et al. 1993; Schoeftner and Blasco 2009; Wallrath and Elgin 1995). This unique chromatin structure protects the chromosome ends, and loss of this capping function can lead to aneuploidy as chromosome bridges, chromosome fusions, and lagging chromosomes form that will impair mitosis and meiosis (O'Sullivan and Karlseder 2010). In addition, telomeres present a solution to the end-replication problem, which arises because DNA polymerases cannot fully replicate the ends of the lagging strand during DNA replication (de Lange 2009). Ultimately, telomeres perform several functions critical to maintaining genomic integrity.

In order to prevent the loss of sequences at the ends of linear chromosomes with each cell division (end-replication problem), telomeres are composed of repetitive DNA sequences, and several cellular pathways exist to prevent shortening of chromosomes (Jafri et al. 2016). While this function of telomeres is evolutionarily conserved, the specific mechanism that maintains telomere length differs from species to species. In most eukaryotes, telomerase, a ribonucleoprotein complex with reverse transcriptase activity, can synthesize new telomeric repeats, thus maintaining telomere length (Greider and Blackburn 1987; Wu et al. 2017a). Saccharomyces cerevisiae also has telomerase activity, but alongside the telomerase, it uses recombination to elongate telomeres (Larrivee and Wellinger 2006; Wellinger and Zakian 2012). In Drosophila melanogaster, telomere length is maintained by two non-LTR retrotransposons, HeT-A and TART, that transpose specifically to chromosome ends (Pardue et al. 2005). Failure of the mechanisms maintaining telomere length leads to telomere shortening and eventually to the loss of crucial genetic information located adjacent to the telomeres and to cell death (Muraki et al. 2012). Shortening of telomeres is a particular concern in rapidly dividing cells, illustrated by the fact that many cancer cells express high levels of telomerase, which promotes further cell division and growth (Jafri et al. 2016). Thus, telomere maintenance is an essential process to ensure genome stability, and epigenetic mechanisms are required to achieve this goal.

Epigenetic processes can contribute to telomere maintenance by controlling the expression of telomerase in mammals and the expression of the telomeric TEs in insects like Drosophila. In addition, epigenetic modifications, specifically histone modifications, are crucial for controlling telomere length (Counter et al. 1992; Jezek and Green 2019). Changes in telomere chromatin state can alter telomere length and thus lead to genome instability (O'Sullivan and Karlseder 2010). For example, Galán and colleagues found that in yeast, Sus1, a protein involved in histone H2B deubiquitination, is required for the proper regulation of telomere length (Galan et al. 2018). Deletion of Sus1 leads to telomere elongation and an increase in histone H2B lysine 123 (H2BK123) mono-ubiquitination. Thus, the findings by Galán and colleagues suggest that Sus1 negatively regulates telomere length through its impact on histone ubiquitination, linking telomeres, genomic stability, and histone modifications. In addition, several histone methyltransferases (HMTs) have been linked to telomere maintenance as well. Jezek and colleagues found that the Saccharomyces cerevisiae HMTs Set5 and Set1 are involved in telomere maintenance in yeast, and loss of these proteins leads to improper expression of genes adjacent to the telomeres (Jezek et al. 2017). Also, in yeast, Wu and colleagues found that H2BK123 mono-ubiquitination mediated by Rad6-Bre1 positively regulates telomere replication, leading to lengthened telomeres (Wu et al. 2017b). Thus, histone modifications play an important role in maintaining telomere function and genome stability.

As noted previously, telomeres contain a variety of histone modifications (Jezek and Green 2019). These modifications include heterochromatic marks that can exert their influence beyond the telomeric sequences and silence genes nearby through a process known as telomere position effect (TPE) (Robin et al. 2014). TPE is thought to contribute to the genomic instability that arises as telomeres shorten below their critical length (Robin et al. 2014). Recently, it has been suggested that telomeres also impact gene expression over longer distances, a phenomenon named telomere position effects over long distances (TPE-OLD) (Kim and Shay 2018). In TPE-OLD, telomeres cause heterochromatin spreading through chromatin looping, as revealed in a 2013 study by Stadler and colleagues examining the regulation of human DUX4, a candidate gene for facioscapulohumeral muscular dystrophy (Stadler et al. 2013). TPE-OLD was confirmed through 3D-FISH (three-dimensional fluorescence in situ hybridization), which showed that chromosomal looping occurred between genes such as ISG15, DSP, and C1S and their respective telomeric ends (Robin et al. 2014). Interestingly, TPE-OLD also seems to affect TERT, the gene encoding the catalytic subunit of telomerase. Kim and colleagues found that telomere length affected expression and nuclear location of the TERT gene (Kim et al. 2016). When telomeres were long, a chromatin loop formed at the hTERT locus, decreasing its expression. As telomeres shortened, for example with age, this loop disengaged, leading to increased production of hTERT mRNA. Alongside that disengagement, DNA methylation and histone modifications changed in the hTERT promoter region, suggesting there is a mechanism for fine-tuning telomerase expression based on telomere length in humans (Kim et al. 2016). Thus, TPE-OLD represents another pathway by which telomeres contribute to genome stability, utilizing epigenetic mechanisms to impact the expression levels of telomerase and other essential genes.

Additional chromatin proteins aid in protecting telomere length and function. Shelterin is a complex of chromatin proteins found in eukaryotes that is essential for the chromosome capping function of telomeres, preventing telomeres from fusing with each other (de Lange 2005). It allows telomeres to be distinguished from damaged DNA and specifically binds to chromosome ends, leading to the formation of a unique chromatin structure (de Lange 2005). Eukaryotic shelterin is composed of six core protein subunits: TRF1, TRF2, TIN2, Rap1, TPP1, and POT1 (Xin et al. 2008). Shelterin protects telomeric sites both by binding to the telomeric DNA and by remodeling the chromatin into a unique, tight nucleoprotein structure (Bandaria et al. 2016). TIN2 was the most important component for the shelterin-mediated chromatin compaction of these telomeric sites, which prevented the binding of DNA damage response proteins, illustrating the importance of chromatin structure at the telomere (Bandaria et al. 2016). Interestingly, the shelterin proteins do not function solely at the telomeres: loss of TRF2 impairs replication fork progression through pericentromeric heterochromatin and destabilizes heterochromatin across the genome (Mendez-Bermudez et al. 2018), demonstrating that TRF2 has non-telomeric functions as well. Thus, the shelterin protein complex protects telomeres by generating a unique local chromatin structure, illustrating the importance of epigenetic mechanisms for the maintenance of genome stability.

Besides the shelterin proteins, other aspects of chromatin structure also impact the ability of the telomere to carry out its functions. Because the telomeric chromatin is typically considered heterochromatic, Chow and colleagues investigated the effect of increasing the level of the heterochromatin protein HP1α specifically at the telomeres (Chow et al. 2018). To achieve this goal, they fused HP1α with the shelterin protein TRF1, thus altering the telomere chromatin composition. Recruiting TRF1-HP1α fusion protein to the telomere increased heterochromatin formation, altered the 3D structure of the telomere, and prevented access by telomerase, suggesting both positive and negative impacts on telomere function (Chow et al. 2018). This study illustrates the importance of chromatin structure for telomere function and again links an epigenetic system with genome instability.

All in all, the data available today demonstrate that telomeres are unique chromatin structures maintained through epigenetic modifications, which serve to protect the ends of linear chromosomes, thus ensuring genome stability. Loss of telomere function or shortening of telomeres in healthy cells can lead to genome instability, and this loss is regulated by epigenetic pathways including the shelterin chromatin complex (de Lange 2005) and various histone modifications (Jezek and Green 2019). The telomere itself can contribute epigenetically to genome stability (or to instability once shortened) through TPE (Robin et al. 2014; Stadler et al. 2013) and the TPE-OLD effect (Kim et al. 2016; Kim and Shay 2018). Interestingly, in cancerous cells, epigenetic alterations can lead to telomerase activity that maintains telomere length, aiding in the proliferation of these unstable cells (Jafri et al. 2016; Maciejowski and de Lange 2017). In conclusion, there are various epigenetic factors ensuring functional telomeres that can contribute to genome stability in complex ways.

Epigenetic mechanisms curb TE activity to prevent genome instability

TEs are another common source of genome instability (Klein and O'Neill 2018). TEs are mobile genetic elements that are classified based on the mechanism used for transposition: DNA transposons utilize a DNA intermediate, while RNA transposons or retrotransposons utilize an RNA intermediate (Mc 1950; Padeken et al. 2015). Together, various classes of TEs contribute large amounts of DNA to eukaryotic genomes. For example, TE-derived sequences make up approximately 50% of the human genome (Platt et al. 2018), they contribute approximately 20% to the Drosophila melanogaster genome (McCullers and Steiniger 2017), and in maize, an estimated 85% of the genome sequence is derived from TEs (Jiao et al. 2017; Schnable et al. 2009). While TEs are found throughout the genome, often they are concentrated in specific regions of genomes such as the centromeric and telomeric regions, where they contribute to the function of these domains (see above). Given the large number of TEs in eukaryotes, they are an important facet of these genomes, and epigenetic mechanisms control their activity to ensure genome stability.

Because TEs are able to move, either from location to location or by inserting additional copies of themselves at new locations, they can be a major source of genome instability. With every move, TEs potentially can disrupt genes or gene regulatory pathways. For example, in their review from 2012, Hancks and Kazazian identify 96 retrotransposon insertions that lead to genetic diseases in humans, including cases of hemophilia A and B and cystic fibrosis (Hancks and Kazazian 2012). Given the large number of TEs in many genomes, the mutation rate due to TE transposition can be orders of magnitude higher than the mutation rate due to mistakes during DNA replication. In addition, TE-encoded endonucleases can cause DNA double-strand breaks without TE insertions, thus potentially causing mutations due to imperfect break repair (Gasior et al. 2006). Finally, TEs contribute to genome instability by facilitating ectopic recombination/unequal crossovers between non-homologous sites due to the presence of identical sequences at various locations in the genome (Cordaux and Batzer 2009). These ectopic recombination events can lead to local deletions or duplications, chromosomal translocations, and inversions. These events have been reported in patients [as in a 2017 case of mesomelia-synostosis syndrome (Kohmoto et al. 2017)] as well as in model organisms. For example, in Drosophila melanogaster, when a DNA transposon-derived reporter gene (P element) is mobilized, local sequence duplications and deletions as well as more complex rearrangements result (Berg et al. 1980; Riddle et al. 2008; Sun et al. 2004). These examples illustrate how TEs contribute to genome instability and why they are targeted for silencing to keep their activity in check.

Given the challenge to genome stability posed by the large number of TEs present in typical eukaryotic genomes, it is not surprising that a variety of mechanisms are in place to curb TE activity (Klein and O'Neill 2018; Molaro and Malik 2016). Several of the TE silencing mechanisms are epigenetic in nature, and both DNA and histone modifications are involved in the transcriptional silencing of TEs, while a number of small RNA pathways are involved in the transcriptional and post-transcriptional silencing of TEs (Dumesic and Madhani 2014). In addition, there are several other pathways, including one linked to p53 (Levine et al. 2016; Tiwari et al. 2018) and one linked to the KAP1 and ZNF transcription factors (Friedli and Trono 2015), that have been implicated in TE silencing. This variety of parallel silencing mechanisms hints at the importance of TE silencing for an organism’s survival.

DNA modifications are one level of epigenetic control over TE activity (Deniz et al. 2019). In most eukaryotes, DNA methylation in the form of cytosine methylation is considered a silencing mark, and high levels of cytosine methylation in promoter regions lead to gene silencing and the recruitment of additional silencing chromatin marks in the form of histone modifications (Jones 2012). The majority of TEs show high levels of cytosine methylation and are typically transcriptionally inactive other than in specific tissues where silencing mechanisms are suspended temporarily (Deniz et al. 2019). The importance of DNA methylation for TE silencing and genome stability is illustrated best by what happens if the DNA methylation pathway is impaired. In this case, TE activity, including TE expression levels as well as transposition, increases. For example, in Arabidopsis thaliana, loss of the SWI2/SNF2 chromatin remodeling factor DDM1 (Jeddeloh et al. 1999), which is required for wildtype levels of cytosine methylation, leads to the reactivation of TEs such as the CAC family (Miura et al. 2001), with increased expression and transposition rates. In maize, where more than 85% of the genome is derived from TEs, loss of the two DDM1 orthologs is lethal, suggesting that loss of control of a large number of TEs leads to fatal genome instability (Li et al. 2014). In mice with impaired function of the DNA methyltransferase DNMT1, hypomethylation leads to increased somatic transposition of the retroviral-like intracisternal A particle (IAP) (Howard et al. 2008). A 2018 study in Arabidopsis thaliana directly demonstrated that targeted demethylation of the CACTA1 transposon via a CRISPR/dCas9-derived TET1 fusion enzyme significantly increased expression from this TE (Gallego-Bartolome et al. 2018). Together, these studies demonstrate the importance of DNA modifications such as cytosine methylation in ensuring eukaryotic genome stability by silencing TEs.

Along with DNA methylation, histone modifications and chromatin structure, specifically heterochromatin, also contribute to the maintenance of genome stability by silencing TEs (Janssen et al. 2018). Many TEs are found in heterochromatic regions of the genome, typically the centromeres and telomeres, which are characterized by methylation of histone 3 lysine 9 (H3K9me2 and me3) and heterochromatin proteins such as those of the Heterochromatin Protein 1 (HP1) family (Janssen et al. 2018). The chromatin structure in heterochromatin is such that it often suppresses gene expression, leading to the silencing of TEs present in this genomic domain. Biochemical marks of heterochromatin are also found at many TEs within broader euchromatic regions of genomes, where they again promote transcriptional silencing. In Drosophila melanogaster, ~30% of euchromatic TEs are associated with silencing marks, and at many of these sites, heterochromatin spreads to neighboring sequences (Lee and Karpen 2017; Riddle et al. 2011). A 2018 screen in human cell lines revealed that transposition of LINE-1 (long interspersed element-1) was controlled in part by the human silencing hub (HUSH) complex, a silencing complex that functions through H3K9 methylation (Liu et al. 2018). The SETDB1 H3K9 methyltransferase was another hit in this screen (Liu et al. 2018), confirming the important role of histone modifications and particularly H3K9 methylation in the silencing of TEs. When He and colleagues screened 41 chromatin modifiers for their impact on TE expression, 29 of the modifiers impacted the expression levels of at least one TE class, and six modifiers, including SETDB1, impacted most TEs (He et al. 2019). Thus, the available data regarding histone modifications and TE activity suggest that repressive histone marks and heterochromatin formation are targeted specifically to TEs to decrease their expression and ensure genome stability.

In addition, small RNA pathways play an important role in controlling TE activity, as these pathways are involved in targeting specific sequences for DNA modifications and/or heterochromatin formation (Dumesic et al. 2013) (Fig. 2). In plants, the RNA-directed DNA methylation (RdDM) pathway functions via small RNAs to target specific DNA sequences, including TEs, for modification by DNA methyltransferases (Cuerda-Gil and Slotkin 2016; Matzke and Mosher 2014). This pathway relies on a specialized Argonaute protein as well as two plant-specific homologs of RNA Pol II (Pol IV and Pol V) to produce small interfering RNAs (siRNAs) complementary to the sequences targeted for DNA methylation (Wendte and Pikaard 2017; Zhou and Law 2015). Defects in this pathway, which has several subtypes, lead to altered chromatin structure and loss of TE silencing (Cuerda-Gil and Slotkin 2016). For example, in Arabidopsis thaliana, HDA6 and Pol IV/V cooperatively silence TEs, and the loss of either component leads to TE reactivation, albeit with slightly different effects (Blevins et al. 2014). In animals, piRNAs are the main non-coding RNA type involved in TE control (for a review of their role in mammals, see Ernst et al. (2017); Fig. 2b). The main function of piRNAs is in the germline, where several downstream pathways are important for TE silencing. piRNAs are derived from long transcripts of so-called piRNA clusters, which can be considered “graveyards” of TEs. The piRNAs then interact with a Piwi-class Argonaute protein, and they are reported to mediate both transcriptional and post-transcriptional silencing of TEs (Czech et al. 2018; Ernst et al. 2017; Hirakata and Siomi 2019; Huang et al. 2017; Ozata et al. 2019; Sentmanat et al. 2013). Failure of the piRNA pathway to silence TEs is illustrated well by the phenomenon of hybrid dysgenesis in Drosophila (Castro and Carareto 2004; Malone et al. 2015). If males from a strain carrying the P element (P strain), a DNA transposon that invaded wild Drosophila after the collection of most laboratory strains, are crossed to females from a laboratory strain lacking P elements (M strain), the P elements are activated in the F1 germline, and the animals exhibit various defects including sterility (Kidwell et al. 1977). This sterility can be prevented by the presence of piRNAs against the TE in the F1 germline, as occurs naturally in the reciprocal cross, which does not lead to dysgenesis (Brennecke et al. 2008). Mutations in the piRNA pathways generally lead to sterility and problems in the germline, and in many aspects, the mutant animals resemble dysgenic animals (Kelleher et al. 2012). Together, the experiments investigating small RNA pathways in animals and plants illustrate the importance of these pathways for the formation of silent chromatin, the control of TEs, and genome stability.

Fig. 2

Examples of small RNA-mediated TE silencing. a RdDM pathway in Arabidopsis thaliana. RNA Polymerase IV (Pol IV) produces a single-stranded transcript from a transposon (i). RNA-dependent RNA Polymerase 2 (RDR2) uses this single-stranded RNA as a template to produce a double-stranded RNA (ii). This double-stranded RNA is diced into 24-nt small interfering RNAs (siRNAs) by Dicer-like 3 (DCL3) (iii). AGO4 binds the 24-nt siRNA. Next, this complex binds to matching Pol V-generated transcripts, still associated with their transcription unit (iv). This binding recruits Domains Rearranged Methyltransferase 2 (DRM2), which produces cytosine methylation and silences the targeted transposon. Pol IV continues to produce transposon transcripts at a low level, feeding back into the siRNA-producing pathway (v). b The piRNA pathway in Drosophila melanogaster. A piRNA precursor is transcribed from a piRNA cluster containing TE sequences and exported from the nucleus to the cytoplasm (i). The piRNA precursor then is processed into small, ~24-nt piRNAs by the Zucchini protein (Zuc, red) at the mitochondria (ii), which can be bound to PIWI and translocated into the nucleus to transcriptionally silence homologous TEs (iii). Alternatively, piRNA precursor transcripts also can be processed by the ping-pong cycle (v). In the ping-pong amplification cycle, a mature piRNA bound to the Argonaute protein Aubergine (AUB) (iv) directs slicing of a matching transposon transcript bound to AGO3, which will be processed into a mature piRNA. The newly formed AGO3-piRNA complex can then slice additional matching sequences, either precursor piRNAs or transposon transcripts, which will bind to AUB (v). Continuation of this cycle leads to amplification of piRNAs, which can either be utilized in the ping-pong cycle to post-transcriptionally degrade transposon transcripts or bound to PIWI and translocated into the nucleus to transcriptionally silence homologous TEs (iii)

In summary, while having some beneficial functions at the centromere, telomere, and in generating evolutionary novelty, TEs are a major source of genome instability due to their ability to jump from locus to locus and due to the consequences of the presence of numerous identical sequences at non-homologous locations for other nuclear processes (DNA repair, replication, etc.). Thus, epigenetic processes working through DNA modifications, histone modifications, chromatin structure, and small RNAs are essential to promoting genome stability by ensuring the suppression of TE expression and transposition. Failure of these pathways to suppress TEs has severe consequences, including sterility and even death in species with high levels of TEs, highlighting the importance of the epigenetic processes involved for genome stability.

Epigenetic modifications in DNA repair pathways impact genome stability

As noted in the section on TEs, DNA repair is an important aspect of genome stability (Lombard et al. 2005). The genome experiences constant assault both from internal sources such as TEs and from external factors such as chemical mutagens and UV radiation (Chatterjee and Walker 2017). Different DNA damage repair pathways exist to deal with different types of damage. Among the most common forms of DNA damage are DNA double-strand breaks (DSBs), such as the breaks introduced by the transposition of TEs (for example, see Liu and Wessler 2017). DSBs can be repaired through mechanisms such as non-homologous end joining (NHEJ) and homologous recombination (HR), which uses homologous sequences as a repair template (Brandsma and van Gent 2012). Abnormal bases or pyrimidine dimers, created for example by UV radiation, are corrected through the nucleotide excision repair (NER) or base excision repair (BER) pathways (Melis et al. 2013). The DNA mismatch repair pathway typically corrects mistakes introduced during DNA replication, and direct reversal repair pathways are available for some altered bases (Chatterjee and Walker 2017). Many cancers show increased genome instability, which has been traced back to the inactivation of genes involved in various DNA repair pathways (Lahtz and Pfeifer 2011). This finding illustrates that DNA repair pathways are crucial for maintaining genomic stability by preventing DNA damage from accumulating and negatively impacting cell functions.

Of the four core histones, H2A has the largest number of variants studied to date, and one of these variants, H2AX, is linked to DNA repair and genome stability (Georgoulis et al. 2017). In mammalian cells, H2AX is rapidly phosphorylated with high specificity in response to DSBs, yielding γ-H2AX (Firsanov et al. 2011; Kinner et al. 2008; Rogakou et al. 1998). The phosphorylation is removed from H2AX after the DSB has been repaired and chromatin integrity has been restored. Celeste and colleagues investigated cells lacking H2AX and found that H2AX is not crucial for initial recognition of the DSBs; however, cells without H2AX were less effective at recruiting DNA damage response proteins later in the repair process (Celeste et al. 2003). Ultimately, γ-H2AX’s function appears to be to modify the chromatin structure, to increase DNA accessibility, and to recruit DNA damage response proteins to the appropriate regions in the genome (Celeste et al. 2002). Consequently, γ-H2AX has become a noteworthy molecular marker for DNA damage and is a hallmark of genomic instability. γ-H2AX therefore might serve as a prognostic biomarker for various cancers, such as breast (Nagelkerke et al. 2011; Varvara et al. 2019; Wang et al. 2019), lung (Chatzimichail et al. 2014; Matthaios et al. 2012; Ochola et al. 2019), and bladder cancers (Fernández et al. 2013). Together, the available data demonstrate that γ-H2AX is critical for genome integrity due to its role in the DNA DSB repair pathway, highlighting another route by which epigenetics contributes to genome stability.

An additional group of epigenetic regulators of DNA repair pathways is represented by the sirtuin protein family. Sirtuins are highly conserved proteins with histone deacetylase functions, occurring in species from bacteria to yeast to humans (Greiss and Gartner 2009; Imai et al. 2000). Sir2 from budding yeast was the first sirtuin to be described, and in this species, it is essential for gene silencing in heterochromatin regions including the telomeres (Aparicio et al. 1991; Rine and Herskowitz 1987; Wierman and Smith 2014). Sirtuins have been linked to a variety of biological processes, including aging, transcription, and stress response, and it is often their role in DNA repair mechanisms that influences these processes (Choi and Mostoslavsky 2014). Among the mammalian sirtuins, SIRT1 is the best studied, and it has been linked to DNA repair pathways including the HR, NHEJ, and NER pathways (Fan and Luo 2010; Li et al. 2008; Yamamori et al. 2010). Lin and colleagues found that SIRT1 is involved in the choice between the HR and NHEJ repair pathways for DNA DSBs (Lin et al. 2015). They demonstrated that KAP1, a transcriptional corepressor, inhibits HR-mediated and promotes NHEJ-mediated repair of DSBs, but that this process depends on the KAP1 acetylation state. Deacetylation of KAP1 is regulated by SIRT1, and it is this deacetylation that promotes NHEJ-mediated repair (Lin et al. 2015). In addition to KAP1, SIRT1 also deacetylates KU70, a DNA damage repair protein that is essential to the NHEJ pathway (Fell and Schild-Poulter 2012). Zhang et al. found that SIRT1 enhances NHEJ activity in chronic myeloid leukemia (CML) cells, likely through its role in deacetylating KU70 (Zhang et al. 2016). In addition, Roth and colleagues found that SIRT1 and the LSD1 lysine demethylase competitively bind to KU70 in cancer cells responding to stress, with different effects on NHEJ repair efficiency (Roth et al. 2016). Thus, SIRT1 is an example of an epigenetic modifier that impacts the NHEJ repair pathway and thereby contributes to the control of genome stability.

The roles of histone variants like H2AX and epigenetic modifiers like SIRT1 and LSD1 in DNA repair pathways illustrate another important link between epigenetics and genome stability. These examples show that DNA repair pathways utilize a variety of epigenetic mechanisms to combat DNA damage and ensure genome stability, thereby guarding against the genome instability associated with aging and cancer. They are just a few of the notable epigenetic mechanisms that impact DNA repair pathways and genomic stability.

Conclusions and outlook

As illustrated in this review, various epigenetic mechanisms are at work in maintaining genome stability. Epigenetic systems regulate centromere organization to ensure proper chromosome segregation during cell division, and their loss can upset genomic integrity and lead to chromosomal abnormalities and cell death (Barra and Fachinetti 2018). Without the correct epigenetic mediators, telomeres can shorten or lose their ability to protect chromosome ends (Booth and Brunet 2016; Maciejowski and de Lange 2017; O'Sullivan and Karlseder 2010). Without the epigenetic mechanisms controlling their activity, TEs can cause genomic instability by disrupting essential genes, introducing DNA DSBs, and promoting errors during DNA replication, DNA repair, and homologous recombination (Klein and O'Neill 2018). Epigenetic mechanisms also are essential regulators of the DNA repair pathways that are core to maintaining genome stability in the face of DNA damage, and they likely play a role in other pathways contributing to genome stability as well (Dabin et al. 2016). Thus, epigenetic mechanisms are an integral part of the complex web of pathways used to ensure genome stability.

While the involvement of epigenetic systems in the regulation of genome stability is well established, details for many pathways are still lacking and are areas of active research. Epigenetic mechanisms likely will play many additional roles in the gene regulatory pathways that control the expression of the large variety of proteins and transcripts that contribute to genome stability. In addition, there is an interesting but currently limited body of research looking into the question of how the epigenome, in particular chromatin structure, influences the mutational landscape. These studies indicate that mutation rates differ between regions of the genome with well-positioned nucleosomes and regions that are nucleosome-free, with nucleosomal DNA surprisingly showing higher levels of substitutions than regions of DNA typically found in the linker between nucleosomes (Warnecke et al. 2008; Washietl et al. 2008). Additionally, there are some data suggesting that not all nucleosomes impact substitution rates equally and that the histone modifications present on a particular nucleosome might affect mutation rates (Prendergast and Semple 2011; Tolstorukov et al. 2011). The question of how chromatin structure impacts the mutations that occur and the efficiency of the DNA repair pathways is highly relevant and ripe for further exploration.

In addition, studies using a comparative approach are likely to shed more light on the connections between epigenetic mechanisms and genome stability. The progress in sequencing technologies and the drop in cost have made genomics and epigenomics studies possible in any species of interest. The resulting increase in datasets from a large variety of species has led, for example, to the realization that not all species use CenH3 to determine centromere identity (Akiyoshi and Gull 2014; Berriman et al. 2005; Drinnenberg et al. 2014), which raises the question of how the centromere works in these species. Findings like this one suggest that there are many more novel links between epigenetics and genome stability awaiting discovery. Histone variants, telomere structure, and centromeres are especially promising areas of further research, because the variation present among well-characterized species suggests that more is likely to be found once a wider range of species is considered.