1 Introduction

Methylation of CpG sites in DNA plays a vital role in mammalian development and has been studied extensively in the past decades. However, despite advances in the understanding of the targeting and regulation of DNA methyltransferases in cells, the specific processes contributing to the generation, maintenance and erasure of DNA methylation patterns are not yet fully elucidated, and the exact molecular mechanisms leading to the aberrant methylation observed in human disease (such as cancer) are only partially understood. The discovery of TET enzymes has changed the view of DNA methylation as a very stable modification, as it showed that active DNA demethylation can occur through stepwise oxidation of 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and finally 5-carboxylcytosine (5caC), followed by removal of the oxidised bases by thymine DNA glycosylase (TDG) and the base excision repair mechanism. However, while the biological function of TET enzymes has been studied quite extensively, very little is known about their biochemical properties, specificity and catalytic mechanism, or about the contribution of different domains to enzyme targeting and regulation. In this chapter, we summarise the most important properties of both DNA methyltransferases and TET enzymes and describe some of the molecular pathways leading to their recruitment to target sites. Finally, we introduce the concept of epigenetic editing as an elegant approach to dissect the function of DNA methylation and demethylation in a locus-specific manner.

2 Setting of DNA Methylation

Since the discovery of methylated DNA bases in 1948, major advances have been made in our understanding of the biological role of DNA methylation, as well as the mechanisms regulating the function of DNA methyltransferases in cells. Through these discoveries, DNA methyltransferases emerged as key epigenetic enzymes regulating mammalian development and cellular specialisation, which is clearly emphasised by the lethal phenotypes of the genetic knockouts of any of the DNA methyltransferase enzymes in mice and by the ever-growing number of diseases showing disturbed DNA methylation signatures.

DNA methylation in mammals occurs at the C5 position of cytosine residues, primarily at CpG sites, although non-CpG methylation is also present (albeit at lower levels). About 60–80% of the CpG sites in the human genome (corresponding to around 3–4% of all cytosines) are methylated in a tissue- and cell type-specific pattern [reviewed in Schubeler (2015), Jurkowska et al. (2011a), Gowher and Jeltsch (2018), Ravichandran et al. (2018)]. Additionally, 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), which arise from the step-wise oxidation of the methyl group of 5-methylcytosine, have recently been discovered in mammalian genomic DNA (Fig. 1) (Tahiliani et al. 2009; He et al. 2011; Ito et al. 2011). Three methyltransferase enzymes are responsible for the generation and maintenance of the global DNA methylation patterns in humans: DNMT3A, DNMT3B and DNMT1. DNMT3A and DNMT3B introduce DNA methylation during mammalian development and maturation of germ cells, with the assistance of the regulatory factor DNMT3L. After their establishment, DNA methylation patterns are largely preserved, with only small tissue-specific changes, but can be significantly altered in disease. During DNA replication, unmethylated DNA strands are synthesised, leading to the conversion of fully methylated CpG sites into hemi-methylated sites, which are then re-methylated by the maintenance methyltransferase DNMT1. DNMT1 has a high preference for hemi-methylated DNA (Fig. 1) and is ubiquitously and highly expressed in proliferating cells. This elegant inheritance mechanism enables DNA methylation to function as a key epigenetic mark mediating long-term transcriptional silencing. In this respect, DNA methylation is involved in silencing of repetitive elements, genomic imprinting, X-chromosome inactivation and regulation of gene expression during development and cellular specialisation [reviewed in Smith and Meissner (2013), Bogdanovic and Lister (2017)].
DNA methylation can be lost either by a passive mechanism, when maintenance MTase activity is absent, or via an active demethylation process (see below). Considering its important biological roles, it is not surprising that aberrant DNA methylation changes play a prominent role in the development of human diseases, including, for example, haematological cancers (Bergman and Cedar 2013; Yang et al. 2015).
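As a rough illustration of the passive route, the following sketch assumes that maintenance methylation is completely absent and that every newly synthesised strand stays unmethylated. Under this idealised assumption, the fraction of DNA strands still carrying the methyl mark at a given CpG halves with each round of replication. The function name and the model itself are illustrative assumptions and are not taken from the cited studies.

```python
def methylated_strand_fraction(initial_fraction: float, divisions: int) -> float:
    """Fraction of DNA strands still methylated at a CpG after `divisions`
    rounds of replication without any maintenance methylation.

    Idealised assumption: each new strand is synthesised unmethylated and is
    never re-methylated, so the methylated fraction halves every round.
    """
    if not 0.0 <= initial_fraction <= 1.0:
        raise ValueError("initial_fraction must lie between 0 and 1")
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return initial_fraction / (2 ** divisions)


if __name__ == "__main__":
    # Print the remaining methylated fraction for the first few divisions.
    for n in range(5):
        print(n, methylated_strand_fraction(1.0, n))
```

In cells, residual DNMT1 or DNMT3 activity would slow this decay, so the halving should be read as an upper bound on the rate of passive loss in this simple model.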

Fig. 1

The methylation cycle in mammals. DNA methylation is set on unmethylated cytosines by the combined action of DNMT3A and DNMT3B (blue) and maintained during replication by DNMT1, which has a high preference for hemimethylated CpG sites. DNA methylation can be lost by a passive mechanism, being diluted over consecutive cycles of DNA replication, or through an active mechanism involving oxidation of 5mC to 5hmC, 5fC and 5caC by TET enzymes (red). 5fC and 5caC can be recognised and excised by TDG and base excision repair enzymes, leading to the restoration of an unmethylated state (pink)

2.1 Architecture of DNA Methyltransferases

In the structure of the mammalian DNA methyltransferases, two functional parts can be identified: a large N-terminal part and a smaller C-terminal catalytic part (Fig. 2). The N-terminal part of DNMTs contains several distinct domains with targeting and regulatory functions. The C-terminal domain contains ten catalytic amino acid motifs conserved among prokaryotic and eukaryotic C5-DNA methyltransferases and folds into a conserved structure called the AdoMet-dependent MTase fold, which consists of a mixed seven-stranded beta sheet formed by six parallel beta strands and a seventh strand in an antiparallel orientation, inserted into the sheet between strands 5 and 6. Six helices are folded around the central beta sheet (Cheng and Blumenthal 2008). This domain is involved in binding of the cofactor S-adenosyl-L-methionine (AdoMet), recognition of the DNA substrate and catalysis. Interestingly, the spatial arrangement of the various domains in DNMTs plays a prominent role in the regulation of the enzymes’ activity and specificity through allosteric control of the catalytic domain [reviewed in Jeltsch and Jurkowska (2016)].

Fig. 2

Domain composition of mammalian DNA methyltransferases and TET enzymes [adapted from Ravichandran et al. (2018)]. (a) The N-terminal regulatory part of the enzymes contains various chromatin- and protein-interacting domains; in the C-terminal domain of DNA methyltransferases, the catalytic motifs are indicated. For details, refer to the text. (b) Crystal structures of the human DNMT1 (351–1600 fragment), the human DNMT3A and DNMT3L complex bound to the histone H3 tail and the human TET2 catalytic domain in complex with DNA. Distinct protein domains are color-coded according to the color scheme used in (a)

2.1.1 Domain Composition of DNMT1

DNMT1 was the first mammalian DNA methyltransferase enzyme to be cloned and biochemically characterised. It is a large protein (1616 aa in humans), containing several distinct domains, which are listed below from the N-terminus to the C-terminus of the protein (Jurkowska and Jeltsch 2016) (Fig. 2):

  • DMAPD (DNA methyltransferase-associated protein 1 interaction domain) is involved in the targeting of DNMT1 to replication foci.

  • PBD (PCNA—proliferating cell nuclear antigen—binding domain) recruits DNMT1 to the replication fork during S phase via interaction with PCNA.

  • RFTD (replication foci-targeting domain) is involved in the targeting of DNMT1 to replication foci and to centromeric chromatin.

  • CXXC domain binds unmethylated DNA and might be involved in the specificity of DNMT1.

  • BAH1 and BAH2 (bromo-adjacent homology 1 and 2) domains are necessary for the folding of the enzyme, but their exact biological function is unknown.

The catalytic domain of DNMT1 is inactive in its isolated form despite the presence of all conserved methyltransferase motifs required for catalysis, demonstrating that it is controlled by the N-terminal part of the enzyme. Indeed, structural and biochemical studies confirmed that several domains in the N-terminal part of DNMT1 directly contact the catalytic domain, providing examples of the sophisticated allosteric regulation of DNMT1 [reviewed in Jeltsch and Jurkowska (2016)].

2.1.2 Domain Composition of DNMT3 Family

The human DNMT3 family comprises three members: DNMT3A (912 aa), DNMT3B (853 aa) and DNMT3L (387 aa). DNMT3A and DNMT3B are enzymatically active, whereas DNMT3L does not possess methyltransferase activity but stimulates the activity of DNMT3A and DNMT3B (Gowher et al. 2005; Chedin et al. 2002). Of note, a novel member of the rodent Dnmt3 family, Dnmt3c, which arose from duplication of the DNMT3B gene, has been identified recently (Barau et al. 2016). This male germline-specific variant is required for methylation of retrotransposons during mouse spermatogenesis (Barau et al. 2016). However, its orthologue has not been identified in humans.

In the N-terminal part of the DNMT3 proteins, which differs significantly from the N-terminal part of DNMT1, three separate regions can be distinguished (Jurkowska et al. 2011a) (Fig. 2):

  • The very N-terminal segment of DNMT3A and DNMT3B, which is the most variable region between both proteins, binds DNA and is important for anchoring of the enzymes to nucleosomes. It also seems to play a role in the targeting of DNMT3A to the shores of bivalent CpG promoters.

  • The ADD (ATRX-DNMT3-DNMT3L) domain of DNMT3 proteins mediates the interaction with the N-terminal tails of histone H3, as well as with other chromatin proteins, and is involved in the allosteric regulation of the enzymes’ activity.

  • PWWP domain of DNMT3A and DNMT3B interacts with histone H3 tails trimethylated at lysine 36 and is essential for targeting of the enzymes to pericentromeric chromatin and gene bodies. This domain is missing in DNMT3L.

The catalytic domains of DNMT3A and DNMT3B share ~80% sequence identity and are active in isolated form (Gowher and Jeltsch 2002). In contrast, despite clear homology with the other family members, the C-terminal domain of DNMT3L is catalytically inactive due to amino acid exchanges and deletions within the conserved methyltransferase motifs. Structural studies of DNMT3A and DNMT3L revealed that the two proteins form a tetrameric complex, consisting of two molecules of DNMT3A in the centre and two molecules of DNMT3L at the edges of the tetramer (Jia et al. 2007; Guo et al. 2015; Zhang et al. 2018). The protein-protein interfaces in this complex also support self-interaction of DNMT3A and provide an interesting regulatory mechanism for the activity and localisation of DNMT3A [reviewed in Jeltsch and Jurkowska (2013)]. The arrangement of the two DNMT3A catalytic sites allows methylation of two adjacent CpG sites in one binding event (Jia et al. 2007; Jurkowska et al. 2008; Zhang et al. 2018). The long-awaited structure of the DNA-bound form of the complex revealed that the DNA binding interface of DNMT3A is formed by a specific loop from the target recognition domain, the catalytic loop and the homodimeric interface of DNMT3A (Zhang et al. 2018).

2.2 Catalytic Properties of DNMTs

All DNA cytosine-C5-methyltransferases share a similar catalytic mechanism, involving conserved amino acid motifs, for the transfer of the methyl group from the cofactor AdoMet to the target cytosine base [reviewed in Jurkowska et al. (2011a)]. Interestingly, they use base flipping to rotate the target cytosine out of the DNA duplex and insert it into the catalytic pocket. As with other DNA methyltransferases, base flipping was also observed in the structures of DNMT1 and DNMT3A bound to substrate DNA (Song et al. 2012; Zhang et al. 2018).

DNMT1 is a highly processive enzyme, capable of methylating multiple CpG sites along the DNA without dissociating from the substrate (Vilkaitis et al. 2005; Hermann et al. 2004). This property fits well with the maintenance role of DNMT1 at the replication fork, as it allows very efficient methylation of the newly synthesised daughter strand before the chromatin is reassembled. The structure of DNMT1 revealed that the enzyme enwraps the DNA, enabling sliding of the protein along the substrate and catalysis of successive methylation reactions (Song et al. 2012).

In contrast to DNMT1, DNMT3A methylates DNA in a distributive manner (Norvil et al. 2018; Gowher and Jeltsch 2002), requiring enzyme dissociation after each round of methylation. In addition, it binds cooperatively to DNA, forming multimeric protein-DNA filaments [reviewed in Jeltsch and Jurkowska (2013)]. Cooperative binding enables methylation of multiple sites on the same DNA molecule and increases DNMT3A activity (Emperle et al. 2014), leading to an efficient spreading of DNA methylation over a larger region (Stepper et al. 2017). Interestingly, binding of several DNMT3A molecules along the DNA leads to an 8–10 bp periodicity in the methylation pattern (Jia et al. 2007; Jurkowska et al. 2008). In contrast to DNMT3A, DNMT3B is able to processively methylate multiple CpG sites and binds to the DNA in a non-cooperative manner (Norvil et al. 2018; Gowher and Jeltsch 2002), indicating that small sequence differences in the catalytic domains of DNMT3A and DNMT3B have a profound impact on the catalytic properties of these related enzymes.

2.3 Intrinsic DNA Sequence Specificity of DNMTs

For a long time, DNA methylation in mammals was thought to be largely restricted to CpG sites; however, recent studies revealed the presence of non-CpG methylation in several cell types and tissues, both in mouse and in humans [reviewed in He and Ecker (2015)]. The original DNA methylation pattern is set by DNMT3A and DNMT3B, which are classically designated as de novo MTases, as they do not display a preference between unmethylated and hemi-methylated DNA. Although both enzymes preferentially methylate CpG dinucleotides, they can also modify cytosines in a non-CpG context, with a preference for CA >> CT > CC (Gowher and Jeltsch 2001; Zhang et al. 2018). Experiments using DNMT3 knockouts in embryonic stem cells or ectopic expression of DNMT3A in cells lacking DNA methylation provided direct evidence that DNMT3 enzymes also introduce methylation in a non-CpG context in vivo (Ramsahoye et al. 2000). Conversely, DNMT1 cannot efficiently methylate non-CpG sites, leading to the loss of non-CpG methylation through cellular division in the absence of DNMT3 enzymes. Therefore, the presence of non-CpG methylation directly reflects DNMT3 enzyme activity in cells. Consistently, methylated non-CpG sites are widespread in cells and tissues where DNMT3A and DNMT3B are highly expressed (like embryonic stem cells, induced pluripotent cells, oocytes and brain), but absent or present at only marginal levels in most somatic tissues and cells with low expression of these enzymes (Ziller et al. 2011; Varley et al. 2013).
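The distinction between CpG and non-CpG contexts can be made concrete with a short sketch that counts, for one DNA strand, how many cytosines are followed by G, A, T or C. The function name and the example sequence are hypothetical and serve only to illustrate the CG, CA, CT and CC classification used in the text.

```python
from collections import Counter


def cytosine_contexts(seq: str) -> Counter:
    """Count cytosines on the given strand by the base that follows them.

    Keys are the dinucleotides 'CG', 'CA', 'CT' and 'CC'. A cytosine at the
    very end of the sequence, or one followed by an ambiguous base such as
    'N', has no defined context and is skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] in "ACGT":
            counts["C" + seq[i + 1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical example sequence, read 5' to 3'.
    print(cytosine_contexts("ACGTCATCC"))
```

Only the strand passed in is scanned; to cover both strands, the same function could be applied to the reverse complement.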

The biological role of non-CpG methylation is currently unclear. It has been viewed as a by-product of the hyperactivity and low specificity of DNMT3 enzymes. However, depending on the experimental system, there is also evidence of its potential role in gene repression or expression. Most insights into the potential biological role of non-CpG methylation have come from studies on brain [reviewed in Kinde et al. (2015), Jang et al. (2017)], where non-CpG methylation occurs at high levels and contributes to neuronal maturation and specification of brain cells. However, further studies are required to elucidate the exact biological function of non-CpG methylation.

Although DNMT3A and DNMT3B do not seem to have strong sequence specificity beyond CpG dinucleotides, both enzymes are sensitive to the sequences flanking their target cytosines. For example, DNMT3A prefers purine bases at the 5′ end of the CpG, whereas pyrimidines are favoured at its 3′ end (Handa and Jeltsch 2005; Jurkowska et al. 2011b). Interestingly, the experimentally determined flanking sequence preferences of DNMT3 enzymes correlate with the methylation level of CpG sites found in the human genome (Handa and Jeltsch 2005), suggesting that the inherent sequence preferences of de novo methyltransferases contribute to the selection of their target regions in the genome.
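The qualitative rule described above (a purine immediately 5′ of the CpG and a pyrimidine immediately 3′ of it) can be sketched as a simple scan over a sequence. This is only a yes/no flag for the nearest flanking bases under that stated rule; it is not a quantitative model of DNMT3A activity, and the function name and example sequence are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def cpg_flank_matches(seq: str) -> list:
    """List CpG sites together with their immediate flanking bases.

    Returns tuples (position_of_C, five_prime_base, three_prime_base, matches),
    where `matches` is True when the 5' flank is a purine and the 3' flank is
    a pyrimidine. CpG sites lacking a base on either side are skipped.
    """
    seq = seq.upper()
    results = []
    for i in range(1, len(seq) - 2):
        if seq[i:i + 2] == "CG":
            five, three = seq[i - 1], seq[i + 2]
            matches = five in PURINES and three in PYRIMIDINES
            results.append((i, five, three, matches))
    return results


if __name__ == "__main__":
    # Hypothetical sequence with one matching and one non-matching CpG.
    for site in cpg_flank_matches("ACGTTCGA"):
        print(site)
```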

The mechanistic understanding of the flanking sequence preferences and specificity of DNMT3 enzymes towards CpG sites has long awaited the availability of the structure with bound substrate DNA, which has been obtained recently (Zhang et al. 2018). The structure of the DNMT3A-DNMT3L complex bound to DNA revealed that the guanine base of the target CpG site is accurately recognised by the R836 residue of DNMT3A, mutation of which results in a reduced preference of the enzyme for CpG methylation (Gowher et al. 2006; Zhang et al. 2018). Additionally, several residues directly contact the bases flanking the CpG dinucleotide, explaining the strong flanking sequence preferences of DNMT3A. Notably, in the structure, no protein contact with the cytosine base of the opposite strand was observed, explaining the lack of discrimination of DNMT3 enzymes between unmethylated and hemi-methylated DNA (Zhang et al. 2018).

In contrast to DNMT3 enzymes, DNMT1 shows a strong preference for hemi-methylated DNA over unmethylated substrate (Song et al. 2012; Goyal et al. 2006; Bashtrykov et al. 2012), which enables its function as the methylation copy machine at the replication fork. The structure of DNMT1 bound to hemi-methylated DNA provided a molecular explanation for this preference and revealed that the methyl group of the cytosine is recognised by a hydrophobic pocket in the catalytic domain of DNMT1 and that both the 5mC and the corresponding G in the target DNA strand are recognized accurately (Song et al. 2012). This observation also explains the high specificity of DNMT1 towards CpG sites over non-CpG sites mentioned above.

2.4 Recruitment of DNMT Enzymes to Chromatin and Replicating DNA

Correct establishment and maintenance of DNA methylation patterns is crucial for human development and health; therefore, the mechanisms contributing to these processes have been extensively studied over the past decades. Several synergistic models, involving both the inherent specificity of the methyltransferases and the role of other proteins and chromatin modifications, have been proposed to explain how specific DNA methylation patterns are established [reviewed in Jurkowska et al. (2011a)].

2.4.1 Interaction of DNMT3s with Chromatin Marks

As DNA methylation is embedded in a multifaceted epigenetic network, direct interaction with specific chromatin marks has been proposed as a general mechanism involved in the recruitment of DNA methyltransferases to specific genomic regions.

The ADD domain, which is present in all DNMT3 proteins, interacts specifically with H3 tails unmethylated at lysine 4; modification of this residue (e.g. acetylation or di/trimethylation) prevents ADD binding (Noh et al. 2015; Otani et al. 2009; Zhang et al. 2010b). Importantly, binding of H3 tails to the ADD domain also allosterically activates DNMT3A (Guo et al. 2015; Li et al. 2011), thereby stimulating methylation of chromatin-bound DNA by DNMT3A. Since (tri)methylation of H3K4 is associated with active genes, its presence would repel DNA methyltransferases and prevent DNA methylation of active regions. Indeed, a strong inverse genome-wide correlation of DNA methylation and H3K4me3 modification was observed (Hodges et al. 2009; Meissner et al. 2008) and demethylation of K4 of H3 at enhancers of pluripotency genes was required for localization of DNMT3 enzymes in embryonic stem cells (Petell et al. 2016). The crucial role of the ADD domain in the targeting of DNMT3A to chromatin in vivo was further confirmed by an elegant study, which showed that engineering of the ADD domain of DNMT3A led to aberrant DNA methylation patterns in cells and disturbed differentiation programs of embryonic cells (Noh et al. 2015). Besides interacting with histone H3 tails, the ADD domain is a platform involved in DNMT3A interaction with other proteins, including transcription factors, histone methyltransferases and other chromatin proteins [reviewed in Ravichandran et al. (2018)]. Importantly, as the ADD domain is involved in the allosteric regulation of DNMT3A, interaction with this domain may directly influence DNMT3A activity, as shown for H3 (Guo et al. 2015) and MeCP2 (Rajavelu et al. 2018).

The PWWP domain of DNMT3A and DNMT3B, which specifically recognizes H3 tails tri-methylated at K36 (H3K36me3) (Dhayalan et al. 2010), is the second DNMT3 domain directly contributing to the recruitment of methyltransferases to specific genomic regions, including pericentromeric chromatin and gene bodies. Strong correlation of both H3K36me3 and DNA methylation was observed in the body of active genes and at exon-intron boundaries (Vakoc et al. 2006; Kolasinska-Zwierz et al. 2009; Baubec et al. 2015). The central role of H3K36me3 recognition in targeting of DNA methylation was experimentally confirmed in a variety of cellular systems (Neri et al. 2017; Morselli et al. 2015; Baubec et al. 2015). For example, H3K36me3-dependent intragenic DNA methylation by DNMT3B is crucial to protect gene bodies from cryptic transcription initiation (Neri et al. 2017). Furthermore, a subset of heterochromatic repeats shows strong enrichment in H3K36me3, explaining the role of the DNMT3A PWWP domain in the heterochromatic localization of the enzyme (Ernst et al. 2011).

Besides interacting with histone tails, the PWWP domain of DNMT3 enzymes can also bind DNA (Qiu et al. 2002). A recent model for the methylation of nucleosomal DNA by DNMT3A suggested that the targeting occurs through a specific binding of H3K36me3 by the PWWP domain of DNMT3A, which is followed by an activation of the catalytic domain mediated by the binding of H3 tails to the ADD domain, resulting in the methylation of nearby cytosines by the catalytic domain (Rondelet et al. 2016).

2.4.2 Recruitment of DNMT1 to Replicating Chromatin

Several targeting mechanisms ensure the correct localization of DNMT1 to replicating DNA. The first one involves PCNA, a component of the replication machinery that interacts and co-localizes with DNMT1 in vivo (Iida et al. 2002), indicating that it might directly recruit the methyltransferase to the replication fork and load it onto DNA. The PCNA-DNMT1 interaction contributes to the efficiency of DNA re-methylation in cells, but it is not essential for this process (Egger et al. 2006). A second factor, which is essential for the recruitment of DNMT1 and the maintenance of DNA methylation patterns in mammals, is UHRF1 (Sharif et al. 2007; Bostick et al. 2007). UHRF1 specifically binds to hemi-methylated DNA via its SET and RING-associated (SRA) domain (Hashimoto et al. 2008; Bostick et al. 2007; Arita et al. 2008) and recognizes histone H3 tails methylated at lysine 9 (H3K9me2/me3) via cooperative binding of its tandem Tudor domain (TTD) and its plant homeodomain (PHD) (Rothbart et al. 2012; Nady et al. 2011). The chromatin interactions of UHRF1 are necessary for the recruitment of DNMT1 to replicating chromatin, since UHRF1 mutations preventing histone binding abolished DNA methylation by DNMT1 in cells (Rothbart et al. 2012; Nady et al. 2011). Similarly, UHRF1 knockout in mice results in a genome-wide loss of DNA methylation (Bostick et al. 2007; Sharif et al. 2007). In addition to its role in targeting of DNMT1, UHRF1 was also shown to stimulate the catalytic activity of DNMT1 through a direct interaction (Bashtrykov et al. 2014).

A model of direct recruitment of DNMT1 by histone marks is also plausible, as the methyltransferase preferentially associates with H3 tails ubiquitinated at K18 and K23 (Qin et al. 2015; Nishiyama et al. 2013). This interaction is mediated by the replication foci-targeting domain (RFTD) of DNMT1 and leads to the recruitment of the enzyme to newly replicated DNA and its simultaneous activation, providing another beautiful example of allosteric regulation of DNMTs. The ubiquitination of the H3 tail is introduced by UHRF1 and is stimulated by UHRF1 binding to hemi-methylated DNA (Harrison et al. 2016). Ubiquitinated H3 accumulates during S-phase, leading to the recruitment of DNMT1 to newly replicated DNA (Qin et al. 2015; Nishiyama et al. 2013; Harrison et al. 2016). These data indicate an important additional connection between DNMT1 and UHRF1 chromatin interactions, which is essential for efficient maintenance of DNA methylation.

3 Erasure of DNA Methylation

Since its discovery, 5mC has been considered a very stable modification due to the chemical strength of the C-C bond. Therefore, DNA demethylation was expected to occur only passively, through a replication-dependent dilution in the absence or inhibition of the maintenance methylation machinery (Fig. 1). However, global genome-wide loss of DNA methylation occurring in a DNA replication-independent manner was observed in mouse zygotes (Mayer et al. 2000; Oswald et al. 2000) and during specification of primordial germ cells (Hajkova et al. 2002; Yamazaki et al. 2003), pointing towards the existence of an active demethylation machinery. Furthermore, active DNA demethylation has also been observed at specific loci in T cells, neurons and other cells (Bruniquel and Schwartz 2003; Martinowich et al. 2003).

Despite the discovery of biological processes where active DNA demethylation occurs in the absence of DNA replication, the enzymatic machinery responsible for this process in mammals remained enigmatic until 2009, when a group of enzymes called Ten-Eleven Translocation (TET) was shown to oxidize 5mC to 5-hydroxymethylcytosine (5hmC) both in vitro and in mouse embryonic stem cells (Tahiliani et al. 2009). Moreover, TET enzymes are able to further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (He et al. 2011; Ito et al. 2011). These oxidized bases are recognized and excised by the thymine DNA glycosylase (TDG) triggering base excision repair pathway (BER) to replace the abasic site by an unmodified cytosine (He et al. 2011; Maiti and Drohat 2011), thereby completing the DNA demethylation cycle (Fig. 1). This finding suggested a plausible pathway for active DNA demethylation and opened a new dynamic field of research (Tahiliani et al. 2009).
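The active demethylation route described here can be summarised as a small state model: TET enzymes move a cytosine through successive oxidation states, and TDG together with BER restores an unmodified cytosine from 5fC or 5caC (the TDG substrates named in Fig. 1). The sketch below is a deliberately simplified, hypothetical representation; the function names are illustrative, and kinetics, enzyme dissociation and sequence context are all ignored.

```python
# Successive TET-catalysed oxidation steps, as described in the text.
OXIDATION_STEPS = {"5mC": "5hmC", "5hmC": "5fC", "5fC": "5caC"}

# Bases excised by TDG and replaced with cytosine through BER (per Fig. 1).
TDG_SUBSTRATES = {"5fC", "5caC"}


def tet_oxidise(base: str) -> str:
    """Return the next oxidation state; other bases are returned unchanged."""
    return OXIDATION_STEPS.get(base, base)


def tdg_ber(base: str) -> str:
    """Model TDG excision plus BER: 5fC and 5caC are replaced by C."""
    return "C" if base in TDG_SUBSTRATES else base


def active_demethylation_path(base: str = "5mC") -> list:
    """List the states visited when oxidation runs to completion before repair."""
    path = [base]
    while path[-1] in OXIDATION_STEPS:
        path.append(tet_oxidise(path[-1]))
    repaired = tdg_ber(path[-1])
    if repaired != path[-1]:
        path.append(repaired)
    return path


if __name__ == "__main__":
    print(" -> ".join(active_demethylation_path()))
```

The model runs oxidation through to 5caC before repair only for simplicity; since 5fC is also a TDG substrate, excision could equally occur one step earlier.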

3.1 Architecture of TET Enzymes

The mammalian TET family comprises three paralogous members (TET1, 2136 aa in humans; TET2, 2002 aa; and TET3, 1776 aa), which share a similar domain architecture (Fig. 2). They are all large proteins harboring a large, mostly unstructured N-terminal part and a C-terminal catalytic domain. The core catalytic domain is composed of a cysteine-rich region followed by a double-stranded β helix domain (DSBH) characteristic of Fe2+/αKG dioxygenases (Hu et al. 2013, 2015). In metazoan TETs, the DSBH domain is interrupted by a large unstructured region, which is believed to engage in protein-protein interactions. In their N-termini, TET1 and TET3 contain a CXXC domain, which interacts with DNA (Xu et al. 2012; Jin et al. 2016). The CXXC domain of TET2 was lost during evolution after gene duplication and inversion, and is now encoded as a separate protein, IDAX (inhibition of the dvl and axin complex) (Iyer et al. 2009).

The recently solved crystal structure of the human TET2 catalytic domain in complex with 5mC containing DNA substrate (PDB ID: 4NM6) revealed that the DSBH core domain forms a globular structure, which is stabilized by the Cys-rich region enwrapping the DSBH core (Hu et al. 2013). The Cys-rich region is crucial for the stability of the DSBH domain and consequently for catalysis (Hu et al. 2013). DNA is bound above the DSBH core in a groove that is enriched in basic and hydrophobic amino acids. Similar to DNMTs and DNA repair enzymes (Klimasauskas et al. 1994), TET enzymes utilize a base flipping mechanism to position the target base in the catalytic pocket for the oxidation reaction. Once the base is located in the catalytic pocket, the methyl-group is oriented towards the catalytic iron and α-KG, which facilitate the catalytic turnover (Hu et al. 2013). TET enzymes follow a conserved catalytic mechanism that is characteristic for other known Fe2+/αKG-dependent dioxygenases, like histone lysine demethylases (JMJC-family) [reviewed in Hausinger and Schofield (2015)].

3.2 Intrinsic DNA Sequence Specificity of TET Enzymes

Most studies on TET enzymes have focused on elucidating their biological role and their reaction products; however, the intrinsic biochemical properties of TET enzymes that govern their function remain poorly investigated. Recent reports showed that TET-dependent demethylation in zygotes represents only a small fraction of all demethylation events and that TET-associated demethylation seems to be locus specific (Guo et al. 2014; von Meyenn et al. 2016). Additionally, fine mapping of the genomic location of 5hmC using the SCL-exo protocol showed that 5hmC is highly enriched within a defined sequence context (Serandour et al. 2016), suggesting that TET enzymes could display some sequence preferences. However, the molecular basis of this sequence selectivity is still unknown. Both DNMTs and TET enzymes modify CpG dinucleotides, yet DNMT3 enzymes can also efficiently methylate non-CpG sites (as discussed above). Unfortunately, little work has been done to investigate the intrinsic preference of TET enzymes towards non-CpG sites. In the initial report that identified TET enzymes as 5mC hydroxylases, the authors showed that these enzymes are capable of oxidising 5mC embedded in a CpG site, yet non-CpG substrates were not tested (Tahiliani et al. 2009). Later, it was shown that 5mCpA and 5mCpC sites were poor substrates for TET2, with conversion efficiencies below 2% and 5%, respectively, as opposed to more than 85% for 5mCpG sites in the same sequence context (Hu et al. 2013). Like TET2, TET1 preferentially oxidizes 5mCpG with some incidence of oxidation of 5mCpC sites. Structural studies provided a molecular explanation for the observed preference of TET enzymes towards CpG sites (Hu et al. 2013). The TET2-DNA crystal structure showed that the target 5mC is specifically recognized by two hydrogen bonds formed between the side chains of H1904 and N1387 and the nitrogen atoms N3 and N4 of the base, respectively.
The base-stacking interaction between the TET2 Y1902 residue and the pyrimidine base of the 5mC additionally supports this recognition. Furthermore, a base-stacking interaction between the Y1294 residue and the G:5mC base pair in the DNA provides specific recognition of the following G:C base pair within the CpG dinucleotide. Intriguingly, TET2 does not make any contact with the methyl group of the target cytosine, suggesting that it could also catalyse the oxidation of 5hmC to 5fC and 5caC (Hu et al. 2013).

In the TET2:DNA co-crystal structure no protein-base specific contacts outside of the CpG site were observed, suggesting that the enzyme has weak or no flanking sequence specificity (Hu et al. 2013, 2015). Nevertheless, the bound DNA is strongly bent and distorted, giving the possibility of indirect readout of DNA sequence as observed with numerous other DNA binding proteins, restriction enzymes and bacterial MTases (Jurkowski et al. 2007; Little et al. 2008). Whether TET enzymes use indirect readout for sequence recognition remains to be addressed.

In addition, TET enzymes are also able to oxidize the methyl group of thymine (T) to 5-hydroxymethyl uracil (5hmU) (Pfaffeneder et al. 2014); however, the efficiency of this reaction is rather low, and its physiological relevance still needs to be uncovered.

3.3 On Site and Lateral Processivity of TET Enzymes

Processivity of TET enzymes can be regarded in two different ways. The first, which could be called “on-site processivity”, is the serial oxidation of 5mC to 5hmC, 5fC and 5caC at a single CpG site without the enzyme dissociating from that site. The second, which could be called “lateral processivity”, is the consecutive oxidation of numerous CpGs on a single DNA molecule.
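The two definitions can be illustrated with a deliberately idealised counting model, sketched here in Python. It asks how many separate enzyme-DNA binding events would be needed to convert every 5mC on one DNA molecule to 5caC. The function name and the all-or-nothing behaviour are assumptions made purely for illustration; in particular, a laterally processive enzyme without on-site processivity is assumed to perform one oxidation step at every site per binding event. Real enzymes are unlikely to behave in such a binary fashion.

```python
# Oxidation ladder described in the text: 5mC -> 5hmC -> 5fC -> 5caC.
OXIDATION_LADDER = ["5mC", "5hmC", "5fC", "5caC"]


def binding_events(n_sites, on_site, lateral):
    """Binding events needed to fully oxidize n_sites 5mCpG sites on one DNA.

    on_site: the enzyme performs all three oxidation steps at a CpG
             before leaving that site (assumed all-or-nothing).
    lateral: the enzyme visits every CpG on the molecule before
             dissociating from the DNA (assumed all-or-nothing).
    """
    steps_per_site = len(OXIDATION_LADDER) - 1  # three oxidation steps
    # Without on-site processivity, each step needs a separate visit.
    visits_per_site = 1 if on_site else steps_per_site
    # Without lateral processivity, each site needs its own binding event.
    sites_per_binding = n_sites if lateral else 1
    return visits_per_site * (n_sites // sites_per_binding)


for on_site in (False, True):
    for lateral in (False, True):
        print(on_site, lateral, binding_events(4, on_site, lateral))
```

In this toy model, a fully distributive enzyme needs three binding events per site, whereas an enzyme that is processive in both senses needs only one for the whole molecule.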

The isolated catalytic domain of human TET2 efficiently oxidizes 5mC to 5hmC, yet further oxidation steps are inefficient, leading to the reaction stalling at the 5hmC state (Hu et al. 2015). Conversely, numerous reports showed that TET enzymes are capable of efficient conversion of 5mC to 5caC without being blocked at the 5hmC state (Tamanaha et al. 2016; Liu et al. 2017; Crawford et al. 2016). Moreover, the same group that reported stalling of the oxidation reaction at the 5hmC state also showed that TET2 could convert 5mC all the way to 5caC (Hu et al. 2013). The contradictory conclusions of the studies that investigated mouse TET2 “on-site” processivity can probably be explained by differences in the reaction conditions and experimental setup (Tamanaha et al. 2016; Crawford et al. 2016).

The catalytic domains of TET1 and TET2 show no preference for modifying neighboring CpG sites on the same DNA molecule, suggesting that TET enzymes are not laterally processive (Tamanaha et al. 2016). However, both TET1 and TET3 full-length enzymes contain an additional DNA binding domain, namely the CXXC domain, which can modify the enzymes' behavior on DNA. Strikingly, the TET3 CXXC domain preferentially binds 5caCpG sites, which represent the final TET reaction product. This observation led to the proposal that the interaction between the TET3 CXXC domain and 5caCpG could stimulate processive activity of the enzyme and consequently lead to spreading of 5caC from the first oxidized CpG site. In the proposed model, the first 5mCpG site that is oxidized to 5caCpG is bound by the CXXC domain of the enzyme, thereby keeping the catalytic domain in close proximity and promoting oxidation of nearby 5mCpGs (Jin et al. 2016). This interesting hypothesis still requires further experimental validation.

3.4 Oxidation of RNA Bases

Mammalian TET enzymes were initially found to oxidize 5mC in genomic DNA, but have since also been shown to oxidize 5mC in RNA (Basanta-Sanchez et al. 2017; Fu et al. 2014). Moreover, the presence of TET homologues in organisms that do not possess any active DNA methyltransferase, like D. melanogaster, suggested that substrates other than 5mC in dsDNA could be processed by the enzyme (Dunwell et al. 2013). Indeed, Drosophila TET is responsible (at least in part) for the formation of 5hmC in fly mRNAs, particularly in mRNAs involved in neuronal development. Consequently, blocking of the TET enzyme causes brain defects and is lethal. In vivo, RNA hydroxymethylation promotes mRNA translation (Delatte et al. 2016).

DNA and RNA adopt different structures, which affects how TET enzymes can interact with them. DeNizio and colleagues performed a systematic survey comparing the activity of the TET2 catalytic domain (CD) on double-stranded (ds) and single-stranded (ss) DNA and RNA, as well as on DNA/RNA hybrids. They found that 5mC in dsDNA is the preferred substrate, that ssRNA and ssDNA are well tolerated, and that dsRNA is a very poor substrate for TET2 (DeNizio et al. 2018).

3.5 Recruitment of TET Enzymes

The mechanisms of locus-specific recruitment and regulation of TET enzymes are much less well understood than the genomic distribution and physiological relevance of the oxidized 5mC derivatives. The CXXC domains located in the N-termini of TET1 and TET3 are thought to be at least in part responsible for the targeting of the enzymes to CpG-rich regions (CpG islands), as CXXC domains have been shown to recruit DNMT1, MLL1 and CFP1 to unmethylated CpG sites (Stroynowska-Czerwinska et al. 2018; Xu et al. 2018). Consistently, DNA binding studies showed that the CXXC domain of TET1 is able to bind to CpG-rich DNA irrespective of its modification state (C, 5mC or 5hmC) (Zhang et al. 2010a), whereas the CXXC domain of TET3 from Xenopus binds unmodified cytosines in both CpG and non-CpG contexts, with a slightly higher preference for CpG (Xu et al. 2012; Jin et al. 2016). Another interesting study demonstrated that the CXXC domain of TET3 can bind 5caCpG and that full-length TET3 preferentially binds to the transcriptional start sites (TSS) of genes involved in base excision repair (Jin et al. 2016). This suggests that TET3 may be specifically targeted to these loci through the CXXC domain or by other interacting proteins (Jin et al. 2016).

TET2, which lacks the CXXC domain, may depend more on other proteins, for example transcription factors (TFs), for locus-specific recruitment. Supporting this idea, TET2 interacts with the transcription factors Wilms tumor 1 (WT1) and Early B cell factor 1 (EBF1), which modulate TET2 activity and target gene expression [reviewed in (Ravichandran et al.)]. Recently, several TFs important for cellular differentiation were reported to induce DNA demethylation by interacting with TET proteins. For example, RUNX1, an essential master transcription factor in hematopoietic development and an important regulator of immune functions, was shown to recruit TET2 and induce local DNA demethylation at its binding regions (Suzuki et al. 2017). Likewise, NANOG-dependent recruitment of TET1 and TET2 promotes expression of genes involved in reprogramming and lineage commitment (Costa et al. 2013). Furthermore, a study by Perera and colleagues in mouse retinal cells demonstrated that RE1-silencing transcription factor (REST) recruits an isoform of TET3 lacking the CXXC domain, along with the histone methyltransferase NSD3, to activate its target genes (Perera et al. 2015). TET enzymes were also shown to interact with proteins involved in the base excision repair pathway, such as TDG, PARP1, MBD4 and NEIL (Muller et al. 2014). In addition, all three TET enzymes associate with O-linked β-D-N-acetylglucosamine (O-GlcNAc) transferase (OGT). It has been suggested that TETs recruit OGT to chromatin and that the TET-OGT interaction promotes OGT activity (Vella et al. 2013; Chen et al. 2013). In summary, it is increasingly clear that TET enzymes do not function alone but interact with multiple other proteins in a context-dependent manner and, through this cooperation, modulate gene expression.

4 Synthetic Programming of DNA Methylation

Rapid development of next-generation sequencing technologies has enabled genome-wide interrogation of cytosine methylation at single-base resolution, providing invaluable insights into the frequency and genomic distribution of 5mC, as well as into the interplay between DNA methylation and other epigenetic mechanisms. Yet, the lack of tools for locus-specific manipulation of cytosine status has hampered the functional understanding of the role of DNA methylation and demethylation. Recent progress in programmable DNA binding domains has opened new synthetic ways to study epigenetic regulation (Jurkowski et al. 2015), and in particular DNA methylation and demethylation. Fusing an active DNA methyltransferase or demethylase (or any other epigenetic enzyme) to a customizable DNA binding domain enables targeting of the methylation or demethylation activity to selected places in the genome. From a mechanistic point of view, this powerful technology makes it possible not only to study the principles of how the enzymes set up or remove the methylation mark, but also to directly probe and dissect the epigenetic mechanisms and transcriptional consequences of DNA methylation or demethylation at a given genomic locus. On the application side, it allows the causal role of disease-associated epigenetic changes to be verified, and even their repair as a potential therapeutic strategy.

4.1 Programmable Genome Targeting Modules

Three different classes of programmable DNA binding domains have been employed so far in epigenetic editing. The C2H2 zinc fingers were the first example of predictable DNA interaction domains amenable to rational protein design [reviewed in Wolfe et al. (2000), Pabo et al. (2001)] and were the first to be used for programmable, sequence-specific genome targeting of fused epigenetic enzymes (Xu and Bestor 1997). More recently, two additional programmable genome binders were discovered: the TAL effector arrays (TALE) (Boch et al. 2009) and CRISPR/Cas9 systems (Jinek et al. 2012). The transcription activator-like effectors (TALEs) are important virulence factors initially isolated from the bacterial plant pathogen Xanthomonas (Boch and Bonas 2010) and are composed of tandemly arranged, highly similar repeats of 34 amino acids (Scholze and Boch 2010).

The newest and most exciting addition to the genome targeting toolbox is the CRISPR/Cas9 system (Hsu et al. 2014). CRISPR (clustered regularly interspaced short palindromic repeats) functions as a prokaryotic adaptive immune system that confers resistance to exogenous genetic elements such as plasmids and phages (Mojica et al. 2005). CRISPR/Cas9 proteins recognize their targets through Watson-Crick base pairing and rely on complementarity between the recognized DNA and the guide RNA (gRNA) sequence used for targeting. Therefore, retargeting of the Cas9 protein to a specific genomic location requires only a gRNA component specific for the desired target. However, because Cas9 is an active nuclease, a catalytically inactive Cas9 variant (dCas9) is used for targeting of epigenetic enzymes. It still recognizes and binds the target sequence, yet does not cleave it (Qi et al. 2013). Although each of the available programmable genome targeting domains offers unique advantages and disadvantages, the CRISPR/Cas9 system seems the most attractive owing to the simplicity of target design and the possibility of multiplexing.
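Because target recognition relies on sequence complementarity, candidate binding sites for a given guide can be located by a simple sequence search. The following Python sketch illustrates the idea under stated assumptions: the guide is written as the DNA equivalent of a 20-nt spacer, only perfect matches are reported, and a site must be followed by an NGG protospacer adjacent motif (PAM), the motif generally described for Streptococcus pyogenes Cas9 (the PAM requirement is not discussed in this chapter). The function names and sequences are hypothetical, and real guide design tools additionally score mismatched, near-cognate sites.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(genome, guide, pam="NGG"):
    """List perfect protospacer matches that are followed by a PAM.

    Returns (position, strand) tuples, where position is the 0-based
    leftmost forward-strand coordinate of the protospacer-plus-PAM site.
    Assumes perfect matching only; the NGG default is an assumption.
    """
    genome = genome.upper()
    guide = guide.upper().replace("U", "T")
    pam_regex = pam.upper().replace("N", "[ACGT]")
    # A lookahead is used so that overlapping sites are also reported.
    pattern = re.compile("(?=(" + guide + pam_regex + "))")
    site_length = len(guide) + len(pam)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for match in pattern.finditer(seq):
            if strand == "+":
                position = match.start()
            else:
                position = len(seq) - match.start() - site_length
            hits.append((position, strand))
    return hits


# Hypothetical guide and toy "genome" with one site on each strand.
guide = "GATTACAGATTACAGATTAC"
genome = "TT" + guide + "TGG" + "AAAA" + reverse_complement(guide + "AGG") + "CC"
print(find_target_sites(genome, guide))
```

Retargeting in this sketch amounts to changing the `guide` string, which mirrors the point made above that only the gRNA component needs to be redesigned for a new locus.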

4.2 Epigenetic Effector Domains

The epigenetic editing activity is provided by fusing active DNA methylating or demethylating enzymes to the targeting domain. To date, various DNA methyltransferases have been used [reviewed in Lau and Suh (2018), Lei et al. (2018)], including the bacterial CpG-specific methyltransferases M.SssI (Xiong et al. 2017) and MQ1 (a modified CpG methyltransferase derived from Mollicutes spiroplasma) (Lei et al. 2017), the catalytic domains or full-length forms of mammalian Dnmt3a (Vojta et al. 2016; Liu et al. 2016) and Dnmt3b (Lin et al. 2018), as well as an engineered Dnmt3a-Dnmt3L fusion protein, which in addition to the Dnmt3a methyltransferase contains the co-activator protein Dnmt3L (Stepper et al. 2017; Saunderson et al. 2017). For targeted DNA demethylation, all three mammalian TET enzymes have been used (Liu et al. 2016), yet the TET1 CD is the most commonly used version. Interestingly, direct removal of methylated cytosine has also recently been achieved by employing the plant ROS1 DNA glycosylase, leading to increased transcription of the target locus (Parrilla-Doblas et al. 2017).

4.3 Applications of Targeted DNA Methylation/Demethylation

Targeted DNA modification (both methylation and demethylation) has the potential to answer previously intractable questions in basic and translational research. It allows mechanistic dissection of epigenetic signaling cascades and validation of the causality of epigenetic changes in diseases. It can widen our understanding of epigenetic dynamics and of the basis of the stability of the DNA methylation signal. It can also address the contribution of epigenetic changes to the etiology of complex and simple diseases, through discovery and validation of disease-promoting epimutations, and provide means for reverting them (Fig. 3).

Fig. 3

Targeted DNA methylation and demethylation control gene expression. Locus-specific targeting of DNA methyltransferases (DNMTs) represses the target gene, whereas demethylation of a methylated promoter after targeting of TETs leads to gene activation. Green lollipops indicate methylated CpG sites; white lollipops indicate unmethylated CpG sites

4.3.1 How Do Epigenetic Changes Contribute to Disease Etiology?

As discussed above, widespread changes in DNA methylation patterns are commonly observed in diseases (Egger et al. 2004; Koch et al. 2018), including cancer and other chronic or acute diseases. However, it is hard to evaluate whether these epigenetic changes are causal for disease progression or are merely bystanders, reflecting the overall epigenetic dysregulation caused by the disease.

Epigenome-wide association studies (EWAS) are commonly used to derive associations between epigenetic variation and a particular identifiable phenotype (Birney et al. 2016). When epigenetic patterns, such as DNA methylation, change at specific loci in a way that discriminates phenotypically affected cases from control individuals, this is taken as an indication that an epigenetic perturbation has occurred that is associated, either causally or consequentially, with the studied phenotype. However, EWAS results cannot discriminate causal from consequential epigenetic changes; they only establish a correlation with the screened phenotype. In such cases, targeted DNA methylation/demethylation could be used to study the causality of the observed changes with respect to the phenotype.

Aberrant promoter methylation is a well-recognized hallmark of cancer; however, it is unclear whether epigenetic changes are enough to drive cellular transformation. Saunderson and colleagues (Saunderson et al. 2017) used CRISPR-based targeted DNA methylation to stably methylate and repress the CDKN2A, HIC1, PTEN and RASSF1 tumor suppressor genes in healthy primary breast cells. Furthermore, they showed that targeted de novo methylation of the CDKN2A p16 transcript promoter prevented cells from entering senescence arrest, thus possibly facilitating tumor initiation.

4.3.2 Repair of Aberrant, Disease Causing Epigenetic States

Fragile X syndrome (FXS) is the most frequent form of inherited intellectual disability (Sutcliffe et al. 1992). In fragile X patients, the FMR1 gene shows an increased number of CGG repeats and abnormal methylation of a CpG island 250 bp proximal to this repeat. Liu et al. used a dCas9-TET1 CD construct to demethylate the CGG repeats in FXS induced pluripotent stem (iPS) cells, which demethylated and activated the FMR1 promoter and thereby reactivated the silenced gene (Liu et al. 2018).

4.4 Limitations of Targeted DNA Methylation and Demethylation Tools

Despite being a powerful technology, epigenetic editing also has pitfalls and limitations. The promise of epigenetic regulation is that once DNA methylation is established or removed, cellular epigenetic mechanisms will maintain the new state of the locus, such that it can be inherited after semiconservative DNA replication (Jeltsch and Jurkowska 2014). Therefore, targeted DNA methylation or demethylation could provide a unique opportunity to heritably switch off gene expression (loss-of-function) (Siddique et al. 2013; Nunna et al. 2014; Stolzenburg et al. 2012). However, recent reports indicate that DNA methylation deposited at active gene promoters is not necessarily stably maintained and consequently gets diluted with DNA replication and cell division (Vojta et al. 2016; Kungulovski et al. 2015). Stable epigenetic reprogramming has nevertheless also been achieved (Saunderson et al. 2017; Amabile et al. 2016), although the epigenetic mechanisms that confer this stability are not well understood. It is plausible that the stability of the introduced DNA methylation depends on the local environment of the targeted locus. As DNA methylation is just one layer of epigenetic regulation, targeting multiple epigenetic marks simultaneously might improve the stability of the introduced DNA methylation.

Unintended, off-target epigenetic modification can lead to misinterpretation of the biological effects observed in epigenetic editing experiments; therefore, the specificity of the introduced epigenetic modification is of principal importance. The precision of targeting is even more important for potential therapeutic epigenetic interventions, as mistargeted modifications can dysregulate other genes and cause additional diseases. Because of this, numerous studies have addressed this issue [reviewed in Lei et al. (2018)]. Two types of off-targeting can be distinguished: the first stems from misrecognition by the targeting module (i.e. promiscuous binding of the dCas9 part to other near-cognate sequences in the genome), and the second is unintended modification by the epigenetic domains used.

In targeting experiments, dCas9 ChIP-seq coupled with bisulfite sequencing has been used to investigate off-target methylation. This showed that, even at the top-ranking dCas9 binding sites, dCas9-DNMT3a only marginally increased DNA methylation relative to the methylation observed at the intentionally targeted loci (Liu et al. 2016), suggesting that mis-targeting of dCas9 does not contribute strongly to off-target methylation. Other studies applied genome-wide sequencing technologies, including reduced-representation bisulfite sequencing (RRBS) and whole genome bisulfite sequencing, to assess potential side effects of various methylation tools and reported no detectable off-target hypermethylation (Huang et al. 2017; Lei et al. 2017). Similarly, few off-target effects have been reported with demethylation tools.

In contrast, numerous studies reported significant off-target methylation when targeting dCas9-DNMT3a CD. The extent of the off-target effects varied vastly (Huang et al. 2017; Lei et al. 2017; McDonald et al. 2016). A recent study showed the presence of extensive genome-wide off-target methylation in mouse ES cells (mESC) and somatic cells (Galonska et al. 2018), regardless of whether or not an sgRNA was used for targeting. The expression level of the constructs might greatly influence the extent of off-target effects because, once the “true” binding sites are occupied, the remaining targeting constructs are available to modify unspecific sites.