1 Introduction

The chromatin of a multicellular organism stores a vast quantity of information that defines the complex gene expression patterns in diverse cell types, and is indispensable for growth and development. This information is stored both genetically in DNA sequences and epigenetically through DNA and histone modifications [1,2,3,4]. However, nearly all cells in an organism (except gametes and some immune cells) contain the same genomic sequences as the zygotic genome and therefore, it is the epigenetic information residing in chromatin that determines a cell’s identity and its corresponding gene expression profiles [5]. Generally, epigenetic information is faithfully propagated to each progeny cell upon division to maintain cell identity, but epigenetic states can also undergo dynamic changes during lineage specification or upon certain environmental stimuli [6]. Aberrant alterations of epigenetic information, such as DNA and histone modifications, are frequently associated with the onset of various human diseases, including cancers [7].

DNA methylation, in particular, is the most common covalent modification of DNA. The best-studied form of DNA methylation is 5-methylcytosine (5mC), which is generated by S-adenosyl-L-methionine (SAM)-dependent DNA methyltransferases (DNMTs), and does not interfere with Watson-Crick base pairing [8, 9]. In mammals, this enzymatically introduced DNA methylation exists predominantly in the CpG dinucleotide context (a cytosine followed by a guanine) and carries epigenetic information typically required for long-term gene silencing. Notably, over 70% of CpGs in somatic mammalian cells are methylated, and 5mC has therefore long been the focus of biochemical and genomics studies. In addition to 5mC, other forms of DNA methylation also exist. However, it is important to note that not all DNA methylation carries epigenetic information. For example, methylation can be introduced through DNA damage caused by endogenous or exogenous methylating agents [9]. These methylated bases, such as N1-methyladenosine (m1A) and N3-methylcytosine (m3C), are considered cytotoxic or mutagenic as they tend to block or alter Watson-Crick base pairing. In the following sections, we will focus on the demethylation of 5mC, and will use the term DNA methylation to refer to 5mC only.

2 DNA Methylation Machinery

In mammals, three enzymatically active DNMTs, namely DNMT1, DNMT3A and DNMT3B, catalyze the transfer of a methyl group from SAM to the carbon-5 position of cytosine residues in DNA, generating 5mC [8]. DNMT1 preferentially methylates hemimethylated DNA [10], and in the presence of its cofactor UHRF1 (also known as NP95) [11, 12], DNMT1 is mainly responsible for copying DNA methylation patterns to the daughter strands during DNA replication (maintenance methylation). In contrast, DNMT3A and DNMT3B are the main enzymes to establish initial DNA methylation patterns during early embryonic development (de novo methylation), and do not show any preference for hemimethylated DNA [13]. Nevertheless, both maintenance and de novo methylation activities are required for normal development as depletion of DNMT1 or DNMT3B in mice results in embryonic lethality, and Dnmt3a-knockout mice die 4–8 weeks after birth [13, 14].

Structurally, the methyl group of 5mC is located in the major groove of the DNA double helix, and is involved in either attracting or repelling many DNA binding proteins [15]. For example, three of the methyl-CpG binding domain (MBD) containing proteins, MeCP2, MBD1, MBD2, and a transcriptional regulator KAISO, have been shown to preferentially bind to methylated DNA and recruit repressor complexes to methylated promoters, leading to subsequent chromatin condensation and gene silencing [16]. Conversely, DNA methylation can also prevent binding of some transcription factors (TFs), such as YY1 and CTCF [17, 18], to their specific recognition sites. DNA methylation has been demonstrated to play critical roles in various cellular processes, such as genomic imprinting, X-chromosome inactivation, retrotransposon silencing, and maintenance of cell identity, supporting its general transcription repression function and heritable nature [15].

3 Passive and Active DNA Demethylation

While most histone modifications are readily reversible [19], DNA methylation has been generally viewed as a relatively stable epigenetic mark. Indeed, there is a dedicated maintenance enzyme, DNMT1, to faithfully copy DNA methylation patterns to daughter strands during DNA replication; in addition, the methyl group on 5mC is connected to the base through a C-C bond which exhibits high chemical stability under physiological conditions; furthermore, no DNA demethylase could be identified by 2009 when a large number of histone demethylases had been discovered. Nevertheless, studies in the past decade have indicated that DNA methylation is not as static as once thought. Loss of DNA methylation, or DNA demethylation, has been reported in various biological contexts and can be achieved through either passive or active mechanisms.

As illustrated in Fig. 1, passive DNA demethylation, or replication-dependent dilution of 5mC, refers to the loss of 5mC that occurs when methylation patterns are not copied semiconservatively during DNA replication. In the absence of functional maintenance methylation machinery, i.e., DNMT1 and UHRF1, successive cycles of DNA replication can result in gradual dilution of 5mC to achieve global DNA demethylation. Passive DNA demethylation has been demonstrated to play a major role in maternal-genome demethylation of zygotes [20,21,22], and in the whole-genome demethylation of primordial germ cells (PGCs) [23,24,25].
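The dilution described above can be written as a simple recurrence. The following is an idealized sketch, assuming that maintenance methylation is completely absent and that no de novo methylation occurs; the symbols $m_n$ (average fraction of methylated CpG sites per strand after $n$ rounds of replication) and $m_0$ (the starting fraction) are introduced here for illustration only.

```latex
m_{n} = \tfrac{1}{2}\, m_{n-1}
\quad\Longrightarrow\quad
m_{n} = m_{0} \left(\tfrac{1}{2}\right)^{n}
```

Under these assumptions, three rounds of replication would leave about 12.5% of the original 5mC, which illustrates why passive dilution depends on cell division and acts gradually across the genome.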

Fig. 1

Major mechanisms of passive and active DNA demethylation in the mammalian genome. DNA methylation patterns are established by DNMT3 proteins and maintained by DNMT1 during DNA replication, and passive DNA demethylation occurs when DNMT1 is inhibited. TET proteins can oxidize 5-methylcytosine (5mC) to generate 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC), which are inefficient substrates for DNMT1 and are passively diluted during DNA replication. This form of active DNA demethylation is termed active modification followed by passive dilution (AM-PD). Among the three 5mC oxidation derivatives, 5fC and 5caC can be excised by TDG to form abasic sites, which are further repaired by the base excision repair (BER) pathway to complete DNA demethylation. This form of active DNA demethylation is termed active modification followed by active restoration (AM-AR).

By contrast, active DNA demethylation refers to direct removal of the methyl group from 5mC, or an enzymatic process that removes or modifies 5mC with regeneration of unmodified cytosine. Processes that are initiated with active modification (AM) of 5mC can be further divided into two forms by whether the modified 5mC is converted to unmodified cytosine through passive dilution (PD) or active restoration (AR). Similar to passive DNA demethylation, the AM-PD pathway may be well suited for large-scale DNA demethylation events observed in PGCs and zygotes, which we will discuss in Sect. 4.4. In contrast, the AM-AR pathway and direct removal of the methyl group or the 5mC base may take place rapidly, and are implicated in locus-specific demethylation, which requires a rapid response to environmental stimuli. For example, rapid active DNA demethylation was observed at the interleukin-2 (IL-2) promoter-enhancer region in activated T lymphocytes within 20 minutes upon stimulation [26], at the promoter of brain-derived neurotrophic factor (BDNF) in KCl-stimulated postmitotic neurons without DNA replication [27], and at several other specific genomic loci in response to nuclear hormones and growth factors [28,29,30]. Thus, these studies suggest that active DNA demethylation could function in the dynamic regulation of genes that require rapid responses to specific environmental stimuli.

4 TET-Mediated Oxidative DNA Demethylation

4.1 TET Family Dioxygenases

While passive DNA demethylation has long been understood and accepted, the mechanism of active DNA demethylation was not understood until recently, following the discovery that TET (ten-eleven translocation) proteins can convert 5mC to its oxidized forms, namely 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). Interestingly, the names of TET genes trace back to the involvement of the human TET1 gene in the ten-eleven translocation [t(10;11)(q22;q23)] in rare cases of acute myeloid leukemia (AML), which fuses the TET1 gene on chromosome 10 with the mixed-lineage leukemia gene (MLL; also known as KMT2A) on chromosome 11 [31, 32]. Along with TET1, two additional genes in this protein family, TET2 and TET3, were also identified based on their sequence homology. Sequence comparison and structural studies have shown that TET proteins are a distinct family of the Fe(II)/α-ketoglutarate (αKG)-dependent dioxygenase superfamily [33, 34], members of which also include JmjC domain containing histone demethylases and AlkB family DNA/RNA repair proteins.

Similar to most Fe(II)/αKG-dependent dioxygenase superfamily members, TET proteins share a conserved DSBH fold (or jelly-roll fold) in their catalytic domains, consisting of eight antiparallel β-strands (I–VIII) and an iron-binding motif (Fig. 2). Unique characteristics are also present in the catalytic domain of TET proteins, such as a cysteine-rich domain adjacent to the N terminus of the DSBH fold, and a large non-conserved low-complexity region between conserved β-strands IV and V [33, 35, 36]. Although the low-complexity region’s function is not clear, the cysteine-rich domain has been shown to stabilize substrate DNA by wrapping around the DSBH core, and is essential for the enzymatic activity [37]. Outside of the catalytic domain, a CXXC (Cysteine-X-X-cysteine) domain is present at the N terminus of both TET1 and TET3. Indeed, TET1 is also known as CXXC6 (CXXC zinc finger 6) and LCX (leukemia-associated CXXC protein). Both in vitro DNA binding assay and structural analyses have revealed that TET proteins’ CXXC domain strongly binds to unmethylated DNA [38]. However, the catalytic domains of TET proteins alone can also bind to DNA and oxidize 5mC without the help of a CXXC domain [39]. Therefore, the catalytic domains of TET proteins may possess a non-sequence-specific DNA-binding capacity whereas the CXXC domain may increase the sequence selectivity to facilitate and regulate binding of TET proteins to their genomic targets [38, 40, 41]. Surprisingly, TET2 does not possess a CXXC domain, which was suggested to be lost during evolution, and is now encoded by a separate, neighboring gene IDAX (also known as CXXC4)[41]. It is worth noting that TET3 has three isoforms, among which only the full-length form contains the CXXC domain [42]. The full-length TET3 was reported to bind to 5caC at CCG sequences through its CXXC domain, and promote DNA demethylation by acting as a regulator of 5caC removal by base excision repair [42].

Fig. 2

Schematic diagrams of the TET proteins. Three conserved domains are indicated in mouse TET proteins, including the CXXC zinc finger, the cysteine-rich region (Cys-rich), and the double-stranded β-helix (DSBH) fold of the Fe(II)/α-ketoglutarate (αKG)-dependent dioxygenases. Locations of the Fe(II)- and αKG-binding sites in the conserved DSBH fold are shown in the topology diagram. Also presented are the domain structures of three other enzymes in the superfamily: Trypanosoma brucei JBP1, JBP2, and Escherichia coli AlkB. Note that TET3 has a shorter form, which starts at amino acid 136 and does not contain the CXXC domain.

4.2 TET-Mediated Iterative Oxidation of 5mC

The finding that TET proteins can convert 5mC of DNA to 5hmC by oxidation was a great advance in understanding the mechanisms of DNA demethylation [34]. This finding was initially inspired by the biosynthesis of glucosylated 5-hydroxymethyluracil (base J) in the genome of Trypanosoma brucei, a parasite causing African sleeping sickness. Base J synthesis involves oxidation of thymine to 5-hydroxymethyluracil (5hmU) by J-binding proteins 1 and 2 (JBP1 and JBP2), two enzymes of the Fe(II)/αKG-dependent dioxygenase superfamily [43]. Because of the structural similarity between 5mC and thymine, mammalian homologs of JBP proteins were thought to possess 5mC oxidation activity, and TET family proteins were identified as the mammalian homologs of JBP proteins [33, 34]. Interestingly, the presence of TET genes in animals seems to coincide with the presence of 5mC in the genome [33, 36]. It was then convincingly demonstrated by in vitro biochemical experiments that TET proteins can oxidize 5mC to 5hmC [34]. Moreover, 5hmC is relatively abundant in mouse embryonic stem cells (ESCs) where both TET1 and TET2 are highly expressed, and its presence is TET dependent [34, 39], providing in vivo evidence that 5hmC is generated by TET-mediated oxidation of 5mC.

Fe(II)/αKG-dependent dioxygenase-mediated oxidation reactions typically consist of two stages: dioxygen activation and substrate oxidation (Fig. 3). The dioxygen activation stage is a four-electron process, where Fe(II) and αKG may each contribute two electrons to activate a dioxygen molecule first into bridged peroxo and then into the Fe(IV)-oxo intermediate. In the following substrate oxidation stage, the inert C-H bond of the substrate can be oxidized by the highly active Fe(IV)-oxo species, and finally Fe(IV) is reduced back into Fe(II) to complete the catalytic cycle [44]. During the whole process, four electrons, two from αKG and another two from the substrate C-H bond, are consumed to fully reduce a dioxygen molecule. The two oxygen atoms of the dioxygen molecule are incorporated into the succinate (the oxidized and decarboxylated product of αKG) and the oxidized product (Fig. 3).
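The electron bookkeeping above can be summarized as an overall reaction. The equation below is a sketch of the net stoichiometry for the first oxidation step (5mC to 5hmC), assuming the general Fe(II)/αKG-dependent dioxygenase scheme described in this section; it omits the enzyme-bound iron intermediates, and the release of CO2 reflects the decarboxylation of αKG to succinate.

```latex
\mathrm{5mC} + \mathrm{O_2} + \alpha\mathrm{KG}
\;\longrightarrow\;
\mathrm{5hmC} + \mathrm{succinate} + \mathrm{CO_2}
```

One oxygen atom of the dioxygen molecule ends up in the hydroxymethyl group of 5hmC and the other in succinate, consistent with the four-electron reduction of dioxygen described above.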

Fig. 3

The chemical mechanism underlying TET-mediated 5mC oxidation. In the dioxygen activation stage, Fe(II) and αKG activate dioxygen to form a highly active Fe(IV)-oxo species; in the substrate oxidation stage, Fe(IV)-oxo inserts the oxygen atom into the C-H bond of the substrate, and Fe(IV) is reduced back to Fe(II) to complete the catalytic cycle.

Interestingly, some Fe(II)/αKG-dependent dioxygenases are capable of iteratively oxidizing the substrate methyl group to a carboxyl group. For instance, thymine-7-hydroxylase, an Fe(II)/αKG-dependent dioxygenase in the thymidine salvage pathway, is known to catalyze a three-step oxidation of thymine to generate 5-carboxyluracil (isoorotate), where the methyl group of thymine is sequentially oxidized to a hydroxymethyl group, a formyl group, and finally a carboxyl group, which is subsequently removed by an isoorotate decarboxylase (IDase) to generate uracil [45]. Inspired by this example, it was proposed that TET proteins might oxidize 5mC not only to 5hmC, but also to 5fC and 5caC [46]. This hypothesis was soon confirmed experimentally both in vitro and in vivo [47, 48], and was further supported by a structural study showing that TET2’s active cavity could recognize a CpG dinucleotide regardless of its methylation/oxidation status [37].

4.3 TDG-Mediated Excision of 5fC/5caC

Because 5mC can be converted to 5hmC, 5fC and 5caC, these modified bases are naturally considered to be involved in DNA demethylation. However, unlike the N-methyl group in m1A, m3C and methylated histones, which becomes unstable on the C-N bond upon enzymatic oxidation and undergoes spontaneous hydrolytic deformylation (i.e., direct removal of the oxidized methyl group) [19], the methyl group of 5mC is connected through a highly stable C-C bond to the rest of the base, and therefore the oxidized 5-substituents remain stable under physiological conditions [9]. Interestingly, although the 5-substituents do not seem to be directly removed from 5mC oxidation derivatives, emerging evidence suggests that once converted to 5fC and 5caC, the modified cytosine base can be entirely removed from DNA by thymine-DNA glycosylase (TDG) [48, 49]. DNA demethylation is then completed by replacing the resulting abasic site with unmodified cytosine through the base excision repair (BER) pathway, similar to the active DNA demethylation mechanism in plants [50]. This TET-TDG-BER-mediated DNA demethylation process may take place rapidly, and seems a strong candidate for locus-specific demethylation, which requires a rapid response to environmental stimuli.

TDG belongs to the uracil-DNA glycosylase (UDG) superfamily. It has been well established that TDG can excise pyrimidine moiety from G/U and G/T mispairs in dsDNA by a base-flipping mechanism [51]. Interestingly, TDG also excises properly base-paired cytosine bases with 5-position substituents that destabilize the base-sugar bond (N-glycosidic bond), such as 5-fluorocytosine, indicating that the stability of the N-glycosidic bond contributes to TDG’s substrate specificity [52]. More recent studies further demonstrated that TDG can recognize and remove 5fC and 5caC, but not 5mC, 5hmC, and unmodified cytosine, from DNA duplex when paired with guanine [48, 49]. Indeed, computational analyses suggest that 5fC and 5caC form a more labile N-glycosidic bond compared to unmodified cytosine, 5mC, 5hmC, and even 5-fluorocytosine [53]. Consistently, TDG has a slightly higher binding affinity towards G/5fC and G/5caC pairs than to G/U and G/T mismatches [49]. When co-overexpressed in HEK293 cells [48, 54], TDG efficiently depletes TET-generated 5fC and 5caC; and in contrast, TDG knockdown in mouse ESCs results in a 5–10-fold increase of endogenous 5fC and 5caC [55, 56], providing in vivo evidence that TDG is responsible for 5fC/5caC removal.
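The substrate logic of the TET-TDG-BER axis described above can be sketched as a small state model. This is an illustrative simplification, not a kinetic model: the names `TET_OXIDATION`, `TDG_SUBSTRATES`, and `am_ar_path` are hypothetical, the abasic site is labeled `"AP"`, and the sketch assumes that TDG acts as soon as a base becomes one of its substrates.

```python
from typing import Dict, List, Set

# Stepwise TET oxidation of cytosine derivatives, as described in the text.
TET_OXIDATION: Dict[str, str] = {"5mC": "5hmC", "5hmC": "5fC", "5fC": "5caC"}

# Bases that TDG excises when paired with guanine (5mC, 5hmC, and C are not).
TDG_SUBSTRATES: Set[str] = {"5fC", "5caC"}


def am_ar_path(base: str) -> List[str]:
    """Return the sequence of states from `base` to its end point.

    Assumption for illustration: TDG excises the base as soon as it
    becomes a substrate, so 5fC is excised before reaching 5caC.
    """
    path = [base]
    # TET oxidizes the base until it is recognized by TDG.
    while base in TET_OXIDATION and base not in TDG_SUBSTRATES:
        base = TET_OXIDATION[base]
        path.append(base)
    if base in TDG_SUBSTRATES:
        path.append("AP")  # TDG excision leaves an abasic site
        path.append("C")   # BER restores unmodified cytosine
    return path


if __name__ == "__main__":
    for start in ("5mC", "5hmC", "5caC", "C"):
        print(start, "->", " -> ".join(am_ar_path(start)))
```

In this sketch, unmodified cytosine is left untouched because it is neither a TET oxidation substrate nor a TDG substrate, mirroring the specificity reported for TDG.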

Intriguingly, among the four enzymes with UDG activity in mammals (i.e., TDG, UNG, MBD4, and SMUG1), only TDG is required during mouse embryonic development [57,58,59,60], where global DNA methylation reprogramming takes place, implying that the DNA glycosylase activity of TDG is essential for DNA demethylation. Compared with the other UDGs, the active site of TDG is indeed uniquely configured to accommodate 5fC/5caC and facilitate its cleavage, as revealed by the crystal structure of human TDG in complex with 5caC-containing dsDNA [61]. Consistently, both Tdg-null-mutant and Tdg-catalytic-mutant mice exhibit abnormal DNA methylation and die around embryonic day (E)12.5 [57, 58], confirming a crucial role of the TET-TDG-BER axis in DNA demethylation during embryonic development.

4.4 Replication-Dependent Dilution of 5mC Oxidation Derivatives

TET-mediated 5mC oxidation not only initiates the TET-TDG-BER demethylation pathway, but also generates DNA demethylation intermediates (i.e., 5hmC, 5fC, and 5caC) that can be passively diluted in a replication-dependent manner. Mechanistically, hemi-modified CpGs carrying 5hmC, 5fC or 5caC (XG:GC, where X = 5hmC/5fC/5caC) have been demonstrated to be methylated by DNMT1 significantly less efficiently than hemimethylated CpGs (i.e., 5mCG:GC) [62,63,64]. Therefore, TET-mediated 5mC oxidation can block the maintenance methylation machinery, facilitating replication-dependent DNA demethylation. Because this DNA demethylation process starts with active modification of 5mC, it has been suggested to be regarded as active DNA demethylation (Fig. 1) [65]. The replication-dependent active DNA demethylation has been observed in the paternal genome (and to a lesser extent in the maternal genome) of zygotes and in developing PGCs [21, 22, 66, 67]. However, it is worth noting that global DNA demethylation in zygotes could be largely achieved without 5mC oxidation due to the inhibition of DNMT1 at this stage. Therefore, 5mC oxidation probably only facilitates, but is not indispensable for, replication-dependent whole-genome DNA demethylation. The extent to which 5mC oxidation is required for demethylation may depend on the genomic context of the DNA sequence.
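The effect of reduced maintenance efficiency on replication-dependent dilution can be illustrated with a short numerical sketch. It assumes a deliberately simple model in which each daughter duplex keeps one parental strand and the new strand is modified at a fraction `efficiency` of the parental sites; the function name and the efficiency values (0.95 and 0.10) are hypothetical illustrations, not measured DNMT1 parameters, and the model ignores de novo methylation and TDG-mediated excision.

```python
def per_strand_modification(m0: float, efficiency: float, rounds: int) -> float:
    """Average fraction of modified CpG sites per strand after replication.

    m0         -- starting fraction of modified CpGs per strand (0..1)
    efficiency -- assumed probability that maintenance copies a parental
                  mark onto the new strand (hypothetical parameter)
    rounds     -- number of replication cycles
    """
    if not 0.0 <= m0 <= 1.0 or not 0.0 <= efficiency <= 1.0 or rounds < 0:
        raise ValueError("m0 and efficiency must lie in [0, 1]; rounds >= 0")
    m = m0
    for _ in range(rounds):
        # One parental strand (fraction m) plus one new strand that is
        # modified at efficiency * m, averaged over the two strands.
        m = (m + efficiency * m) / 2.0
    return m


if __name__ == "__main__":
    # Illustrative efficiencies only: high for hemimethylated CpGs,
    # low for hemi-modified CpGs carrying oxidized 5mC.
    for label, eff in (("5mC", 0.95), ("oxidized 5mC", 0.10)):
        levels = [per_strand_modification(1.0, eff, n) for n in range(5)]
        print(label, [round(x, 3) for x in levels])
```

With an efficiency of 1 the pattern is fully maintained, and with an efficiency of 0 the level halves at every division, which corresponds to purely passive dilution.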

4.5 Other Potential TET-Initiated Active DNA Demethylation Pathways

In addition to the two major TET-mediated DNA demethylation pathways discussed above (i.e., the TET-TDG-BER axis and the replication-dependent dilution of 5mC oxidation derivatives), which have been extensively supported by recent biochemical and genetic studies [65], evidence for the existence of other TET-initiated DNA demethylation pathways, in which 5mC oxidation derivatives act as demethylation intermediates, has also been reported.

Firstly, the 5-carboxyl group on 5caC might be removed by a putative decarboxylase to complete DNA demethylation. This mechanism was proposed under the inspiration of the thymidine salvage pathway that we discussed above [46], where the thymine-7-hydroxylase oxidizes the methyl group of thymine to a carboxyl group that is subsequently removed by an isoorotate decarboxylase to convert thymine to uracil [45]. Although the idea of decarboxylation is more energy-efficient compared with the TET-TDG-BER pathway, only one study reported weak 5caC decarboxylase activity in mouse ESC extracts [68]. In addition, the pronounced increase of endogenous 5caC upon TDG depletion has already indicated that TDG is the major enzyme for 5caC removal [55, 56]. Therefore, whether a 5caC decarboxylase exists remains to be explored.

Secondly, the 5-position substituents may be directly removed by DNMTs. It has been reported that both bacterial and mammalian DNMTs could remove the 5-hydroxymethyl group of 5hmC and the 5-carboxyl group of 5caC in vitro to generate unmodified cytosine in the absence of SAM, the methyl donor in a DNMT-mediated DNA methylation reaction [69,70,71]. However, given that SAM is present in all cell types as a general methyl donor for many other essential biochemical reactions, whether DNMT-mediated 5-position substituent removal of 5hmC/5caC can take place in vivo remains questionable.

Thirdly, 5hmC deamination followed by BER has also been implicated in active DNA demethylation. The AID (activation-induced deaminase)/APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide) family of cytidine deaminases typically targets unmodified cytosine in single-strand DNA or RNA to generate mutations, which are required for the generation of antibody diversity in B cells, RNA editing, and retroviral defense [72]. Interestingly, one study showed that AID/APOBEC deaminases might deaminate 5hmC to produce 5hmU in HEK293 cells and in the mouse brain [73], indicating a TET-AID/APOBEC-BER axis for active DNA demethylation. However, this potential pathway has been questioned for the following reasons: 1) AID only acts robustly on single-strand DNA but not on double-strand DNA [74]; 2) AID/APOBEC deaminases exhibit no detectable in vitro deamination activity on 5hmC [54, 75]. Therefore, further evidence is required to support this potential active DNA demethylation mechanism.

5 Potential TET-Independent Active DNA Demethylation Mechanisms

While TET proteins have been widely accepted as major players in active DNA demethylation, many other proteins were also historically proposed to play direct roles in demethylating DNA [46, 76]. Here we list some of the putative mechanisms that are independent of TET proteins. However, due to a lack of direct evidence or conflicting observations, these demethylation mechanisms must be reexamined to confirm their biological relevance.

Firstly, despite the difficult nature of breaking a C-C bond, enzymatic removal of the 5-methyl group from 5mC is the simplest way to achieve DNA demethylation. The first protein reported to possess this activity is the methylated DNA binding protein MBD2. It was shown that MBD2-mediated 5-methyl group excision could take place in vitro without any cofactors [77]. However, this observation could not be reproduced by other laboratories, and Mbd2-null mice were viable with normal DNA methylation patterns [78], raising the question of whether MBD2 could serve as a functional DNA demethylase in vivo. In addition to MBD2, elongator complex protein 3 (ELP3) was also proposed to achieve DNA demethylation by breaking the C-C bond through a radical SAM mechanism [46]. While ELP3 bears an Fe-S radical SAM domain, and was reported to play a role in the paternal genome demethylation in mouse zygotes [79], direct biochemical evidence demonstrating its enzymatic activity is still lacking. Interestingly, an in vitro study has shown that, in the absence of SAM, the mammalian DNMTs (i.e., DNMT3A, DNMT3B, and DNMT1) themselves could also remove the 5-methyl group from 5mC [80], but the physiological relevance of this observation remains unclear due to the widespread presence of SAM in all cell types as discussed above.

Secondly, the entire 5mC base can be erased by a DNA glycosylase to form an abasic site, followed by the BER pathway to complete active DNA demethylation. In plants, compelling biochemical and genetic evidence has validated this mechanism with the discovery of a family of specialized DNA glycosylases responsible for 5mC excision, namely Demeter (Dme) family proteins [81]. While no obvious mammalian orthologues of Dme family proteins have been identified, two mammalian DNA glycosylases, TDG and MBD4, were reported to have incision activity against 5mC [82, 83]. However, the 5mC incision activity of the two enzymes is about 30 times lower than that against G/T mismatches. In addition, Mbd4-null mice are viable and exhibit normal DNA methylation patterns [59]. Although Tdg-deficient mice exhibit abnormal DNA methylation and die around E12.5 [57, 58], the phenotype is more likely to be attributed to the loss of the 5fC/5caC incision activity of TDG required in the TET/TDG-mediated DNA demethylation as discussed in Sect. 4.3. Thus, whether BER of 5mC by a DNA glycosylase can contribute to DNA demethylation in mammals has yet to be determined.

Thirdly, active DNA demethylation may also be achieved through deamination of 5mC to generate thymine, followed by BER to replace the mismatched thymine with unmodified cytosine. As discussed earlier, AID/APOBEC family proteins show no detectable in vitro deamination activity on 5hmC. However, these deaminases do deaminate 5mC in the context of single-strand DNA in vitro, albeit at a 10-fold slower rate than that towards their canonical substrate cytosine [54, 75, 84]. Indeed, several lines of evidence have suggested that AID/APOBEC family deaminases play a role in active DNA demethylation, including studies in zebrafish embryos [85], in mouse PGCs [86], in promoting pluripotency in somatic nuclei after fusion with ESCs [87], and in reprogramming of somatic cells to induced pluripotent stem cells (iPSCs) [88]. Nevertheless, due to conflicting observations and the fact that these deaminases only act robustly on single-strand DNA, further mechanistic studies are required to clarify the function of AID/APOBEC proteins in active DNA demethylation. Interestingly, DNMTs, in addition to AID/APOBEC proteins, have also been shown to deaminate 5mC in the absence of SAM in vitro [29]. But again, the physiological relevance of DNMTs’ in vitro deamination activity is uncertain as depletion of SAM is unlikely in living cells.

Fourthly, nucleotide excision repair (NER), which typically repairs bulky DNA lesions generated by exposure to chemicals and radiation, has also been implicated in active DNA demethylation. Multiple lines of evidence have shown that GADD45 (growth arrest and DNA-damage-inducible 45) family proteins could stimulate active DNA demethylation via NER in frog, zebrafish, and mammals [85, 89,90,91]. However, evidence to the contrary also exists [92, 93]. More importantly, the exact underlying mechanism is still unclear. Therefore, the role of GADD45 family proteins and NER in DNA demethylation remains to be elucidated.

6 Regulation of TET-Mediated DNA Demethylation

Precise control of DNA methylation is critical to the maintenance of genome stability as well as cell-type- and developmental-stage-specific gene expression. Therefore, proper regulation of both DNA methylation and demethylation is required in many biological processes, such as development and the onset of diseases. Compared with the regulation of DNA methylation, which has been extensively studied [8, 15], the regulation of DNA demethylation has just begun to be understood. Because TET-mediated 5mC oxidation has become the most accepted mechanism of active DNA demethylation [9, 94], we only discuss factors involved in the regulation of TET-mediated DNA demethylation in the following sections.

6.1 Regulation of TET Expression

Regulation of enzyme abundance is a common way to control its activity in cells. In mouse E10.5 PGCs, transient conversion of 5mC to 5hmC can be readily detected together with a dramatic TET1 upregulation [67, 95, 96], indicating that regulation of TET expression is an important way to regulate DNA demethylation. Indeed, the three TET genes exhibit different expression patterns in a cell-type- and developmental-stage-specific manner: TET1 shows a high level of expression specifically in mouse E10.5–12.5 PGCs, the inner cell mass (ICM) of blastocysts, as well as ESCs [39, 95]; TET2 is highly expressed in ESCs, and is broadly expressed in various mouse adult tissues [39]; TET3 is the only TET family member that is highly expressed in mouse oocytes and zygotes [97, 98], although it also shows a broad expression pattern in mouse adult tissues.

The regulation of TET expression has been reported at different levels. At the transcriptional level, cell-type-specific transcription factors (TFs) may play a major role. For example, a large cluster of binding sites for core pluripotency TFs is present in the upstream promoter of the mouse Tet1 gene [99], in line with the rapid reduction of TET1 expression upon ESC differentiation [47, 100]. At the posttranscriptional level, it has been shown that an oncogenic microRNA, miR-22, could negatively regulate TET family proteins in breast cancer development and in hematopoietic stem cell transformation [101, 102]. In addition, one study reported that the CXXC domain-containing protein IDAX could directly interact with the catalytic domain of TET2 to downregulate TET2 protein through caspase-mediated degradation [41]. Moreover, all three TET proteins are direct substrates of calpains, a family of calcium-dependent proteases. Specifically, calpain1 mediates TET1 and TET2 turnover in mouse ESCs, and calpain2 regulates TET3 level during differentiation [103]. Such multiple layers of regulation of TET expression provide robust control of TET activity in cells.

6.2 Regulation of TET Activity by Metabolites and Nutrients

Given the importance of precise regulation of DNA methylation in various biological processes, it is not surprising that TET activity can be regulated in multiple ways, including by the metabolic states and the milieu of cells (e.g., nutritional and developmental signals, stress, and chemical exposure). For example, both adenosine-5′-triphosphate (ATP) and hydroquinone were reported to stimulate TET-mediated 5mC oxidation [48, 104]. More importantly, the five-carbon dicarboxylic acid αKG, which is part of the tricarboxylic acid (TCA) cycle, is an essential co-factor for TET-mediated 5mC oxidation (as discussed in Sect. 4.2). Therefore, metabolic states of cells, which affect intracellular αKG levels, may influence TET activity. Indeed, it has been reported that global 5hmC levels were rapidly increased together with αKG levels in mouse livers within 30 min after glucose, glutamine or glutamate injection [105].

In contrast, another five-carbon dicarboxylic acid, 2-hydroxyglutarate (2HG), which is chemically analogous to αKG, has been shown to inhibit TET activity by competing with αKG [106, 107]. Cellular accumulation of 2HG is often caused by tumor-associated mutations in the NADP+-dependent isocitrate dehydrogenase genes (IDH1/IDH2), which encode enzymes that normally produce αKG in the cell. These tumor-associated IDH1/IDH2 mutations (R132 of IDH1 and R140/R172 of IDH2) impair αKG production, and confer a new enzymatic activity that converts αKG to 2HG [108, 109], which inhibits TET activity. Consistently, co-expression of mutant IDH enzymes and TET proteins inhibits TET-mediated 5mC to 5hmC conversion [106, 107]. It was hypothesized that the substitution of the keto group on αKG with a hydroxyl group on 2HG might interfere with Fe(II) binding and stabilize the reaction intermediate. In line with this hypothesis, two other metabolites, fumarate and succinate, which also share structural similarity with αKG, both function as competitive inhibitors of Fe(II)/αKG-dependent dioxygenases, including TET proteins. Similar to 2HG, these metabolites accumulate in a subset of human cancers with inactivating mutations of fumarate hydratase (FH) and succinate dehydrogenase (SDH), respectively [110]. Thus, multiple intracellular metabolites may regulate TET-mediated oxidative DNA demethylation, at least under certain pathological conditions.
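The competition between αKG and its structural analogs can be illustrated with the textbook rate law for competitive inhibition. This is a generic sketch rather than a fitted model of any TET enzyme: the function name `relative_tet_rate` and the default values of `vmax`, `km`, and `ki` are hypothetical placeholders in arbitrary units, and the sketch treats αKG as the only limiting substrate.

```python
def relative_tet_rate(akg: float, inhibitor: float,
                      vmax: float = 1.0, km: float = 1.0,
                      ki: float = 1.0) -> float:
    """Michaelis-Menten rate with a competitive inhibitor.

    v = vmax * [aKG] / (km * (1 + [I] / ki) + [aKG])

    akg       -- concentration of the co-substrate aKG
    inhibitor -- concentration of a competitor such as 2HG,
                 fumarate, or succinate
    vmax, km, ki -- hypothetical kinetic constants (arbitrary units)
    """
    if akg < 0 or inhibitor < 0 or km <= 0 or ki <= 0:
        raise ValueError("concentrations must be >= 0; km and ki must be > 0")
    # A competitive inhibitor raises the apparent Km without changing vmax.
    apparent_km = km * (1.0 + inhibitor / ki)
    return vmax * akg / (apparent_km + akg)


if __name__ == "__main__":
    for level in (0.0, 1.0, 5.0, 20.0):
        rate = relative_tet_rate(akg=1.0, inhibitor=level)
        print(f"inhibitor = {level:5.1f}  relative rate = {rate:.3f}")
```

Because the inhibition in this form is competitive, raising the αKG concentration restores the rate, which is one way to picture how the balance between αKG and metabolites such as 2HG could tune TET activity.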

Ascorbate (also known as vitamin C), an essential nutrient for humans and certain other animal species, has also been demonstrated to positively regulate TET activity [111,112,113]. In wild-type, but not Tet1/Tet2-deficient, mouse ESCs, ascorbate significantly increases the levels of all 5mC oxidation products, particularly 5fC and 5caC (by more than an order of magnitude), leading to a global loss of 5mC (~40%) [111,112,113]. Ascorbate uniquely interacts with the catalytic domain of TET enzymes, enhancing their catalytic activity, likely by promoting their folding and/or the recycling of Fe(II) [111]. Intriguingly, ascorbate-induced demethylation has a stronger effect on the DNA sequences that gain methylation in cultured ESCs compared to blastocysts, which are typically methylated only after implantation in vivo [113]. These studies suggest that ascorbate is a positive modulator of TET activity and may play a critical role in regulating DNA methylation during development. Further studies are needed to elucidate the sequence specificity of ascorbate-mediated stimulation of TET activity.

6.3 Regulation by TET-Interacting Proteins and DNA-Binding Proteins

In addition to the overall TET activity in cells, the specific targeting of TET proteins and the regulation of their processivity (i.e., why TET-mediated 5mC oxidation tends to stall at 5hmC and only proceeds to 5fC and 5caC at specific loci) provide another important layer of demethylation control. Emerging evidence indicates that the genomic targeting, activity, and processivity of TET enzymes can be modulated by their interacting proteins and by some DNA-binding proteins. The O-linked N-acetylglucosamine (O-GlcNAc) transferase OGT has been reported to directly interact with, and also GlcNAcylate, TET proteins [114,115,116,117,118]. Although OGT binding and GlcNAcylation appear not to regulate the enzymatic activity of TET proteins [115], OGT regulates the subcellular localization of TET3 by promoting its nuclear export in high-glucose conditions [118]. Moreover, depletion of OGT in mouse ESCs decreases the association of TET1 with chromatin and alters 5hmC enrichment at certain loci [116, 117], suggesting that OGT plays a specific role in targeting TET proteins to chromatin and stabilizing them there. In addition to OGT, the CXXC domain protein IDAX was shown to interact with TET2 and was suggested to recruit TET2 to promoters and CpG islands [41]. Recently, the sequence-specific transcription factor WT1 (Wilms tumor protein 1) has also been shown to physically interact with TET2 and recruit it to WT1 target genes to activate their expression [119]. Furthermore, PGC7 (also known as STELLA or DPPA3), a maternal factor essential for early development, was demonstrated in one-cell zygotes to protect the maternal genome and the imprinting control regions (ICRs) in the paternal genome by inhibiting TET activity through direct interaction [97, 120, 121]. These findings suggest an important role for TET-interacting partners in targeting and restricting TET activity in the cell.

In addition to TET-interacting proteins, some DNA-binding proteins can also regulate DNA demethylation. For example, one study showed that knockdown of methyl-CpG binding domain protein 3 (MBD3), which also binds to 5hmC, caused a strong reduction in global 5hmC levels in mouse ESCs [122]. In another study, UHRF2 was identified as a 5hmC-specific binding protein in neuronal progenitor cells, and was shown to be capable of stimulating the processivity of TET1 when co-overexpressed with the catalytic domain of TET1 in HEK293T cells [123]. These observations indicate that proteins bound to the substrate DNA of TET enzymes may regulate the enzymes’ activity and processivity, implying a role for DNA-binding proteins in controlling DNA demethylation.

7 Concluding Remarks

Ever since the discovery of 5mC oxidation by TET family proteins, there has been tremendous progress in understanding the molecular basis of DNA demethylation. Accumulating biochemical and genetic studies have demonstrated that TET family proteins play a critical role in active DNA demethylation during the dynamic regulation of DNA methylation patterns in development and disease. While the TET-TDG-BER pathway and replication-dependent dilution of 5hmC/5fC/5caC have been generally accepted as the major forces of active DNA demethylation, other potential active DNA demethylation mechanisms have also been reported. It is worth noting that most of those observations were made before the existence of 5hmC/5fC/5caC was known, and relied on 5mC immunostaining or bisulfite sequencing, approaches that do not distinguish 5mC from 5hmC or unmodified cytosine from 5fC and 5caC [9]. Therefore, with many new technologies recently developed to map these modifications in the genome, historically reported active DNA demethylation pathways should be revisited to advance this exciting field and to build a more comprehensive understanding of how DNA methylation is dynamically regulated.
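
The ambiguity of bisulfite sequencing noted above can be made concrete with a small lookup of how each cytosine form is read after standard bisulfite conversion and amplification. This is an illustrative sketch rather than an analysis tool; the names `BISULFITE_READOUT` and `indistinguishable_groups` are hypothetical, and the table covers only conventional bisulfite sequencing, not the oxidative or enzyme-assisted variants developed later.

```python
# Base called at each cytosine form after standard bisulfite conversion + PCR.
# Unmodified C, 5fC and 5caC are deaminated and read as T;
# 5mC and 5hmC resist conversion and are read as C.
BISULFITE_READOUT = {
    "C": "T",
    "5mC": "C",
    "5hmC": "C",
    "5fC": "T",
    "5caC": "T",
}


def indistinguishable_groups(readout):
    """Group cytosine forms that yield the same sequencing readout."""
    groups = {}
    for base, read in readout.items():
        groups.setdefault(read, []).append(base)
    return groups


groups = indistinguishable_groups(BISULFITE_READOUT)
for read, bases in groups.items():
    print(f"read as {read}: {', '.join(bases)}")
```

Grouping by readout shows the two classes that earlier studies could not resolve: a position scored as "methylated" may be 5mC or 5hmC, and a position scored as "unmethylated" may be unmodified cytosine, 5fC, or 5caC.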