1 Introduction

One of only two chemical differences between RNA and DNA is the presence of a methyl group in deoxythymidine (T, also abbreviated dT, dm5U). This substituent at the carbon 5 is thus part of one out of four integral building blocks of DNA, while no methyl group is present in its counterpart ribouridine (U, also abbreviated rU). The fact that both are metabolically derived from uridine monophosphate (containing ribose) is one of several arguments often used to support the claim that DNA has evolved from RNA (Kun et al. 2015; Muller 2006). It is also a clear indication of the importance of this methyl group, which, from this perspective, constitutes a nucleoside modification. Remarkably, and in contrast to most other nucleoside modifications, the obligatory presence of that methyl group in DNA is cemented by its introduction prior to nucleotide polymerization. Interestingly, thymidine also occurs in RNA (ribothymidine, also rT, m5U, rm5U), most prominently as the namesake of the T(Tψ)-loop in transfer RNA, where it is introduced posttranscriptionally. Methylated versions of the sister nucleobase cytidine are also found in both DNA (5mC) and RNA (m5C); however, all are introduced at the polynucleotide level.

Beyond simple methyl groups, numerous chemically more complex modifications of pyrimidines are known that contain a carbon modification at the 5-position. This class may well represent the largest group of all known modifications of nucleobases, because it includes a large part of the modifications of the U34-position in the anticodon of transfer RNAs (tRNAs) with its several dozen species (Machnicka et al. 2013). Again, the vast majority of these modifications are introduced at the polynucleotide level, the only exceptions being 5-hydroxymethylpyrimidines (5hmC, 5hmU) and their glycosylated derivatives found in phage DNA, which are generated at the mononucleotide level (Gommers-Ampt and Borst 1995). Interestingly, the resulting triphosphate nucleotides are then incorporated by phage DNA polymerase despite being sterically encumbered to a large degree. Even more interestingly, this tolerance for modifications at the 5-position appears to be a general feature of nucleotide polymerases at both the DNA and RNA level. Most prominently, T7-RNA polymerase, arguably the most commonly used enzyme for RNA synthesis, efficiently incorporates triphosphate ribonucleotides of rT and rm5C, and even more sterically demanding carbon-5 modifications have been incorporated into RNA this way (Vaught et al. 2004). Similarly, synthetic modifications are available for incorporation into DNA in PCR reactions by non-phage polymerases (Vaught et al. 2010).

The enzymatic mechanisms of pyrimidine C-5 modification involved here are of particular interest, since the central step involves a C–C bond formation. This reaction type is of increased interest to organic chemists and in natural product metabolism. In a large number of cases, proper understanding of the mechanism of bond-forming reactions requires the identification of the nucleophilic partner on the one hand and the electrophilic partner on the other. This is typically easy for the formation of C–N or C–O bonds but more demanding for C–C bonds. In the case at hand, the carbon-5 position in pyrimidines is catalytically activated to form a transient carbon nucleophile, while the carbon side chains derive from an electrophilic metabolite, such as S-adenosyl-L-methionine (AdoMet) or N5,N10-methylenetetrahydrofolate (CH2-THF). Hence, before addressing the biocatalysis of pyrimidine alkylation, we will discuss the reactivity of the 5-position from the perspective of the organic chemist and then do the same with the various electrophilic carbon scaffolds supplied as cofactors by the modification enzymes. Only then will we discuss a series of enzymatic reactions of 5-pyrimidine alkylation and related relevant processes. Of note, the mechanism of the methyl group oxidations by TET enzymes leading, e.g., to 5hmC and 5hmU (Fu et al. 2014; Pfaffeneder et al. 2014) will not be discussed here. Instead, we will turn to equally fascinating modifications typically found in transfer RNA (tRNA) featuring a uridine at 34-position of the anticodon loop. Although this group of modifications includes a bewildering variety of sophisticated chemical structures, the initial modification reaction bears similarities with the aforementioned relatively simple modifications.

1.1 Chemical Structure and Occurrence of Pyrimidine C-5 Modifications

A surprising variety of pyrimidine modifications at the 5-position is known, many of them for some time. Permutation of functional groups at three positions on the pyrimidine nucleoside, namely, H vs. OH at the 2′-position, NH2 vs. OH at the 4-position, and H vs. CH3 at the 5-position, results in eight species of pyrimidine nucleosides. Ribothymidine (rT, Fig. 1a) is ubiquitous in tRNA and very frequent in ribosomal RNA (rRNA), but not known elsewhere (Motorin and Helm 2011). 5mC is present in bacterial DNA as a guard against restriction nucleases (Roberts et al. 2005), while its presence in promoter and coding regions of eukaryotic genes participates in the regulation of transcription (Bogdanovic and Gomez-Skarmeta 2014). The distribution of m5C in RNA, which is yet more complicated, was recently reviewed by us (Motorin et al. 2010) and has since encountered renewed interest through transcriptome-wide studies (Burgess et al. 2015; Militello et al. 2014; Squires et al. 2012).
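The count of eight species follows from two choices at each of three positions (2 × 2 × 2). As a minimal illustrative sketch, the enumeration can be written out as below; the variable names and the name table are assumptions introduced here for illustration, with abbreviations following the usage in this chapter where available.

```python
from itertools import product

# Two substituent choices at each of the three positions named in the text.
SUGAR_2PRIME = ("OH", "H")   # ribose vs. 2'-deoxyribose
BASE_4 = ("OH", "NH2")       # uracil-type vs. cytosine-type (4-position)
BASE_5 = ("H", "CH3")        # unmethylated vs. methylated carbon 5

# Illustrative name table (an assumption of this sketch, not taken from a database).
NAMES = {
    ("OH", "OH", "H"): "uridine (U)",
    ("OH", "OH", "CH3"): "ribothymidine (rT, m5U)",
    ("OH", "NH2", "H"): "cytidine (C)",
    ("OH", "NH2", "CH3"): "5-methylcytidine (m5C)",
    ("H", "OH", "H"): "deoxyuridine (dU)",
    ("H", "OH", "CH3"): "deoxythymidine (dT)",
    ("H", "NH2", "H"): "deoxycytidine (dC)",
    ("H", "NH2", "CH3"): "5-methyldeoxycytidine (5mC)",
}


def enumerate_nucleosides():
    """Return every (2'-substituent, 4-substituent, 5-substituent) combination with its name."""
    return [(combo, NAMES[combo]) for combo in product(SUGAR_2PRIME, BASE_4, BASE_5)]


if __name__ == "__main__":
    for (c2, c4, c5), name in enumerate_nucleosides():
        print(f"2'-{c2:2s}  4-{c4:3s}  5-{c5:3s}  ->  {name}")
```

Each combination maps to a distinct nucleoside, four of them carrying the 5-methyl group that is the subject of this chapter.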

Fig. 1
DNA and RNA modifications arising from alkylation of the pyrimidine 5-position. (a) Thymidine, ribothymidine, m5C, and 5mC. (b) Modified nucleosides resulting from 5-mPy oxidation. (c) Modified uridines involved in codon recognition at 34-position of the tRNA anticodon

The number of chemical species increases dramatically if 5-modifications other than methyl groups are included in this survey. For example, recent research has discovered, or rediscovered, oxidation products of the 5-methyl group, including 5-hydroxymethyl and 5-formyl derivatives (Fig. 1b) (Kriaucionis and Heintz 2009; Tahiliani et al. 2009; Pfaffeneder et al. 2011). While it is common knowledge that DNA obligatorily contains T as a 5-methylated pyrimidine nucleobase, a less well-known exception is that the abovementioned glycosylated derivatives of 5hmC and 5hmU (termed “J-base”) are not just spurious modifications in the DNA of certain phages, but exist as near-quantitative surrogates of the conventional C and T nucleosides (Fig. 1b) (Gommers-Ampt and Borst 1995). The unglycosylated precursor 5hmC was discovered in phage DNA as early as 1953 (Wyatt and Cohen 1953).

In eukaryotes, the existence of 5-hydroxymethylcytidine in DNA is a more recent discovery (Kriaucionis and Heintz 2009; Tahiliani et al. 2009) with a strong impact in fields such as epigenetics and developmental biology, while the corresponding modification in RNA was reported decades ago, although incompletely characterized (Racz et al. 1978). Similarly, continued investigations have revealed 5-formylcytosine in DNA (fC, 5fC) (Pfaffeneder et al. 2011), while the corresponding ribonucleotide (f5C) had been described in tRNA as early as 1994 (Moriya et al. 1994). However, further oxidation of 5fC leads to 5-carboxydeoxycytidine (caC, 5caC) (He et al. 2011), of which the ribonucleoside has yet to be discovered. 5hmU as a constituent of mammalian DNA has been detected in trace amounts and demonstrated to be a consequence of thymidine oxidation by TET enzymes (Pfaffeneder et al. 2014). Finally, the largest structural variety is found in aminomethyluridines, which are ribothymidine derivatives at the oxidation state of 5-hydroxymethyluridine and which predominate at 34-position in the anticodon of tRNAs (Machnicka et al. 2013). In contrast to 5hmU, these are not biochemically formed by oxidation of thymidine, but their biogenesis involves the use of a single carbon building block at the oxidation state of formaldehyde, namely, CH2-THF, as will be detailed below.

1.2 Reactivity of Pyrimidines

A closer look at the catalytic strategies employed by modification enzymes acting on the 5-position of pyrimidines reveals that these exploit the intrinsic chemical reactivity of the pyrimidine ring. While this is not a surprising finding in general, the situation of pyrimidines is counterintuitive to the untrained biochemist, and a brief look at pyrimidine reactivity is conducive to a more intuitive mechanistic understanding of the enzymes involved.

Both nitrogen atoms within pyrimidines exert an electron-withdrawing effect, resulting in an electron-poor aromatic ring that is susceptible to nucleophiles. A nucleophilic attack, e.g., by bisulphite, at the 6-position can be viewed as a Michael addition, while the 4-position corresponds directly to the electrophilic center of a carbonyl functionality. Certain reactions with nucleophiles, such as hydrazine treatment, or the deamination reaction used in the so-called bisulphite sequencing (Schaefer et al. 2009; Frommer et al. 1992), sequentially exploit the electrophilic nature of both positions (Fig. 2).
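The practical consequence of the bisulphite deamination chemistry can be illustrated with a small, idealized sketch of the sequencing read-out. It assumes complete conversion of unmethylated cytosine to uracil (read as T after amplification) and complete protection of 5-methylcytosine; both are simplifying assumptions that real experiments only approximate. The function names and the position-set representation are hypothetical and introduced only for this illustration.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the read-out of `seq` after idealized bisulphite treatment.

    Unmethylated C is deaminated to U and reported as T; a C whose 0-based
    index is listed in `methylated_positions` is treated as protected.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # C -> U by deamination, read as T
        else:
            out.append(base)  # 5-methylated C and all other bases unchanged
    return "".join(out)


def call_methylation(reference, converted):
    """List 0-based positions where a reference C is still read as C."""
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference.upper(), converted.upper()))
        if ref == "C" and obs == "C"
    ]


if __name__ == "__main__":
    reference = "ACGTCC"
    read = bisulfite_convert(reference, methylated_positions={4})
    print(read)                               # ATGTCT
    print(call_methylation(reference, read))  # [4]
```

Comparing the converted read with the untreated reference recovers the methylated positions, which is the principle behind the mapping approach cited above.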

Fig. 2
Chemical reactivity of the pyrimidine C-5 position. Nucleophilic reagents and attack vectors are depicted in blue, electrophilic reagents and vectors are in red

In contrast, their electron poverty leaves pyrimidines relatively inert toward electrophilic reagents such as alkylating reagents, with the 3- and 5-positions being the exceptions (Motorin et al. 2010). The N3-position reacts with electrophilic reagents such as kethoxal or CMC, which is exploited in structural probing experiments (Giege et al. 1999). Concerning carbon 5, uridine reacts with the electrophilic formaldehyde under relatively mild acidic conditions to form 5-hydroxymethyluridine. Uracil (Kong et al. 2009), deoxyuridine (Conte et al. 1992), as well as cytidines (Khursid et al. 1982) are reported to also yield 5-hydroxymethylpyrimidines under alkaline conditions. The mechanism under acidic conditions can be understood in analogy to a Friedel–Crafts alkylation/acylation, in which the lone electron pair of nitrogen 1 stabilizes the positive charge introduced by the alkylating agent.

Under alkaline conditions, a transient Michael addition of hydroxide at the 6-position would plausibly generate an enolate-type carbon nucleophile, which then reacts with the electrophilic formaldehyde, followed by elimination of the hydroxide to restore the aromatic ring. Note that, indeed, the enzymatic mechanisms discussed below for alkylation, acylation, or hydroxymethylation all involve such a Michael attack by a nucleophile, typically a cysteine thiolate (Jurkowski et al. 2008; Motorin et al. 2010). Interestingly, mechanisms discussed for the enzymatic decarboxylation of 5caC and 5fC employ the same path in reverse. Carell et al. described a nonenzymatic in vitro decarboxylation proceeding in the presence of high concentrations of thiol but at low pH (Schiesser et al. 2013; Schiesser et al. 2012). Under the same conditions, removal of formaldehyde from 5hmC was inefficient. Reaction with formaldehyde may be conducted in the presence of amine, resulting in aminomethylation, thus leading to modified pyrimidines that closely resemble native counterparts typically found at 34-position of tRNA. Here again, the catalytic mechanism in the biosynthesis of these modified bases bears similarities (Helm and Alfonzo 2014; El Yacoubi et al. 2012) with that of other modifications using CH2-THF (vide infra).

2 Enzymatic Mechanisms of Pyrimidine Alkylation

Attachment of the alkyl (most frequently -CH3) group to carbon 5 of the pyrimidines U and C can be catalyzed by a variety of enzymes which differ in their origin, sequence, and structure yet employ some common principles of catalysis. At the nucleotide level, this reaction is catalyzed by the extensively studied thymidylate synthase (TS), which is a key target enzyme in certain anticancer and immunosuppressive treatments. TS catalyzes the conversion of dUMP into dTMP, an essential reaction for the synthesis of DNA nucleotides. At the polynucleotide level, the methylation of C5 in U and C is ensured either by specific DNA-MTases (for 5mC formation) or by RNA-specific m5U-methyltransferases as well as m5C-methyltransferases.

2.1 The Thymidylate Synthase Family

Thymidylate synthase (TS, EC 2.1.1.45) catalyzes the synthesis of dTMP via a reductive methylation of dUMP. This important enzyme family has been extensively studied for almost 40 years, starting in the late 1970s (Santi 1986; Carreras and Santi 1995). The first characterized enzymes used CH2-THF as co-substrate, yielding dihydrofolate, from which CH2-THF was regenerated from serine and FADH2. More recent studies (Koehn and Kohen 2010; Graziani et al. 2006; Agrawal et al. 2004) revealed the existence of a second unusual class of TS, which also act on dUMP and use CH2-THF, but require FADH2 as a direct reaction cofactor. This family is now called FDTS for flavin-dependent TS. The catalytic mechanism is now established for both enzyme families (Hong et al. 2007; Koehn et al. 2009; Mishanina et al. 2012, 2014).

In the first “classical” family of TS enzymes, the initial step of catalysis relies on a highly conserved Cys residue, which is positioned in the active site of the enzyme. This residue is responsible for the activation of the C5 via addition to the C5=C6 double bond in the pyrimidine ring, resulting in an enolate intermediate. The enolate’s nucleophilic C5 attacks the methylene CH2 group of the folate co-substrate, forming a covalent ternary complex between the enzyme, dUMP, and the folate. The next step of this reaction is a hydride transfer, which allows the formation of the methylated pyrimidine ring, and is followed by the release of the enzyme via a concerted reaction mechanism corresponding to an elimination that reconstitutes the 5–6 carbon double bond (Islam et al. 2014) (Fig. 3). In the flavin-dependent TS family, the initial step of the reaction may depend on the enzyme nucleophile (generally an OH-group) or on a direct attachment of FADH2 at the 6-position of the pyrimidine base. In the case of an enzyme nucleophile, the major reaction steps are rather similar to the “classical” TS, except the last step of hydride transfer, where FADH2 serves as a hydride donor rather than THF. In the case of the direct activation of dUMP by FADH2, after hydride transfer, the FADH2 is replaced by CH2-THF, and the reaction proceeds in the “classical” way, but without a covalent intermediate at the TS active site. The final hydride transfer thus proceeds by an intermolecular rather than an intramolecular reaction (not shown), and THF is the cofactor product as opposed to dihydrofolate in the case of the classical enzymes.

Fig. 3
Enzymatic mechanisms for 2′-deoxythymidine formation in DNA building blocks

2.2 Enzymes Performing 5-Pyrimidine Methylation in Nucleic Acids

rT (m5U) was among the first modified nucleotides discovered in tRNAs, and the respective bacterial enzyme (TrmA or RUMT) catalyzing its biosynthesis was characterized in E. coli in the early 1980s (Greenberg and Dudock 1980; Ny and Bjork 1980; Lindstrom et al. 1985). Studies of its enzymatic mechanism identified AdoMet as its CH3-group donor and Cys324 as a catalytic nucleophile and suggested a simple displacement mechanism of the methylation reaction (Kealey and Santi 1991; Kealey et al. 1991). RUMT catalyzes the modification of U54 in tRNAs, and in addition it is capable of modifying synthetic 16S rRNA in vitro (Gu et al. 1994). The yeast homologue of RUMT was also characterized and its tRNA recognition properties studied using synthetic tRNA transcripts (Nordlund et al. 2000; Becker et al. 1997). rT was also found in bacterial rRNA, and a different MTase (ygcA, renamed to RumA/RlmD) was found to be responsible for its formation. Mutagenesis of RUMT and structural studies of RumA identified the residues involved in catalysis (Santi and Hardy 1987; Kealey et al. 1994). Bacteria also have an additional enzyme of the same family (RlmC/RumB), catalyzing m5U747 formation in 23S rRNA. Activity of m5U-MTases was also detected in Archaea (Constantinesco et al. 1999); however, their presence is restricted to the Thermococcales and Nanoarchaeota groups. In Pyrococcus abyssi, two close homologues of RlmD fulfill the cellular functions of TrmA (m5U54 in tRNA) and RlmC (equivalent of m5U747 in 23S rRNA) (Auxilien et al. 2011). The analysis of m5U54 formation in B. subtilis revealed an unexpected m5U54-MTase in these Gram-positive bacteria. The flavoprotein TrmFO from B. subtilis uses CH2-THF as a carbon donor, akin to the ThyA and ThyX thymidylate synthases (Urbonavicius et al. 2005; Hamdane et al. 2012, 2013). In addition, TrmFO uses the same flavin cofactor, FADH2, as the FDTS family as the reducing agent in the CH3-group transfer. A similar enzyme was found to catalyze the formation of m5U1939 in M. capricolum 23S rRNA (Lartigue et al. 2014).

The distribution of m5C in cellular RNAs from different life domains is complex. In bacteria, this modified residue is present in rRNA, but not in other RNA species; in eukaryotes it is found in tRNA, rRNA, and mRNA (Squires et al. 2012; Hussain et al. 2013), while in Archaea its presence seems to be restricted to tRNAs and some sites in mRNAs (Edelheit et al. 2013). Three m5C residues in E. coli rRNA are formed by three specific enzymes, while in yeast three homologues modify both tRNAs and rRNAs. In higher eukaryotes, at least seven or eight specific proteins are required for the modification of tRNA and cytoplasmic and mitochondrial rRNAs and mRNAs (see Motorin et al. (2010) for further information).

The known enzymes transferring methyl groups from AdoMet to nucleic acids belong to the SPOUT and MTase superfamilies, the latter containing a Rossmann fold for the accommodation of the cofactor. Structure–function relationships in the m5C-MTase family were combined with bioinformatic analyses (Bujnicki et al. 2004), resulting in a subdivision of the known m5C-MTases into four major subfamilies: two groups related to Nop2/Nol1 and YebU/Trm4, a large group related to RsmB or Ynl022c, and a small group represented by P. horikoshii PH1991 and human NSUN6. Further inspection of homologues in higher eukaryotes (Pavlopoulou and Kossida 2009) suggested the existence of a new subgroup of m5C-MTase-related proteins, termed RCMT9, with members distantly related to Trm4 and a distribution restricted to four taxa. A detailed discussion of the distribution of m5C-forming enzymes in the different kingdoms is given in Motorin et al. (2010).

Higher eukaryotes also have another distinct family of m5C-RNA-MTases derived from former m5C:DNA-MTases (DNMT2-related family). These enzymes have different catalytic mechanisms but evolved to modify tRNAs at 38-position (Goll et al. 2006). For information on m5C-MTases acting on DNA, i.e., enzymes of the DNMT family, see elsewhere in this book.

2.3 Catalytic Mechanisms in the Formation of rT, m5C, and 5mC

The catalytic strategies employed for the alkylation of the carbon 5 in pyrimidines share some common elements, which derive from the heterocycle reactivity, as outlined above. Some basic elements already appeared in the discussion of thymidylate synthase above. In all cases, the Michael addition of an anionic nucleophile to the 6-position of the pyrimidine ring produces a nucleophilic carbon with partial carbanion character at 5-position (Fig. 4). In uridines, the Michael addition produces an intermediate in which the negative charge is delocalized in an enolate structure. Arguably, this intermediate might be stabilized by a hydrogen bond to the enolate oxygen before reacting as a nucleophile with the carbon electrophile provided as a cofactor in the form of AdoMet or CH2-THF. In cytidine substrates, the mechanisms comprise an enamine intermediate instead of an enolate, and the mechanisms discussed in the literature typically include an acidic residue in the catalytic site, which may transiently protonate nitrogen 3 to stabilize this enamine intermediate. So far, the known enzymes acting on cytidines in both RNA and DNA exclusively use AdoMet as an electrophilic carbon source, while rT can be formed from either AdoMet or CH2-THF. This has interesting implications, namely, (i) that a cytidine methyltransferase using CH2-THF might so far have eluded detection and (ii) that the formation of rT has been invented multiple times with different cofactors in the course of evolution, in particular at 54-position of tRNA (Hamdane et al. 2012; Nordlund et al. 2000; Ny and Bjork 1980). Furthermore, with an eye to the more sophisticated U34 modifications occurring in tRNA, we note that a covalent enzyme-thiol-pyrimidine-methylene-folate intermediate (as reported/postulated for TS (Fig. 3) and TrmFO) can be resolved not only by reduction with a hydride equivalent from the folate (Fig. 4(i)) but also by other nucleophiles (Helm and Alfonzo 2014) (Fig. 4(ii)). The aromaticity of the pyrimidine base is restored in an elimination step featuring the abstraction of a proton from the C5, which regenerates the thiolate used in the initial activation step via Michael addition. In a number of RNA m5C:MTases, a second cysteine was reported to be crucial to this regeneration (reviewed in Motorin et al. (2010)).

Fig. 4
Posttranscriptional ribouridine and cytidine methylation with the AdoMet cofactor. The first step uses a catalytic cysteine in the enzyme active site for activation of the C5=C6 bond via formation of a covalent enzyme-RNA enolate intermediate. This allows the transfer of the CH3-group from AdoMet to the 5-position of the base, followed by deprotonation via Glu358 and the enzyme release from the covalent enzyme–RNA complex (reviewed in Kealey et al. 1994)

In the framework of the above-described common elements, the various enzymes differ from one another by the amino acids that embody the different roles outlined above, such as activating nucleophile, general acid, general base, etc. In addition, the position of such residues, while relatively conserved in the spatial arrangement of the active site, may vary within the polypeptide sequence of the different enzymes. This has been especially well studied in m5C:MTases, where these residues are located within several conserved motifs numbered I through X, which, in their order of appearance in the primary sequence, undergo permutation among the different enzymes of the bacterial and plant DNA-MTase family and some of the RNA-m5C:MTases (reviewed in Motorin et al. (2010)).

2.4 Catalytic Mechanisms in the Formation of Exotic U34 Modifications

As already mentioned several times, the most bewildering variety of 5-pyrimidine modifications is found at uridine 34 in the anticodon of tRNAs, where they play a crucial role in mRNA decoding. Alkylations predominate among these modifications, and some of the enzymatic mechanisms bear strong similarity with those applied in simple methylations. For example, certain enzymes use CH2-THF to transfer a formaldehyde equivalent to the C5, and instead of reducing it to the methyl group using a hydride donor, such as tetrahydrofolate (H4-folate) or FADH2 (Fig. 4), the amino group of certain amino acids such as taurine or glycine serves as the attacking nucleophile, leading to the structures displayed in Fig. 1c, which correspond to the overall product of a Mannich reaction (Helm and Alfonzo 2014). Recently, two groups discovered new types of reactive intermediates formed and employed in catalytic mechanisms of U34 modification. The Almo group reported the conversion of the conventional AdoMet into a novel derivative, carboxy-S-adenosyl-L-methionine, which is used by the bacterial CmoB enzyme to introduce the carboxymethyl group into 5-hydroxyuridine (ho5U), yielding a 5-oxyacetyluridine (cmo5U) (Kim et al. 2013, 2015). Most interestingly, Huang’s group reported the use of AdoMet by a radical AdoMet enzyme from the elongator complex to generate a radical from the methyl group of the acetic acid moiety in acetyl CoA, which would then add to the C5–C6 double bond of uridine 34 in Archaea and Eukarya (Selvadurai et al. 2014).

3 Functions of Alkylated Pyrimidine Nucleosides

In view of the plethora of different structures of alkylated pyrimidines already discovered, it is clear that there cannot be one function common to all of them. Indeed, new facets of functions are being continuously discovered in very diverse areas of molecular life sciences, and since this is not the focus of this chapter, we will only provide references to a few known functions. One common biophysical property of 5-methylpyrimidines is that they enhance stacking in A- and B-helices of nucleic acids, leading to a structural stabilization that is typically reflected in an increased thermal stability detected in melting experiments. This applies to thymidine and 5-methylcytidine in DNA and RNA alike. The role of 5-methylcytidine in mammalian epigenetics, as well as in the restriction/methylation systems of bacteria, is detailed elsewhere in this book. Of interest, certain bacteriophages use particular 5-pyrimidine modifications to escape bacterial restriction (Gommers-Ampt and Borst 1995). Curiously, the roles of ribothymidine and 5-methylribocytidine in RNA have remained little understood despite their long-standing tenure in the zoo of known RNA modifications (Motorin and Helm 2011). This is likely due to the fact that their principal occurrences in tRNA and rRNA concern heavily modified RNAs, where a plethora of modifications cooperate in a network fashion to modulate RNA activity (Motorin and Helm 2010). The role of U34 modifications in tRNA has already been alluded to, although the generic explanation of mRNA decoding on the ribosome does not do justice to the plethora of structures found here. Apparently, there is no universally perfect modification at this site that suits all organisms, and the variety of conditions under which protein synthesis must take place has led to the emergence of numerous chemical solutions in different species. Along this line, the recent findings that tRNA anticodon modifications respond dynamically to stress conditions point to an especially sensitive environment that is subject to constant tuning and further evolution.