1 Introduction

Aminoacyl-tRNA synthetases (aaRSs) bridge the transition from RNA to protein during translation by correctly pairing a particular tRNA with its cognate amino acid, which is then incorporated into a growing peptide chain by the ribosome. How aaRSs contributed to the transition from an RNA-only world to one involving protein synthesis has been the subject of considerable conjecture. Polypeptide formation programmed by a genetic code likely existed early in evolution, and the earliest process of translation arose at a time when functions that predate present-day proteins were being performed by other, protein-free systems. As with the largely RNA-directed functions of the ribosome, there is a general consensus that aminoacyl-tRNA synthesis may have begun as a process catalyzed by RNA [1–3]. Many parts of the ribosome and its factors resemble mini RNA helices, which could have served as substrates for many of the aaRSs [4–8]. Aminoacyl-minihelices can be used by contemporary ribosomes, providing a link between a strictly RNA world and one involving protein synthesis [9]. It is also thought that the early genetic code was comparatively simple, perhaps consisting of only a few amino acids, and that the first aaRS functionalities possibly emerged as an early part of the primordial protein synthesis machinery. As the protein code became more complex, the synthetases, along with tRNAs, separated from the early ribosome, leading to a precursor more similar to the contemporary protein synthesis machinery [10].

Present-day aminoacyl-tRNA synthetase enzymes are universally required for protein synthesis in all organisms and show a high degree of evolutionary conservation. AaRSs have undergone extensive horizontal gene transfer, particularly during early evolution, and such events have been identified across all taxonomic levels and are well supported by phylogenetic evidence [11]. Synthetases apparently emerged before the tree of life evolved into three domains, during the time of the last universal common ancestor (LUCA), as can be seen from the universal distribution of aaRSs across all branches of life. Additionally, aaRS phylogenies provide evidence that each family of synthetase was present in LUCA [12]. The ancient emergence of aaRSs, together with early and abundant gene transfer events, led to deep lineages and low sequence similarity between modern synthetase homologs [13]. The evolution of the synthetase enzyme family has also included gene duplications and the subsequent emergence of numerous paralogs with new functions (Sect. 5).

Several studies suggest a correlation between how the aaRSs are evolutionarily related to each other and the overall amino acid order or structure of the genetic code. The evolution of the aaRS family is not generally believed to have shaped the genetic code; rather, it appears to have converged to some extent with the evolution of the code, as both are driven by similar properties of the corresponding amino acids [12, 13]. The evolution of the universal genetic code for the 20 canonical amino acids likely occurred early in the history of life and appears to predate the emergence of, or distinction between, class I and class II aaRSs, as the aaRSs had already evolved amino acid specificities by the time of the LUCA. This order of events further suggests that the extant aaRS aminoacylation machinery may have somehow displaced an ancient, now extinct, aminoacylation process [11, 14].

The contemporary aminoacyl-tRNA synthetase enzymes are modular in structure and contain a core catalytic domain responsible for ATP-dependent aminoacyl-adenylate formation and subsequent ester bond ligation of the activated amino acid onto the 3′ ribose of tRNA. In addition to the core catalytic domain, each enzyme contains a variety of other modules that function primarily to maintain translational accuracy via substrate and tRNA binding and recognition as well as proofreading of misacylated products. In some instances other domains appended to synthetases are responsible for activities outside of tRNA charging, including transcriptional and translational regulation, DNA replication, and cell signaling (Sects. 4 and 5). This chapter will focus on the conventional classification of this large enzyme family, as well as the emergence of non-canonical aaRSs, alternative pathways for tRNA aminoacylation, and aaRS homologs and paralogs that function both in and apart from tRNA aminoacylation. It will also discuss how further adaptive evolution has allowed for widespread adjustments to the core functions of aaRSs, providing selective advantages specific to different organisms and their environments.

2 The Aminoacyl-tRNA Synthetase Class System

AaRSs are grouped into two unrelated structural classes based on the conserved architecture of the catalytic core domains of the enzymes [15, 16]. This classification is independent of other appended domains, with roughly half the aaRSs in each of the two classes (Table 1). The categorization of synthetases into one of two classes is almost completely universal across all three domains. In other words, the synthetase class assignment for a particular cognate amino acid is the same regardless of the origin of the enzyme. There is one recently discovered exception, LysRS, which is found in both classes (Sect. 3.1). Although the two aaRS classes are evolutionarily and structurally very different from each other, the overall chemistry of the tRNA aminoacylation reaction is similar in both (Fig. 1). The existence of two entirely unrelated classes of synthetases provides a strong example of convergence, with two independent structural solutions evolving to achieve the same enzymatic goal: efficient aminoacylation of tRNA [17]. It has also been suggested that the existence of two independent aaRS classes is indicative of multiple origins of protein synthesis, where each used its own set of amino acids and the two systems then fused, or possibly competed, to eventually form one translation process [11].

Table 1 Classes of aminoacyl-tRNA synthetases and structures
Fig. 1 Two-step aminoacylation reaction by aaRSs. In the first step of the reaction, nucleophilic attack by an α-carboxylate oxygen of the amino acid on the α-phosphate of ATP leads to the formation of an enzyme-bound aminoacyl-adenylate and pyrophosphate release. The second step of catalysis involves transfer of the aminoacyl moiety from the aminoacyl-adenylate to tRNA. Nucleophilic attack by either the 2′ or 3′ OH of A76 of the tRNA (depending on class) on the α-carbonyl carbon of the aminoacyl-adenylate and subsequent release of AMP drives the tRNA transfer step of the reaction
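
The two steps described in the Fig. 1 caption can be summarized as a schematic reaction scheme. The notation is chosen here for illustration and is not taken from the original figure: E denotes the synthetase, aa the amino acid, and PP_i pyrophosphate.

```latex
\begin{align}
\mathrm{E} + \mathrm{aa} + \mathrm{ATP} &\rightarrow \mathrm{E{\cdot}aa\text{-}AMP} + \mathrm{PP_i} \\
\mathrm{E{\cdot}aa\text{-}AMP} + \mathrm{tRNA} &\rightarrow \mathrm{E} + \mathrm{aa\text{-}tRNA} + \mathrm{AMP}
\end{align}
```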

A docking study of aaRSs on the tRNA acceptor stem found that subclasses of synthetases within each of the two major classes correlate to each other with respect to amino acid specificity [18]. Because two synthetases from the two different classes normally bind the tRNA acceptor stem from opposite sides, it is possible to model the simultaneous docking of two aaRSs onto a single tRNA. Upon doing this, the authors found specific class I/class II aaRS pairs that could be docked without creating major steric clashes and this pairing occurs because the position of the active site in relation to the tRNA acceptor stem varies for aaRSs within each subclass. For each pair of class I and II aaRSs that were co-docked, the corresponding amino acids had similar structural and steric characteristics. From these results, the authors propose the possibility that ancestral aaRSs with the same amino acid specificities evolved into two separate enzymes with different class architectures, but the same amino acid substrates. LysRS provides a modern example of this, where the amino acid substrates are exactly the same for two enzymes of different aaRS classes [19, 20]. In most cases, however, the aaRS pairs have evolved divergent amino acid specificities and new codons designated to distinguish similar amino acids from each other [18].

2.1 Class I aaRS

The catalytic domains of class I synthetases are structurally very similar to each other and contain a conserved Rossmann fold domain. This conserved domain is responsible for nucleotide binding and contains two conserved sequences, “HIGH” and “KMSKS”, located near the α-phosphate of the bound ATP (Fig. 2) [21]. Also unique to the class I synthetases is the attachment of the activated amino acid at the 2′ OH of tRNA. Class I aaRSs bind the tRNA acceptor helix on the minor groove side and, although many of the synthetases in class I are capable of aminoacylating both the 2′ and 3′ OH of their respective tRNA, the 2′ OH is a much more efficient target for catalysis [22, 23]. Release of the aminoacyl-tRNA is rate limiting for many members of this class of synthetase [24–26]. It should be noted that the analysis of subclass grouping of both class I and class II synthetases can vary slightly based on the structures and sequences available for phylogenetic comparison [18, 21, 27, 28].

Fig. 2 Active site domains of (a) class I aminoacyl-tRNA synthetase, e.g., GlnRS, and (b) class II aminoacyl-tRNA synthetase, e.g., AspRS. Shown are ATP and the acceptor ends of cognate tRNAs (red). The locations of the characteristic motifs are indicated: in (a), MSK (dark blue), HIGH (red); in (b), motif 1 (red), motif 2 (light blue), and motif 3 (dark blue) (reprinted from [17], Copyright 1997, with permission from Elsevier Science)

There are three class I synthetase subclasses. Subclass Ia contains synthetases whose cognate amino acids have hydrophobic aliphatic groups or are sulfur containing. This group shares a well conserved overall structure, including a common tRNA stem-contact fold (SC-fold), an α-helical anticodon binding domain, and a connective peptide I (CPI) domain, which is a globular insertion domain located between two parts of the nucleotide binding fold. The CPI domain is responsible for the post-transfer editing function, or proofreading, of misacylated tRNAs in three of the class Ia synthetases, LeuRS, IleRS, and ValRS [29–31], reviewed in [32]. Phylogenetic reconstruction supports previous theories that these three closely related synthetases, which are all cognate for aliphatic amino acids and charge tRNAs that decode NUN codons, arose from a common ancestor which was unable to discriminate between Leu, Ile, and Val [13, 33].

Subclass Ib includes ArgRS, GluRS, GlnRS, and an atypical class I LysRS, all of which recognize cognate charged amino acids. Interestingly, these four synthetases are the only exceptions to the distinct two-step reaction, as they all require cognate tRNA binding before catalysis of the pyrophosphate exchange reaction occurs [34–36]. GlnRS likely evolved from GluRS, and both are involved in the formation of Gln-tRNA either directly or indirectly depending on the particular organism or organelle (discussed below). GluRS in eukaryotes is more closely related to GlnRS than it is to GluRS in bacteria, suggesting GlnRS emerged from a gene duplication of an ancestral eukaryal GluRS that was then transferred to bacteria [37], reviewed in [38].

The third subclass, Ic, contains the structurally similar TrpRS and TyrRS, both of which have cognate aromatic amino acid substrates. The subclass Ic enzymes are both dimers, in contrast to all other class I synthetases, which function as monomers. Based on early sequence comparisons showing that TrpRS and TyrRS from eukaryotes and Archaea were more similar to each other than to their counterparts in eubacteria, it was proposed that Trp and Tyr were added more recently to the genetic code, with their cognate synthetases diverging after the three domains of life had split [39]. More recent analyses, however, have shown that TrpRS and TyrRS form two monophyletic groups, more in line with other synthetase evolutionary groupings and supporting a much earlier gene duplication event [40, 41].

2.2 Class II aaRS

Class II synthetases tend to work as multimers: most are homodimers, while some forms of PheRS and AlaRS and the bacterial form of GlyRS function as tetramers. Class II aaRSs are much less conserved than class I and have a structurally distinct catalytic core that is made up of a characteristic seven-stranded antiparallel β-sheet surrounded by a number of α-helices (Fig. 2) [21]. There are three loosely conserved sequence motifs (1, 2, and 3) found in class II synthetases; motif 1 is found at the dimer interface, while motifs 2 and 3 participate in substrate binding in the catalytic site. In contrast to class I, class II aaRSs bind the acceptor helix of the tRNA from the major groove side and generally attach the activated amino acid to the 3′ OH of tRNA. The only exception to this last point is PheRS which, like a class I synthetase, aminoacylates the 2′ OH of tRNA.

Subclasses of class II aaRS are defined by differences in primary sequence, subunit organization (dimer, heterodimer, etc.), and location and composition of the anticodon-binding domain. HisRS, ProRS, SerRS, and ThrRS are usually grouped as subclass IIa synthetases. These enzymes have the canonical class II catalytic site and are grouped together due to the similarity in the sequences of their C-termini. With the exception of SerRS, the synthetases within this group have similar C-terminal tRNA anticodon binding domains, which contain an α/β fold responsible for recognizing determinants in the tRNA anticodon loop [16]. Interestingly, this domain is also found in the archaeal/eukaryotic type GlyRS (see below).

Subclass IIb is composed of three synthetases, AspRS, AsnRS, and LysRS, that share several regions of sequence and structural homology, indicating these enzymes all originated from a common ancestor. The structural organization of subclass IIb is highly conserved, in particular the presence of an oligonucleotide binding (OB) fold containing an N-terminal extension that acts as an anticodon-binding module, which contacts tRNA on the minor groove side of its anticodon stem [42–44]. The anticodon stem loops of the cognate tRNAs of the class IIb synthetases all have a conserved central uracil base which makes two contacts with the aaRS [16].

The class IIc synthetases AlaRS, GlyRS, PheRS, PylRS, and SepRS only contain class II motifs 2 and 3 and have less well conserved amino acid and tRNA binding elements than other class II aaRSs. The members of this aaRS subclass mostly exist as tetrameric structures, as opposed to dimers like the other two class II subgroups [45]. There are some exceptions to this such as mitochondrial PheRS, which is a monomer lacking the β-subunit and editing domain [46]. Two forms of GlyRS exist, a homodimer found in archaea, eukaryotes, and some bacteria and a heterotetramer found only in bacteria (see Table 1). The two forms of GlyRS are unrelated in both sequence and structure and the heterotetramer form is not closely related to any of the other class II aaRS [16]. The distribution of GlyRS types in different bacteria does not correlate with the evolutionary emergence of these bacteria [47]. GlyRS is a clear example of a synthetase that does not follow the rule of one conserved aaRS across all domains for each amino acid [11].
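
The class and subclass rules from Sects. 2.1 and 2.2 can be collected into a small lookup table. The following is a minimal sketch rather than a complete classification: it lists only the enzymes named explicitly in the text above (so subclass Ia in particular is incomplete), and the names `SUBCLASS_MEMBERS`, `CLASS_FEATURES`, `SITE_EXCEPTIONS`, `subclass_of`, `class_of`, and `acylation_site` are hypothetical, chosen here for illustration.

```python
# Subclass membership, restricted to enzymes named explicitly in Sects. 2.1 and 2.2.
# Assumption: subclass Ia is incomplete; the text names only LeuRS, IleRS, and ValRS.
SUBCLASS_MEMBERS = {
    "Ia": ["LeuRS", "IleRS", "ValRS"],
    "Ib": ["ArgRS", "GluRS", "GlnRS", "LysRS1"],
    "Ic": ["TrpRS", "TyrRS"],
    "IIa": ["HisRS", "ProRS", "SerRS", "ThrRS"],
    "IIb": ["AspRS", "AsnRS", "LysRS2"],
    "IIc": ["AlaRS", "GlyRS", "PheRS", "PylRS", "SepRS"],
}

# General class-level features as described in the text.
CLASS_FEATURES = {
    "I": {
        "catalytic_core": "Rossmann fold",
        "acceptor_groove": "minor",
        "site": "2'-OH",
    },
    "II": {
        "catalytic_core": "seven-stranded antiparallel beta-sheet",
        "acceptor_groove": "major",
        "site": "3'-OH",
    },
}

# PheRS is the one class II enzyme described as acylating the 2'-OH.
SITE_EXCEPTIONS = {"PheRS": "2'-OH"}


def subclass_of(enzyme: str) -> str:
    """Return the subclass label (e.g. 'IIb') for a named enzyme."""
    for subclass, members in SUBCLASS_MEMBERS.items():
        if enzyme in members:
            return subclass
    raise KeyError(f"{enzyme} is not named in the subclass descriptions")


def class_of(enzyme: str) -> str:
    """Return 'I' or 'II' by stripping the subclass letter."""
    return subclass_of(enzyme).rstrip("abc")


def acylation_site(enzyme: str) -> str:
    """Return the preferred ribose hydroxyl, honoring the PheRS exception."""
    default = CLASS_FEATURES[class_of(enzyme)]["site"]
    return SITE_EXCEPTIONS.get(enzyme, default)
```

For example, `class_of("LysRS1")` and `class_of("LysRS2")` return different classes, reflecting the LysRS exception to the class rule, and `acylation_site("PheRS")` returns the 2′ OH despite PheRS being a class II enzyme.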

2.3 Examples of Where aaRSs Are Missing from Particular Genomes

Not all organisms have a full set of 20 canonical aaRSs to synthesize aa-tRNA from all 20 canonical amino acids. Initial analyses of complete archaeal genomes revealed missing open reading frames encoding several synthetases, complicating the understanding of tRNA aminoacylation at that time [149, 150]. Subsequent studies showed that previously unknown aaRSs and indirect aminoacylation pathways are prevalent in archaea and bacteria. The indirect aminoacylation pathways involve a non-discriminating (ND) synthetase with expanded specificity to form a mischarged canonical amino acid-tRNA pair, which is then further modified by RNA dependent enzymes, changing the tRNA-bound amino acid.

2.3.1 AsnRS and GlnRS

The aaRSs most often missing from certain organisms are those for the direct formation of Asn-tRNAAsn and Gln-tRNAGln. There is no known GlnRS encoded in any sequenced archaeal genome, and most bacterial genomes and eukaryotic organelles also lack GlnRS. Additionally, many archaea and bacteria do not contain an AsnRS [28]. For organisms lacking these aaRSs, Glu-tRNAGln or Asp-tRNAAsn is first formed by ND-GluRS or ND-AspRS, respectively. The mischarged tRNA species is then amidated by the appropriate amidotransferase (AdT), requiring ATP as well as an amide source (Fig. 3) [151–153]. Structural and biochemical data suggest aminoacylation and amidation enzymes are able to form a complex known as the transamidosome, which provides channeling of substrates [154–157]. A more recent study has shown that formation of a transamidosome is not essential in all cases, as rapid kinetic channeling of intermediates can still occur without direct protein association [158].

Fig. 3 Direct aminoacylation vs the transamidation pathway for Asn-tRNAAsn formation. In species that encode AsnRS, Asn-tRNAAsn formation occurs directly (top). Transamidation (bottom) involves Asp-tRNAAsn formation using a non-discriminating AspRS (ND-AspRS). Asp-tRNAAsn is then converted to Asn-tRNAAsn with an amidotransferase (AdT). Indirect aminoacylation of Gln-tRNAGln occurs similarly, using ND-GluRS and Glu-AdT

Two different, but related, tRNA-dependent AdTs exist, GatCAB and GatDE. The presence of a particular form and its activity in vivo vary depending on the domain of life as well as whether one or both of GlnRS and AsnRS are missing [151], reviewed in [159]. For example, the GatCAB AdT functions both as a Glu-AdT and Asp-AdT, while GatDE functions strictly as a Glu-AdT. GatCAB is found in both bacteria and archaea but, among archaea, only in those lacking AsnRS. GatDE is found only in Archaea. Recently it was shown that a unique situation exists in yeast, where the cytoplasmic GluRS is imported into the mitochondria and functions there as the ND-GluRS that generates mitochondrial Glu-tRNAGln [160]. This charged tRNA substrate is then converted to Gln-tRNAGln using a novel trimeric AdT, GatFAB.

The transamidation pathways likely evolved by the adoption of Asn and Gln biosynthesis pathways by the aminoacyl-tRNA formation machinery [161]. For example, the GatD domain of the Archaeal Glu-AdT originates from an asparaginase, and the GatA domain of the multi-domain AdT is related to amidases responsible for amide bond cleavage [153, 159]. GlnRS and AsnRS were not present in LUCA, and therefore it is likely that Gln-tRNAGln and Asn-tRNAAsn were first formed by indirect pathways. Where these synthetases do appear, the phylogenies lack any typical patterns, further supporting their recent origin [11, 28]. The fact that so many organisms have not acquired the appropriate aaRS for amide amino acids and have retained the corresponding AdTs may reflect the essential role of amidotransferase enzymes in metabolism. For example, Gln is a major source of amides for many biosynthetic pathways. Also, most bacteria that have acquired AsnRS still have an indirect Asn-tRNA formation pathway, which is used as the only route of Asn biosynthesis in these organisms [162, 163].
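
The choice between direct and indirect formation of Gln-tRNAGln and Asn-tRNAAsn described in this section can be expressed as a simple lookup over the enzymes an organism encodes. This is an illustrative sketch only: the substrate assignments follow the text above (GatCAB acting on both mischarged species; GatDE and GatFAB acting only on Glu-tRNAGln), while the names `ADT_SUBSTRATES`, `PATHWAYS`, and `routes` are hypothetical and the model ignores in vivo regulation and transamidosome formation.

```python
# Mischarged substrates accepted by each amidotransferase, as stated in the text.
ADT_SUBSTRATES = {
    "GatCAB": {"Glu-tRNAGln", "Asp-tRNAAsn"},
    "GatDE": {"Glu-tRNAGln"},
    "GatFAB": {"Glu-tRNAGln"},
}

# For each amide amino acid: the direct synthetase, the non-discriminating
# synthetase of the indirect route, and the mischarged intermediate it forms.
PATHWAYS = {
    "Gln": {"direct": "GlnRS", "nd": "ND-GluRS", "intermediate": "Glu-tRNAGln"},
    "Asn": {"direct": "AsnRS", "nd": "ND-AspRS", "intermediate": "Asp-tRNAAsn"},
}


def routes(amino_acid: str, enzymes: set) -> list:
    """List every route to the amide aa-tRNA available with the given enzymes.

    Each route is a list of enzyme names in the order they act.
    """
    pathway = PATHWAYS[amino_acid]
    found = []
    # Direct route: a dedicated synthetase charges the tRNA in one step.
    if pathway["direct"] in enzymes:
        found.append([pathway["direct"]])
    # Indirect route: ND synthetase, then an AdT that accepts the intermediate.
    if pathway["nd"] in enzymes:
        for adt in sorted(ADT_SUBSTRATES):
            if adt in enzymes and pathway["intermediate"] in ADT_SUBSTRATES[adt]:
                found.append([pathway["nd"], adt])
    return found
```

With this sketch, an organism encoding ND-AspRS and GatDE but no AsnRS has no route to Asn-tRNAAsn, whereas swapping GatDE for GatCAB gives the two-step indirect route.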

2.3.2 Formylmethionyl-tRNA

Another example of an aaRS “missing” from genomes involves a unique aa-tRNA that is needed to initiate protein synthesis in bacteria, mitochondria, and chloroplasts. This tRNA, formylmethionyl-tRNAfMet, is aminoacylated indirectly, as there are no genes encoding an fMetRS known to date. First, initiator tRNAfMet is aminoacylated with methionine by methionyl-tRNA synthetase (MetRS); the methionine moiety is then formylated by methionyl-tRNA transformylase [164, 165]. The initiator tRNA contains sequence elements and modifications that distinguish it from elongator tRNAMet and help it evade binding to elongation factors and instead bind directly to the ribosomal P site with the help of initiation factors (reviewed in [166]). In Trypanosoma brucei mitochondria, Met-tRNAMet is imported from the cytoplasm and a fraction is then formylated and used for translation initiation [167]. The formyl modification of methionine is important for the initiator tRNA to function in translation, as it is specifically recognized by bacterial initiation factor 2 (IF2), ensuring the appropriate tRNA is in place for initiation [168].

2.3.3 Selenocysteine-tRNA

The amino acid selenocysteine (Sec) is found in all three domains of life, but not in all organisms, and was the first amino acid discovered outside of the original 20 encoded by the universal genetic code. However, no SecRS or other enzyme able to directly aminoacylate tRNASec with Sec has been identified. Selenocysteine is similar to cysteine, the difference being that the thiol group is replaced by a selenium-containing selenol moiety. Selenol has a lower redox potential and a lower pKa than a thiol group and is ionized and more reactive at physiological pH. Proteins that contain Sec are often enzymes involved in redox reactions, and these Sec residues are most often found within the active site [169]. Sec is formed from serine after tRNA charging and before polypeptide insertion. SerRS first aminoacylates tRNASec with serine (Ser), and Ser-tRNASec is then converted to Sec-tRNASec by selenocysteine synthase (SelA) in bacteria, or by O-phosphoseryl-tRNA kinase (PSTK) followed by Sep-tRNA:Sec-tRNA synthase (SepSecS) in eukaryotes and Archaea (Fig. 4) [170, 171]. Similar to tRNASer, tRNASec species contain particularly long variable arms, a conserved structure of these tRNAs needed for SerRS recognition (reviewed in [172]). However, the structure of tRNASec is sufficiently different that aminoacylation of this tRNA is less efficient than that of the cognate tRNASer [173, 174]. Incorporation of Sec into the growing peptide occurs at particular stop codons (UGA), which are identified by a nearby cis element: a stem-loop structure in the mRNA (bacteria) or a structure in the 3′ untranslated region (archaea and eukaryotes) [175–177]. An additional RNA binding protein is needed to recognize the cis element in the RNA to signal the translational machinery for proper encoding of Sec. Unique elongation factors (SelB in bacteria and eEFSec in eukaryotes) replace the function of EF-Tu and deliver Sec-tRNASec to the ribosome. The details of these unique mechanisms of ribosomal decoding in archaea and eukaryotes are still under investigation [178, 179]. The fact that the incorporation of selenocysteine into proteins required the development of an alternative route, rather than addition of a new synthetase and a simple change to the existing code, supports the notion that the contemporary genetic code and existing amino acid set are difficult to change as they are to some extent constrained by amino acid metabolism [180].

Fig. 4 Indirect Sec-tRNASec formation. SerRS first aminoacylates tRNASec with serine (Ser), and Ser-tRNASec is then converted to Sec-tRNASec by selenocysteine synthase (SelA) in bacteria, or by O-phosphoseryl-tRNA kinase (PSTK) followed by Sep-tRNA:Sec-tRNA synthase (SepSecS) in eukaryotes and Archaea. Both SelA and SepSecS are dependent on a selenium donor and pyridoxal phosphate (PLP). PSTK also requires ATP

2.3.4 CysRS

In several methanogenic archaea, including M. jannaschii and M. thermoautotrophicum, CysRS is not present and the mechanism of Cys-tRNACys formation in these organisms was initially unclear [181]. Initial biochemical studies suggested that Cys-tRNACys was formed in these organisms by a prolyl-tRNA synthetase (ProRS) with a dual specificity for both Pro and Cys [182]. However, it was subsequently shown that the absent CysRS is actually replaced by the activity of O-phosphoseryl-tRNA synthetase (SepRS), a newly discovered synthetase which will be discussed below. This synthetase forms Sep-tRNACys, which is then converted to Cys-tRNACys by Sep-tRNA:Cys-tRNA synthase (SepCysS) [183]. The mechanism for Cys-tRNACys synthesis is similar to Sec-tRNASec synthesis in archaea. Sep-tRNA is the intermediate in both pathways and serves as a substrate for either SepSecS or SepCysS, and the two enzymes share many similarities [159]. Additionally, it has been proposed from phylogenetic studies that the indirect pathways for Cys-tRNACys formation and Sec incorporation in bacteria, Archaea, and eukaryotes were all present at the time of LUCA [171, 184].
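
The tRNA-dependent conversions described in Sects. 2.3.3 and 2.3.4 share the same shape: a tRNA is first charged with a precursor amino acid, and the tRNA-bound residue is then modified in one or two further steps. A minimal sketch of this as a transition table follows. The species and enzyme names are taken from the text, whereas `STEPS` and `run_pathway` are hypothetical names used here only for illustration, and cofactors (selenium or sulfur donor, PLP, ATP) are deliberately left out.

```python
# (substrate, enzyme) -> product, for the indirect Sec and Cys pathways.
STEPS = {
    # Charging with a precursor amino acid
    ("tRNASec", "SerRS"): "Ser-tRNASec",
    ("tRNACys", "SepRS"): "Sep-tRNACys",
    # Bacterial Sec pathway: a single conversion by SelA
    ("Ser-tRNASec", "SelA"): "Sec-tRNASec",
    # Archaeal/eukaryotic Sec pathway: phosphorylation, then Sep -> Sec
    ("Ser-tRNASec", "PSTK"): "Sep-tRNASec",
    ("Sep-tRNASec", "SepSecS"): "Sec-tRNASec",
    # Cys pathway in methanogens lacking CysRS
    ("Sep-tRNACys", "SepCysS"): "Cys-tRNACys",
}


def run_pathway(trna: str, enzymes: list) -> str:
    """Apply enzymes in order, starting from an uncharged tRNA.

    Raises ValueError if an enzyme is applied to a species it does not act on
    in this simplified table.
    """
    species = trna
    for enzyme in enzymes:
        try:
            species = STEPS[(species, enzyme)]
        except KeyError:
            raise ValueError(f"{enzyme} has no listed activity on {species}") from None
    return species
```

Both `run_pathway("tRNASec", ["SerRS", "SelA"])` and `run_pathway("tRNASec", ["SerRS", "PSTK", "SepSecS"])` end at Sec-tRNASec, and the Sep-tRNA intermediate appears in both the Sec and Cys routes, mirroring the similarity between SepSecS and SepCysS noted above.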

3 Non-canonical Aminoacyl-tRNA Synthetases

In addition to the 20 well-characterized canonical aaRSs, there exist several recently discovered enzymes that either fall outside the normal class rules or charge tRNA with amino acids that are not among the 20 encoded by the universal genetic code. Phylogenetic analyses show that these enzymes likely arose early in the evolution of aaRSs and were not retained in most organisms. In most cases they only still appear in small groups of archaeons and dispersed bacteria [144].

3.1 LysRS I

The only known synthetase to date that breaks the class rule and contains enzymes in both class I and class II is LysRS. Class I LysRS (LysRS1) was discovered relatively recently [19] and is found mostly in Archaea, a few dispersed bacteria, and no eukaryotes. Class II LysRS (LysRS2), however, is found in eukaryotes and most bacteria. Most organisms contain one class of LysRS or the other, with the exception of the archaeal group, Methanosarcinaceae, and a few isolated species in other genera, such as Nitrosococcus oceani [185] and Bacillus cereus [186, 187] where both LysRS genes are present [188]. The existence of the same aaRS in two distinct structural classes provides an example of convergent evolution in which divergent mechanisms achieve the same functional goal. In this case, the end result of each enzyme’s emergence is the formation of lysylated tRNALys. LysRS2 likely existed prior to LysRS1 as it demonstrates deep evolutionary connections to AspRS and AsnRS based on sequence and phylogenetic analyses [11]. Similar phylogenetic associations are lacking between LysRS1 and any other extant synthetase. LysRS1 emerged relatively early in the archaeal lineage and horizontal gene transfer to a few bacteria appears to have come from a pyrococcal progenitor [189]. LysRS1 enzymes from different domains are not deeply rooted and do not group together, whereas other enzymes that are present in both archaea and bacteria do. This LysRS1 distribution pattern is consistent with recent horizontal gene transfer events that possibly occurred more than once [11, 36]. There is a robust correlation between the phylogeny of class I LysRS sequences and the distribution of AsnRS, which may reflect competition for overlapping anticodon sequences during tRNA recognition [190].

Sequence and structural comparisons indicate some distant relationship between LysRS1 and the class I synthetases CysRS, ArgRS, GluRS, and GlnRS (see above) and, similar to three of these synthetases, LysRS1 requires binding of tRNA for formation of the aminoacyl-adenylate [20, 34–36]. Structural and functional data suggest tRNALys anticodon recognition by LysRS1 requires fewer interactions than by LysRS2, supporting a less significant role of the anticodon in tRNA recognition by the class I enzyme [191]. LysRS1 has an alpha-helix cage anticodon binding domain, which is similar only to GluRS, suggesting tRNALys anticodon specificity may have evolved from the analogous domain of an ancestral GluRS enzyme [191]. In addition to differences in tRNA recognition, LysRS1 and LysRS2 also show divergent resistance to near-cognate amino acids, which may have also impacted the retention of a particular form of the enzyme in different lineages. Lysine recognition differs between the two enzymes and specificity is greater in the LysRS1 active site compared to that of LysRS2, which is more catalytically efficient [192–194]. The need for either strong active site discrimination or efficient catalysis likely depends on the organism and the environment in which it lives, leading to variations in the pressure to retain a particular form of LysRS encoded in a genome.

3.2 Pyrrolysyl-tRNA Synthetase (PylRS)

Although natural proteins contain more than 140 different amino acids, the majority of these are the result of posttranslational modifications that occur after protein synthesis [195]. There are only two known additions to the standard 20 amino acid set that are decoded during protein synthesis. These two non-canonical amino acids are selenocysteine (Sect. 2.3.3) and pyrrolysine (Pyl) which, unlike selenocysteine, exists as a free metabolite that requires a unique aaRS to charge it directly onto tRNA. Pyl is encoded in proteins often needed for methylamine utilization and was first identified in a group of archaeal methanogens [196]. More than 20 Pyl-decoding organisms have been identified, with roughly half of these being archaeal methanogens of the Methanosarcina family and the rest diverse species of bacteria including Acetohalobium arabaticum, Desulfitobacterium hafniense, Desulfitobacterium dehalogenans, and a symbiotic δ-proteobacterium of the worm Olavius algarvensis [197, 198].

The mechanism of pyrrolysine insertion into proteins was initially not clear, and thought to require a modification of Lys-tRNAPyl, which can be formed by the class II LysRS [199]. Additional in vitro studies showed that tRNAPyl is efficiently aminoacylated with Lys in the presence of both class I and class II LysRSs of Methanosarcina barkeri [200]. More recently, however, the use of in vitro synthesized pyrrolysine demonstrated that the dedicated tRNA synthetase, pyrrolysyl-tRNA synthetase (PylS), is responsible for charging tRNAPyl and is unable to use lysine as a substrate [201, 202]. The formation of Pyl-tRNAPyl and the production of Pyl-containing proteins have been investigated for a handful of pyrrolysine-encoding organisms and this amino acid is found to be inserted into certain proteins at specific UAG stop codons. Although Pyl-tRNAPyl is recognized by EF-Tu without the help of trans-acting factors, a downstream pyrrolysine insertion sequence (PYLIS) promotes incorporation of pyrrolysine over translation termination [201, 203]. Therefore pyrrolysine and selenocysteine insertion are similar in that both require cis elements for ribosomal decoding.

The carboxy-terminus of PylRS resembles a typical class II catalytic domain; the amino-terminal domain, however, looks somewhat different compared to other canonical synthetases and is responsible for tRNAPyl recognition [204]. The genes encoding the carboxy- and amino-termini of PylRS are separated by two genes in the bacterium D. hafniense, which differs from the archaeal PylRS-encoding gene arrangement [205]. The amino-terminus of archaeal PylRS is dispensable in vitro but required in vivo [206]. The D. hafniense PylRS structure demonstrates how the tRNA binding surface is well conserved between all PylRSs and results in an aaRS–tRNA interaction surface that is distinct from those observed in other known aaRS–tRNA complexes [146]. This is thought to be due to the early emergence of PylRS, which led to the evolution of unique structural features in both the protein itself and tRNAPyl. Based on the Archaeal M. mazei structure and phylogenetic analysis, PylRS is considered to be a class IIc aaRS along with GlyRS, PheRS, and AlaRS. With the exception of GlyRS, all the synthetases in this subclass share a homologous quaternary architecture; thus it is possible that PylRS exists as a tetramer as well. Although the structural results show a dimeric PylRS bound to two tRNAPyl molecules, modeling of a potential PylRS tetramer shows conserved residues along the interface of the tetramer, suggesting this is the correct oligomerization state [144]. These structural studies were also successful in deducing the amino acid binding pocket of PylRS, which contains a deep hydrophobic pocket for Pyl binding. The specificity elements of PylRS for its substrates are residue side chains that extend into the amino acid binding pocket. This mode of recognition enables the development of aaRSs that can aminoacylate novel amino acids and that arise either by evolution, as with Pyl, or by enzyme design experiments [144, 146].

Although PylRS is an uncommon synthetase with a distribution limited to a small subset of organisms, phylogenetic analyses link its emergence with other class II aaRSs prior to the LUCA [13]. Because the insertion of Pyl into proteins is seen for only a small number of dispersed species, it is predicted that the Pyl encoding operon was likely acquired by ancient horizontal gene transfer events between now extinct groups that had a greater use for this amino acid [13, 207, 208]. These gene transfer events were then followed by limited retention of the Pyl encoding operon in extant organisms. Interestingly, Pyl is synthesized solely from Lys, connecting amino acid metabolism and synthetase evolution [209]. Pyl insertion at UAG codons is regulated differently in archaea versus some of the bacterial examples examined thus far. Pyl-decoding archaea constitutively encode Pyl and have adapted to this by having fewer TAG codons in their genes, whereas bacteria that use Pyl, such as Acetohalobium arabaticum, regulate Pyl encoding at the level of transcription of the Pyl operon under particular growth conditions [198].
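
The reduced use of TAG codons in Pyl-decoding archaea is the kind of observation that can be checked directly on annotated coding sequences. The following is a minimal sketch, assuming each coding sequence is supplied as an in-frame DNA string; the function name and the toy sequence are hypothetical and are not taken from the studies cited above.

```python
def count_inframe_codon(cds: str, codon: str = "TAG") -> int:
    """Count occurrences of `codon` in reading frame 0 of a coding sequence.

    Only complete codons starting at positions 0, 3, 6, ... are examined,
    so out-of-frame matches are ignored.
    """
    cds = cds.upper()
    return sum(1 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == codon)


# Hypothetical toy sequence, read as ATG AAA TAG GCT TAG.
example = "ATGAAATAGGCTTAG"
print(count_inframe_codon(example))
```

Summing such counts over all genes of a genome, and normalizing by the total number of codons, would give a simple TAG usage frequency that can be compared between organisms.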

3.3 Phosphoseryl-tRNA Synthetase (SepRS)

In organisms that lack CysRS, a non-canonical synthetase is responsible for the indirect formation of Cys-tRNACys. Initially it was unclear how Cys-tRNACys was formed in CysRS-lacking organisms and a dual-specific ProRS was thought to be responsible (Sect. 2.3.4). Since then, a non-canonical class II aaRS, O-phosphoseryl-tRNA synthetase (SepRS), was found in most methanogenic archaea and is responsible for charging tRNACys with O-phosphoserine (Fig. 5). The O-phosphoseryl-tRNACys intermediate is then further modified by Sep-tRNA:Cys-tRNA synthase (SepCysS) [183].

The SepRS/SepCysS genes are only found in archaeal genomes that contain the methanogenesis genes for generating or oxidizing methane [184]. This linkage suggests a strong evolutionary connection between the indirect Cys-tRNACys pathway and methanogenesis. This tRNA-dependent indirect pathway is also the sole method of free cysteine biosynthesis in some organisms, and, in the case of Methanosarcina mazei, cysE, one of the bacterial genes for cysteine biosynthesis, was apparently lost while the more ancient SepRS/SepCysS system was retained [184, 210]. In a few organisms, namely several Methanosarcina species, genes for both the traditional class I CysRS and the indirect SepRS/SepCysS tRNA-charging pathway exist [184]. It appears that both pathways have physiological significance and depend on differing selectivity of various tRNACys isoacceptors with the help of particular tRNA modifications. The exact role of the redundancy in Cys-tRNACys formation in some organisms remains unclear, but it is likely closely linked to sulfur and energy metabolism in methanogens [210]. As more is revealed about these unique aminoacylation systems, evolutionary links between protein synthesis, amino acid synthesis and cellular metabolism will become more apparent.

Fig. 5 Indirect Cys-tRNACys formation. SepRS first aminoacylates tRNACys with O-phosphoserine (Sep) and then Sep-tRNACys is converted to Cys-tRNACys with Sep-tRNA:Cys-tRNA synthase (SepCysS) in the presence of a sulfur donor and PLP

SepRS has a very ancient lineage, stemming at least as far back as the origin of the archaeal branch. Phylogenetic analysis indicates that, although both PylRS and SepRS evolved well before LUCA, these enzymes were only retained in a handful of organisms, demonstrating the unique metabolic requirements for these amino acids. In contrast, the emergence of GlnRS and AsnRS in some organisms occurred post-LUCA [11], replacing the more primitive indirect charging pathway for Asn and Gln, which are required in the proteomes of all organisms. Interestingly, both PylRS and SepRS are classified as class IIc synthetases and appear to be distant homologs of PheRS. Phylogenetic evidence shows SepRS evolved from α-PheRS, while PylRS evolved much earlier, before the differentiation of PheRS into a heterotetramer, and likely evolved from an ancestral PheRS as a result of gene duplication [144, 184].

Phylogenetic evidence suggests synthetases evolved after the genetic code was established [11, 14] and therefore, not surprisingly, Sep and Pyl, which are not encoded directly by the code, emerged from an earlier evolved synthetase and required some flexibility of the existing code rather than expansions of the genetic code itself. The discovery of these additional aaRSs and tRNA charging pathways suggests that with further knowledge of uncharted organisms, in terms of sequence and proteome composition, other unidentified aaRSs might exist. Such discoveries could expand the genetic code beyond the current 22 amino acids or uncover new pathways for tRNA aminoacylation. The roles of pyrrolysine in proteins required for methanogen growth on methylamines and selenocysteine in enzymes requiring strong redox capacities indicate the evolutionary selective pressures that underlie the retention of these non-canonical amino acids. Although Crick’s adaptor hypothesis [211] is satisfied partially by the discovery of 20 distinctive aaRSs, his later proposed “frozen accident” theory [212] is not. This theory states that the genetic code is “frozen” and that any changes to it would be strongly selected against, if not lethal. The non-canonical examples discussed here demonstrate the code is not “frozen” because these amino acids emerged after establishment of the genetic code and in certain organisms the capacity for coding and incorporation of these amino acids has been lost.

4 Functional Evolution of Synthetases

The early process of deciphering the genetic code for protein synthesis was almost certainly more ambiguous than in extant organisms, likely involving incorporation of a particular “type” of amino acid at codons [213, 214]. Therefore some of the earliest proteins may have been defined more by the general properties of their chemical makeup rather than by the presence of specific chemical groups at specific locations. Aminoacylation accuracy by modern synthetases is challenged by the similarity between many of the substrates used by each of the 23 known aaRS enzymes. The structural and chemical similarities between tRNAs present challenges for accurate recognition of the correct isoacceptors, but even more problematic is the high similarity of some amino acids, some of which can vary by a single methyl group as is the case for isoleucine and valine. Despite these similarities, most aaRSs have a misacylation rate of less than 1 in 5,000 [215]. Aminoacyl-tRNA synthetases have adapted to maximize substrate specificity in order to maintain or control fidelity during protein synthesis. Two different mechanisms are responsible for ensuring highly accurate substrate recognition in different aaRSs. The first depends on the high specificity of the particular enzyme for its amino acid and tRNA substrates. The second way aaRSs can achieve higher substrate specificity, and therefore greater accuracy for protein synthesis, is through editing non-cognate amino acids.
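
To put the quoted misacylation rate in perspective, it helps to estimate how many complete proteins would be free of such errors. The sketch below is an illustration only: it assumes errors occur independently at each residue, treats 1 in 5,000 as an upper bound on the per-residue error rate, and ignores every other source of mistranslation. The function name and the chosen protein length are hypothetical.

```python
def fraction_error_free(length: int, error_rate: float = 1 / 5000) -> float:
    """Probability that a protein of `length` residues contains no
    misincorporated amino acid, assuming independent per-residue errors."""
    return (1.0 - error_rate) ** length


# A hypothetical 300-residue protein at the upper-bound error rate.
print(f"{fraction_error_free(300):.3f}")
```

Under these assumptions roughly 94% of 300-residue proteins would be error-free, and the fraction falls as protein length or error rate increases, which illustrates why even modest losses of synthetase specificity matter.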

4.1 Specificity of Synthetases

Faithful translation at the level of aaRSs starts with proper identification and pairing of particular tRNAs with their cognate amino acid. Synthetase specificity, or how each enzyme selects the correct tRNA isoacceptors for its cognate amino acid, is often referred to as the second genetic code [216]. Specificity of the aminoacylation reaction is largely dependent on proper recognition of the cognate amino acid and proper tRNAs from the large cellular pool of metabolites and isoacceptors, respectively. Aminoacyl-tRNA specificities can vary between synthetase variants and in some cases appear to have evolved in order to adapt to the particular environments of organisms or the properties of individual cellular compartments [217–219]. Progress in genetic, structural, and biochemical studies has helped shape the underlying principles behind tRNA recognition and amino acid selection of aaRSs, and has provided insight into how these enzymes have adapted to specific evolutionary forces [214, 220, 221].

4.1.1 tRNA Recognition

The primary force for tRNA–aaRS binding is displacement of bound water molecules by the phosphate backbone of the tRNA, and therefore the initial binding event is somewhat non-specific. Structural modeling studies of PheRS, ThrRS, and IleRS showed how electrostatic interactions contribute to the first stages of tRNA binding [222]. It was determined that positive patches on these aaRSs, formed by non-conserved interaction residues, and supplementary domains are most important for determining the long-range potential of the enzyme. These regions are unrelated to the conserved catalytic motifs of aaRSs and determine the ability to attract the tRNA molecule from a distance and direct it to its binding site.

After long-distance interactions are made between an aaRS and tRNA, more specific recognition at short distances occurs and relies strongly on the conserved catalytic modules. Short distance binding and recognition are also established by similar structural determinants of tRNAs. Differential binding affinity is not sufficient to ensure the correct recognition of the cognate tRNA and therefore kinetic discrimination is used to overcome these limitations and help the aaRS distinguish between cognate and non-cognate tRNAs. Aminoacylation of the correct tRNA is influenced more by kcat effects than by Km effects [223]. Through various structural and pre-steady state kinetic studies of several tRNA–aaRS pairs, a general model of tRNA binding and recognition has been elucidated [224–226]. The first stage of tRNA binding is fast and the thermodynamic stability of this initial complex depends on interactions with the anticodon or variable arm. These close interactions are followed by a slow conformational change and accommodation step that occurs only when the cognate tRNA is bound. Interactions with the acceptor stem of the cognate tRNA are important for this accommodation and for facilitating an efficient rate of aminoacylation and transfer. The precise details of tRNA binding likely vary somewhat even within each class of aaRS as tRNA binding determinants and structural motifs vary between different tRNA–aaRS pairs.
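
The distinction between kcat and Km effects can be made concrete with the specificity constant kcat/Km. Discrimination between a cognate and a non-cognate tRNA is commonly expressed as the ratio of their specificity constants, and this ratio factors into a kcat term and a Km term. The sketch below shows that decomposition under the usual steady-state (Michaelis-Menten) assumptions; the class and function names are hypothetical, and the numerical values are invented for illustration and are not measurements from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class KineticParams:
    kcat: float  # turnover number, in s^-1
    km: float    # Michaelis constant, in µM

    @property
    def efficiency(self) -> float:
        """Specificity constant kcat/Km."""
        return self.kcat / self.km


def discrimination(cognate: KineticParams, noncognate: KineticParams):
    """Return (kcat term, Km term, overall discrimination factor).

    The overall factor equals the ratio of the two specificity constants,
    i.e. the product of the kcat term and the Km term.
    """
    kcat_term = cognate.kcat / noncognate.kcat
    km_term = noncognate.km / cognate.km
    return kcat_term, km_term, kcat_term * km_term


# Invented values: the non-cognate tRNA binds only 2-fold more weakly
# but is turned over 20-fold more slowly.
cognate = KineticParams(kcat=10.0, km=2.0)
noncognate = KineticParams(kcat=0.5, km=4.0)
kcat_term, km_term, total = discrimination(cognate, noncognate)
print(kcat_term, km_term, total)
```

In this invented case most of the discrimination comes from the kcat term, which is the pattern described above for aminoacylation of the correct tRNA.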

Transfer RNA identity elements are necessary for proper recognition of a particular group of isoacceptors by an aaRS. Some of these elements are positive determinants that promote binding of the cognate tRNA and some are negative (anti-determinants) that prevent acceptance of a non-cognate tRNA [221]. For both class I and class II isoacceptors recognition elements are located on the periphery of the tRNA, in the acceptor arm, and in most cases the anticodon stem loop. Major discrimination occurs at N73, distal base pairs of the acceptor arm, and base 35 of the anticodon stem loop. The anticodon region is not essential for aaRS–tRNA recognition in the three Escherichia coli tRNAs specific for Leu, Ser, and Ala. In the case of tRNASer, the anticodon nucleotides are different in the six isoacceptors and the acceptor stem, D-loop, and long variable arm unique to these isoacceptors are needed for recognition [227]. Anti-determinants are often modified bases; however, in the cases of Glu, Ile, and Lys modified bases in the anticodon loop are used as positive elements [221]. Anti-determinants of tRNAs from one class of aaRSs tend to be against binding by members of the other aaRS class [227]. Additionally, organisms that lack a particular aaRS will have tRNAs with positive and negative identity elements driven by this absence.

Other minor elements, located throughout the tRNA and in its core region, are more specific to each synthetase system and domain of life. Identity elements found in the core region of most tRNAs tend to be specific and contribute to architectural differences in the tRNA. Specificity elements found in the variable loop, TψC arm, and the D stem often contribute indirectly to binding by providing the necessary tertiary interactions for proper tRNA structure and folding. In the case of the initiator tRNAMet in yeast, changes in the elbow of the tRNA (A20 and A60) result in loss of methylation, while aaRS binding is retained, demonstrating how elements in this region can differentially affect tRNA structure and function [228].

Once folded into their L-shaped structure, tRNAs are essentially composed of two distinct domains, one being the acceptor helix stacked with the TψC arm and the other made up of the anticodon stem aligning with the D stem. These two domains interact with separate regions of the aaRS and are thought to have emerged independently from each other [229]. There are a few unique cases of mitochondrial tRNAs which lack the TψC and D arms and these tRNAs can only be charged by the corresponding mitochondrial aaRSs [230]. The aaRS active site domain, which defines the classification of a particular synthetase by its sequence and structure, interacts with the acceptor helix-TψC arm domain of the tRNA. Interactions made with the tRNA in this region vary depending on the synthetase class and unique features of different enzyme subgroups.

Synthetase interactions made with the anticodon-D-loop domain of the tRNA are carried out through additional enzyme regions that are separate from the “class-defining” catalytic core. These anticodon binding domains of aaRSs are much less conserved and can vary significantly within each class. As shown in the cases of GluRS, GlnRS, and AspRS [231, 232], binding to the anticodon results in large conformational changes in the tRNA, which then transmit changes to the active site. The two separate domains of tRNAs likely evolved separately as did the synthetase domains that recognize them. Modern aaRSs and tRNAs likely arose from ancestors with a simpler mode of tRNA–aaRS recognition solely involving the tRNA acceptor stem and aaRS class-defining catalytic domain [233]. The demonstration that minimalist tRNAs, or minihelices, are aaRS substrates supports this theory and such experiments provide insights into elements that were important for recognition prior to the emergence of larger contemporary tRNAs. Class II synthetases are thought to have appeared first in evolution as these enzymes are best able to aminoacylate minimalist tRNAs and, as mentioned above, some aaRSs of this class completely lack tRNA anticodon recognition elements [221].

4.1.2 Amino Acid Specificity

Amino acid recognition by synthetases takes place in the catalytic site prior to activation and formation of the aminoacyl-adenylate. The mode of amino acid binding varies between different classes of synthetases. Analysis of the CysRS crystal structure and those of other class I synthetases indicates that amino acid binding occurs when the conserved KMSKS motif is in an “open” conformation. This binding occurs prior to ATP binding and adenylate formation, at which point the loop closes [29, 33, 48, 61, 63, 75, 231, 234]. Class II tRNA synthetases have evolved to discriminate among their amino acid substrates primarily by altering the amino acid side chains in the binding pocket as opposed to changing the position of protein backbone or secondary structure elements [144]. In addition, the size of the amino acid binding pocket may be important, as in the PheRS synthetic site where a conserved Ala residue helps determine specificity of phenylalanine over tyrosine [235]. Interestingly, some PheRS variants, such as cytoplasmic PheRS in yeast and humans, contain a glycine at this position, resulting in significantly lower cognate amino acid specificity [217]. ThrRS contains a zinc ion in its active site that contributes to amino acid specificity by recognizing the hydroxyl at the β position of threonine and discriminating against alanine and serine [236]. Although modern synthetases have evolved differentiated structures for proper substrate recognition, the active site architectures of some of these enzymes are unable to distinguish between very similar amino acids with high enough stringency to ensure accurate translation. In these cases editing or proofreading mechanisms are found in many synthetases to aid in the elimination or hydrolysis of misactivated amino acids (Sect. 4.2.2).

4.2 Adapted and Changing Domains

AaRSs are thought to have evolved additional modules to help maintain accurate protein synthesis as the genetic code increased to include more amino acids and the number of isoacceptors increased. Such adapted domains include sites of post-transfer editing, and RNA recognition domains needed for tRNA anticodon binding and structural stabilization. In addition to the core catalytic and various adapted domains, several aaRS modules evolved into free standing proteins either with synthetase-like functions, such as trans-editing domains, or with other roles in the cell (Sect. 5).

4.2.1 RNA Recognition

As mentioned above, domains outside the synthetase catalytic core can be involved in tRNA recognition. Such domains evolved much later and can vary significantly between synthetase enzymes. Many aaRSs contain additional tRNA recognition domains outside of the region of the catalytic site. For example, class IIb aaRSs contain a conserved, lysine rich N-terminal anticodon binding domain (ABD). Structural and biochemical data for yeast AspRS illustrate how the N-terminus of this synthetase participates in tRNA binding, as the presence of this extension considerably increases the stability of the complex between AspRS and its homologous tRNA [42]. Aside from providing stability to the aaRS–tRNA complex for aminoacylation directly, these additional RNA binding domains can be used to provide tRNA stability and even sometimes to facilitate transport. For example, cytoplasmic LysRS in humans is selectively packaged along with the tRNALys isoacceptors to help transport the tRNALys replication primer into the HIV-1 virion [237]. The viral Gag polyprotein is required for this packaging event. Human LysRS also binds a portion of the HIV genome that contains a tRNALys anticodon-like element, possibly to release LysRS from tRNALys, enabling this RNA to anneal to viral RNA for priming [238]. A second example of a trafficking role involves human TyrRS, where the nuclear localization signal is located in the same region of the protein needed for tRNA binding, thereby regulating TyrRS localization to the nucleus [239]. More generally, nuclear pools of synthetases in eukaryotes are predicted to serve as “proofreaders” for properly processed, functional tRNAs before these tRNAs are exported into the cytoplasm for their use in translation [240].

4.2.2 aaRS Editing

In order to maintain faithful translation, particularly in the case of similar amino acids where only so much specificity can be achieved by substrate discrimination, aaRSs have adapted methods to proofread or “edit” misacylated or incorrectly paired amino acid/tRNA pairs selectively. Editing activities can be found in approximately half of the aaRSs and both structural and biochemical studies have helped advance our understanding of how editing processes work in different aaRSs. The catalysis of aminoacylation by synthetases is a highly conserved mechanism; however, the editing mechanisms performed by these enzymes are much more variable. The high degree of diversity in proofreading further exemplifies the long evolutionary pathway of these enzymes as well as the role convergent evolution has played in their emergence. Both pre- and post-transfer editing mechanisms by aaRSs exist and are defined by the substrate. Pre-transfer editing targets the misactivated aminoacyl-adenylate and occurs within the active site of the aaRS itself. Post-transfer editing involves clearance of misacylated tRNAs and occurs in appended enzymatic domains that emerged later in evolution [241].

The presence of separate catalytic and editing sites in one enzyme, as predicted based on biochemical evidence [242], was first supported by structural studies of IleRS [29, 49]. Since then, editing by dedicated post-transfer editing CP1 domains in class I IleRS, ValRS, and LeuRS has been well characterized, in addition to many other editing systems. IleRS, ValRS, and LeuRS have a high degree of conservation in their CP1 domains that suggests early emergence and selective pressure to maintain editing in these enzymes. It has recently been shown that the rebinding and trans editing of a released misacylated tRNA is a possible post-transfer editing mechanism for these class I aaRSs; however, the relative importance of this pathway is not known [243]. A trans editing model has also been shown for the class II PheRS where the post-transfer hydrolysis of a misacylated tRNA occurs after rebinding and is thought to be a significant editing pathway [244].

Kinetic studies show LeuRS and ValRS mainly rely on post-transfer editing to prevent misincorporation of non-cognate amino acids [245, 246]. IleRS, however, also uses a distinct tRNA-dependent pre-transfer editing activity in its synthetic site [49, 247]. In the case of some LeuRS enzymes, there is less robust post-transfer editing activity against particular amino acids. Yeast cytoplasmic LeuRS is able to clear misacylated Ile-tRNALeu efficiently; however, the enzyme’s post-transfer hydrolytic activity against Met-tRNALeu is much weaker [248]. It was hypothesized that yeast cytoplasmic LeuRS can shift between pre- and post-transfer editing pathways depending on the identity of the non-cognate amino acid. Human cytoplasmic LeuRS also shows modular pathways for editing different non-cognate amino acids. Norvaline, for example, is predominantly cleared by post-transfer editing while α-amino butyrate is the target of the pre-transfer mechanism [249]. Interestingly, when the yeast mitochondrial CP1 domain from LeuRS was isolated from the full-length enzyme it was unable to hydrolyze misacylated Ile-tRNALeu, which is in contrast to the isolated E. coli LeuRS CP1 domain [250]. This isolated yeast CP1 domain still retained its intron splicing activity (Sect. 5.1), suggesting that this LeuRS has functionally diverged to have a robust splicing activity, which has come at some expense to aaRS functionality in aminoacylation and editing [250]. The only other class I synthetase with editing activity is the related MetRS. Homocysteine, an intermediate of methionine biosynthesis, is activated by MetRS and subsequently edited prior to transfer to the tRNA. Unusually, this proofreading occurs within the active site of the enzyme and involves cyclization of the adenylate to form homocysteine thiolactone and AMP [251–253].

Separate, adapted post-transfer editing domains are found in class II PheRS, ThrRS, AlaRS, and ProRS and pre-transfer editing activity has been demonstrated in the active sites of ProRS, SerRS, and LysRS II [241]. Class II aaRS post-transfer editing domains are much less conserved than those in the class I enzymes. This variability coincides with the trend of class II synthetases, which tend to share less conservation between different aaRSs. Some of the class II synthetases, where product release is rapid and not rate limiting, also have homologous free standing trans acting editing domains, a phenomenon that to date has not been described for class I enzymes. PheRS is among the least well conserved class II aaRSs, and exists in various forms in different domains and cellular compartments [217]. PheRS post-transfer editing takes place in the β subunit of the enzyme 40 Å away from the site of aminoacylation and is responsible for clearing misacylated Tyr-tRNAPhe [235, 254, 255]. Structure-based alignments of the PheRS editing domain show considerable divergence as many archaeal/eukaryal PheRSs lack conservation of the critical residues found in bacterial PheRS [256]. Mitochondrial PheRSs exist as monomers from which the β subunit and post-transfer editing are completely absent [46].

Post-transfer editing in ThrRS is necessary to hydrolyze mischarged Ser-tRNAThr and takes place in an adapted N2 domain of the N terminus, which shares homology to the same region of AlaRS. Mitochondrial ThrRS lacks the N2 domain and archaeal ThrRSs often contain an unrelated N-terminal domain, and in some cases the editing domain acts in trans [257, 258]. In-depth structural analyses of bacterial ThrRS have elucidated the post-transfer editing mechanism and show two water molecules to be involved in the hydrolysis reaction, one of which is excluded when Thr rather than Ser is in the editing site [259, 260]. This editing mechanism is based on more than how well the misacylated substrate “fits” into the editing active site and is thought possibly to be similar in other post-transfer editing sites such as those of PheRS and LeuRS [260]. Also, the freestanding protein ThrXp in Archaea is homologous to the editing domain of ThrRS and is able to clear Ser-tRNAThr in vitro [258]. The editing domains of some ThrRS enzymes from archaea also share sequence and structural homologies with D-Tyr-tRNATyr deacylases (DTD) [261–263]. DTDs contain trans editing activity against mischarged D-Tyr-tRNATyr, which can be synthesized by TyrRS, and are found across the three domains of life [264, 265]. This activity is essential to cell viability, as D-amino acids could dramatically alter protein folding and function. Interestingly, changing one particular residue in E. coli DTD to that found in ThrRS changed the specificity from D-amino acids to L-Ser, supporting the evolutionary linkage between DTDs and ThrRS [260].

Class II AlaRS has a flexible Ala binding pocket and as a result the enzyme has to be able to clear misactivated Gly and Ser [135]. An appended post-transfer editing domain is used to clear both Ser-tRNAAla and Gly-tRNAAla, while a trans editing domain, AlaXp, is also used to clear the larger non-cognate residue, Ser. The appended post-transfer editing domain of AlaRS is thought to have evolved from a primordial AlaXp that later fused to the aminoacylation domain of AlaRS [135]. Interestingly, all three domains of life contain the additional free standing editing domain, AlaXp, which is mainly responsible for clearing mischarged Ser-tRNAAla. There is strong evolutionary pressure to retain AlaXp in addition to AlaRS editing, as demonstrated in mice where reduced Ser-tRNAAla editing was linked to protein misfolding in neuronal cells [266].

For ProRS there exist several different mechanisms of synthetase proofreading, many of which include trans editing domains collectively known as ProX enzymes. The insertion domain (INS) is one which exists as an appendage to the core synthetase for most bacteria and is responsible for clearing mischarged Ala-tRNAPro. Some species, including Clostridium sticklandii, lack an INS domain and encode a freestanding domain PrdX that is used to hydrolyze Ala-tRNAPro [267]. The synthetic site of ProRS is also capable of mischarging Cys-tRNAPro, which can be cleared by a freestanding editing domain YbaK that is itself homologous to the INS domain [268–270]. Human encoded ProX has recently been shown to deacylate mischarged Ala-tRNAPro, but not Cys-tRNAPro, by specifically recognizing the Ala moiety of Ala-tRNAPro [271]. Additional freestanding ProRS trans editing domain homologs, such as YeaK and PA2301, have also been identified based on sequence similarity to INS and YbaK; however, their function is still not clear [272].

Post-transfer editing domains are not universally conserved, and in several aaRSs these domains have actually been lost. Editing domains appear to have been lost during evolution in a number of mitochondrial synthetases, such as human mitochondrial ProRS, human mitochondrial LeuRS, and human and yeast mitochondrial PheRS, as well as in the cytosolic ProRS of higher eukaryotes and in most archaeal and mitochondrial versions of ThrRS [46, 257–259, 273–276]. Many archaeal ThrRS enzymes have an N-terminal domain that is unrelated to the conserved N2 editing domain and in some cases a trans editing domain is encoded separately from the aminoacylation site [257]. Mycoplasma PheRS and LeuRS are also unable to post-transfer edit effectively as PheRS lacks conserved residues in the β subunit required for editing and LeuRS is missing the CP1 domain altogether [218, 219]. As a result of the error-prone activities of these two synthetases, the Mycoplasma mobile proteome naturally contains elevated levels of mistranslation. The evolutionary advantage of these error-prone proteomes is unclear, but it has been proposed that, because many of these organisms are obligate intracellular pathogens, misincorporation of similar amino acids increases antigen diversity without completely losing structural and functional integrity of proteins [218]. Understanding the role of translational fidelity is important as editing by synthetases can vary greatly between different domains of life and even within a particular organism [217]. Whether or not reduced quality control within a particular organism or cellular compartment is beneficial, and what environmental conditions or stresses dictate such benefits or disadvantages, is important to understanding what drives the evolution of synthetase fidelity.

The numerous examples of stable, robust freestanding editing domains and their homology to fused synthetase domains strongly suggest that the free-standing variants may have first existed as independent proteins that were later fused with their respective aaRS. This theory is supported by the observation that the editing domains of several synthetases contain a conserved CXXC motif, which is often found in mobile elements that have been incorporated into larger proteins [267]. Interestingly, the trans editing domain YbaK functions most effectively as a stable complex with ProRS, outcompeting EF-Tu, suggesting one possible evolutionary path for the transition from freestanding to fused editing domains [269]. Domain acquisition and movement of editing domains are thought to have occurred more than once during the evolution of some aaRSs [267, 277]. Editing modules in ProRS, for example, have diverse contexts and can include insertions, N-terminal additions, and independent protein forms [267, 278, 279]. The editing domains of AlaRS and ThrRS are similar in sequence yet are located in different regions of the synthetase and could possibly have been acquired at different points during evolution. The lack of structural conservation within the ThrRS editing domains suggests that they were possibly acquired from more than one ancestor where they had emerged early in evolution, or the archaeal ThrRS editing site may have evolved rapidly after the divide of the eukaryote and archaeal lineages [257, 280].

5 Emergence of Non-canonical Functions in Aminoacyl-tRNA Synthetases

5.1 Fused Domains Having Non-canonical Functions

In addition to adapted aaRS domains involved in tRNA aminoacylation, several examples of fused aaRS domains used for non-canonical functions have emerged throughout evolution. There are many N- or C-terminal domains found only in eukaryotic synthetases that are not needed for aminoacylation activity. For example, in archaea and higher eukaryotes fused terminal domains are thought to play a role in forming the multisynthetase complex (MSC). The MSC is a complex of several synthetases and auxiliary proteins including eukaryotic initiation factor 1-α (eIF1-α) that is hypothesized to have a role in promoting synthetase activity and channeling translation components to the ribosome [281, 282]. Other fused aaRS domains, particularly prevalent in eukaryotes, are often involved in cell signaling pathways [283]. One example of a fused aaRS domain used for signaling is the WHEP domain found in chordate TrpRS, which regulates the angiostatic signaling activity of this synthetase [284, 285]. TyrRS in higher eukaryotes contains fused ELR and EMAPII domains that are both used for angiogenesis related signaling. The EMAPII domain blocks the signaling function of the ELR domain, and upon cleavage of the ELR domain post secretion the EMAPII is accessible for signaling. The remaining N-terminal fragment containing the aminoacylation domain (mini-TyrRS) is also able to promote leukocyte migration [286]. Another recent example of how synthetases play a role in signaling involves LeuRS and its key role in linking cellular amino acid levels to the TORC1 response pathway. In both human and yeast cells, leucine-bound LeuRS was found to interact with system specific GTPases through the conserved CP1 editing domain. This interaction in turn promotes lysosomal recruitment and activation of TORC1, which is responsible for regulating protein synthesis, ribosome biogenesis, nutrient uptake, and autophagy [287, 288]. 
Outside of cell signaling, such domains fused to aaRSs can be involved in the regulation of gene expression as in the cases of AlaRS DNA binding and transcriptional regulation of its own gene, PheRS regulation through transcriptional attenuation, and ThrRS for regulation of translation [289–291]. In E. coli, ThrRS is able to bind the leader region of thrS mRNA and prevent binding of the 30S ribosomal subunit [292]. Hairpin recognition of the mRNA is similar to anticodon stem loop recognition by ThrRS. Gene regulation by fused domains of aaRSs also occurs in eukaryotes. For example, the synthesis of ribosomal RNA in humans appears to be regulated by the C-terminus of MetRS, which is needed for nucleolar localization and possible nucleic acid interactions [286].

AaRSs have also been found to have functions in tRNA and mRNA transport and processing. In yeast, the CP1 domain of mitochondrial LeuRS is involved in Group I intron splicing and the C-termini of mitochondrial TyrRSs from S. cerevisiae and Neurospora crassa have been implicated in rRNA splicing [250, 286, 293, 294]. Additionally, TyrRS in yeast appears to be important for tRNA export from the nucleus [240] and LysRS and AlaRS in certain eukaryotes are needed for mitochondrial import of tRNA [286]. Lastly, in most actinomycetes LysRS is fused directly to a multiple peptide resistance virulence factor, which uses specific aminoacylated tRNAs as substrates to aminoacylate and alter the properties and composition of membrane lipids [295–297].

5.2 Paralogs of Synthetases

As more sequence data become available for organisms from all three domains of life, it is increasingly evident that paralogs of aaRSs are encoded in most genomes, and genetic and biochemical studies have just begun to unravel the function of some of the corresponding gene products. Interestingly, there are numerous examples of these paralogs that do not aminoacylate tRNA but are instead used for cellular activities outside of protein synthesis. These examples of free-standing enzymes with new functionalities take advantage of many structural and sequence characteristics of aaRSs and further demonstrate the evolutionary importance of the enzymes.

5.2.1 Class I Paralogs

There are a number of class I aaRS paralogs that have been found to function as isoenzymes, or duplications of an enzyme with a different amino acid sequence that still performs the same chemical reaction. For example, two different forms of a synthetase used in the cytoplasm and mitochondria are considered isoenzymes. Synthetase isoenzymes that have evolved new functions have the same amino acid and tRNA specificity, but commonly have a slightly different active site relative to the canonical aaRS. These active-site differences can be exploited by host organisms to provide resistance to natural inhibitors, as perhaps best exemplified by the IleRS isoenzyme that confers mupirocin resistance to a number of drug-resistant isolates of bacterial pathogens such as Staphylococcus aureus [298].

A number of aaRS paralogs have been found to function as peptide synthetases, transferring activated amino acids to carrier proteins in non-ribosomal peptide synthesis. AlbC is an example of a class I paralog with high similarity to an aaRS catalytic domain, in this case TrpRS, that functions as a cyclodipeptide synthase (CDPS), transferring Phe from Phe-tRNAPhe to an activated serine residue [299]. MshC is a CysRS paralog that catalyzes Cys attachment to the amino group of a mycothiol precursor (not the hydroxyl like aaRSs) [300]. CDPSs are often derived from the class I catalytic domain, demonstrating the high amount of divergence that has occurred in these enzymes. There has also been a series of truncated class II SerRS homologs identified that activate and transfer amino acids to carrier proteins (Sect. 5.2.2) [301].

GluX (yadB in E. coli) is a truncated form of class I GluRS that lacks the entire C-terminal anticodon-binding domain. Early studies showed yadB was not essential in E. coli and it was thought to be a pseudogene without a known function [37, 302]. Subsequently, yadB was shown to have a conserved prokaryotic function in posttranscriptionally modifying tRNAAsp on the modified nucleoside queuosine, which is inserted at the wobble position of the anticodon loop, and the enzyme has been renamed glutamyl-Q-tRNAAsp synthetase [303]. This GluRS catalytic paralog has unusual tRNA binding that goes against the classic idea of the aaRS catalytic site–tRNA acceptor stem interaction. Structural mimicry between the anticodon stem and loop of tRNAAsp and the amino acid acceptor stem of tRNAGlu partly explains this unusual and unexpected mode of RNA binding.

5.2.2 Class II Paralogs

Paralogs of class II synthetases perform a wide variety of cellular roles outside of tRNA aminoacylation. For example, PoxA is a class II LysRS paralog found in many bacteria that posttranslationally adds β-lysine to a conserved lysine residue on translation elongation factor P [304]. Other notable examples include two paralogs of HisRS, HisZ and GCN2, that have very different functionalities. HisZ, which was first identified in Lactococcus lactis, is homologous to class II HisRS proteins and is required for the first step in histidine biosynthesis [305]. As an essential subunit of HisG, an ATP phosphoribosyltransferase, HisZ catalyzes the transfer of ATP to 5-phosphoribosyl 1-pyrophosphate (PRPP) producing the substrate for nine additional steps in the histidine biosynthesis pathway. This enzyme provides another evolutionary link between amino acid biosynthesis and the aminoacylation reaction. HisZ also has some non-specific RNA binding activity, but whether this is functionally significant is not known [305]. GCN2 is found in eukaryotes and is used to sense amino acid levels and subsequently regulate translation. GCN2 enzymes contain a Ser-Thr kinase domain and a HisRS-like domain that binds uncharged tRNA and prevents kinase activity in the absence of tRNA binding [306]. HisZ and GCN2 are functionally distinct and products of separate evolutionary events. HisZ is the result of an early gene duplication event in bacteria while GCN2 is the result of a later gene duplication event in eukaryotes [305]. Asparagine synthetase A (AsnA) shows homology to the catalytic domain of Asp/AsnRS and is responsible for catalyzing asparagine synthesis using aspartate, ATP, and ammonia as substrates [119, 307]. Structural and phylogenetic data suggest AsnA evolved from a duplication of the ancestral AspRS gene, leading to the archaeal/eukaryal AspRS and a precursor to AsnRS and AsnA. Upon duplication of this AsnRS precursor gene, one copy evolved Asn activation and tRNAAsn binding activity, while the other copy lost its anticodon-binding domain and evolved a new catalytic site to become AsnA [119]. The biotin protein ligase, BirA in E. coli, is a paralog of SerRS [308] at its catalytic site and functions to activate biotin to form biotinyl-5′-adenylate and then catalyze the covalent attachment of this biotin to a subunit of acetyl-CoA carboxylase at a lysine residue [309]. As more structural data became available, BirA was also shown to resemble the class II PheRS β subunit [310]. Similarities between the structures of BirA and PheRS were found in a region separate from the catalytic domains of these proteins that resemble Src-homology 3 (SH3)-like DNA binding domains. This region of BirA is responsible for regulating transcription of the biotin operator [311]. Both AsnA and BirA catalyze reactions that involve the formation of an adenylated intermediate, which is not the case for the HisZ enzyme. It was suggested that this absence of adenylation by HisZ and its role in binding and regulating histidine indicate early aaRSs may have been simple amino acid binding proteins [305].

Other SerRS paralogs include SLIMP and homologs that acylate carrier proteins for non-ribosomal protein synthesis. SLIMP is found to be localized in the mitochondria of insects and is needed for proper development in flies [312]. The function of this protein is unknown; however, it shows a general affinity to RNA and may be involved in mRNA processing and/or gene expression. There has also been a series of truncated SerRS homologs identified in a number of bacteria, which are similar in structure to the catalytic region of an atypical SerRS (aSerRS) that is found in methanogenic archaea [313]. These homologs lack tRNA binding and canonical aminoacylation activity, but rather activate and transfer amino acids to a phosphopantetheine prosthetic group on carrier proteins. The functions of these carrier proteins have yet to be identified, but it is thought that they possibly play a role in non-ribosomal protein synthesis [301].

One last example of a class II aaRS paralog with a cellular function outside of tRNA aminoacylation is found in mitochondrial DNA polymerase. The Polγβ subunit, which is responsible for the enzyme’s processivity, has a domain that is similar to the catalytic domain of class IIa synthetases. The regions of similarity include the aaRS active site that binds the amino acid, ATP, and the acceptor stem of the tRNA [314]. Polγβ also has a C-terminal domain that is similar to the tRNA anticodon binding domain of the dimeric GlyRS [315]. Despite these similarities, Polγβ has important differences and lacks critical residues necessary for tRNA anticodon binding and therefore does not retain the function of an aaRS. This example does show strong evolutionary links between this polymerase subunit and aaRSs, particularly in their nucleic acid binding properties.

6 Conclusion

Almost 60 years after evidence of aminoacyl synthetase activity first emerged [316, 317], the field is continuing to grow and provide insights into evolution, the fidelity of protein synthesis, and the workings of other biological systems. The discoveries of PylRS and SepRS demonstrate how the genetic code is less rigid than once thought and has been adaptive to changes in environmental demands [183, 201]. The vast increase in aaRS structural information within the last several years has increased our resources for phylogenetic analysis as well as helped explain biochemical mechanisms. Also, the structure of PylRS provides a great example of substrate orthogonality and is paving the way for advances in protein engineering [146]. Lastly, the immense amounts of recent genomic sequencing data have uncovered numerous aaRS accessory domains and paralogs whose functions have connected aaRSs to cellular development and disease and which are now targets of new therapeutic development [318, 319].