Coding of Class I and II Aminoacyl-tRNA Synthetases

Carter, Charles W.

doi:10.1007/5584_2017_93

Charles W. Carter Jr

Part of the book series: Advances in Experimental Medicine and Biology (PROTRE, volume 966)


Abstract

The aminoacyl-tRNA synthetases and their cognate transfer RNAs translate the universal genetic code. The twenty canonical amino acids are sufficiently diverse to create a selective advantage for dividing amino acid activation between two distinct, apparently unrelated superfamilies of synthetases, Class I amino acids being generally larger and less polar, Class II amino acids smaller and more polar. Biochemical, bioinformatic, and protein engineering experiments support the hypothesis that the two Classes descended from opposite strands of the same ancestral gene. Parallel experimental deconstructions of Class I and II synthetases reveal parallel losses in catalytic proficiency at two novel modular levels—protozymes and Urzymes—associated with the evolution of catalytic activity. Bi-directional coding supports an important unification of the proteome; affords a genetic relatedness metric—middle base-pairing frequencies in sense/antisense alignments—that probes more deeply into the evolutionary history of translation than do single multiple sequence alignments; and has facilitated the analysis of hitherto unknown coding relationships in tRNA sequences. Reconstruction of native synthetases by modular thermodynamic cycles facilitated by domain engineering emphasizes the subtlety associated with achieving high specificity, shedding new light on allosteric relationships in contemporary synthetases. Synthetase Urzyme structural biology suggests that they are catalytically-active molten globules, broadening the potential manifold of polypeptide catalysts accessible to primitive genetic coding and motivating revisions of the origins of catalysis. Finally, bi-directional genetic coding of some of the oldest genes in the proteome places major limitations on the likelihood that any RNA World preceded the origins of coded proteins.




1 Introduction

It is unlikely that the aminoacyl-tRNA synthetases played any specific role in the evolution of the genetic code; their evolutions did not shape the codon assignments. (Woese et al. 2000)

A real understanding of the code origin and evolution is likely to be attainable only in conjunction with a credible scenario for the evolution of the coding principle itself and the translation system. (Koonin and Novozhilov 2009)

The first epigram begins the concluding section of an authoritative review of this field published in 2000. The review contains much information and detailed interpretations based on the best available data at that time. Much of the research and theory to emerge since that time, however, has pointed to the opposite conclusion, in keeping with the dialectic component always implicit in scientific research. Consistent with the spirit expressed in the second epigram, unprecedented experimental and bioinformatic studies of the earliest evolution of aminoacyl-tRNA synthetases (aaRS) now make a compelling case for their intimate and probably necessary participation, with tRNA, in the evolution of the universal genetic code and the shaping of codon assignments.

1.1 The RNA World Hypothesis

The results reviewed herein are especially relevant to the question of whether or not present-day biology replaced a prior organization in which information storage and catalysis both were entirely the province of RNA (Gilbert 1986; Robertson and Joyce 2012), an idea with resilient support in the literature (Robertson and Joyce 2012; Bernhardt 2012; Breaker 2012; Yarus 2011a, b; Van Noorden 2009; Wolf and Koonin 2007). As argued elsewhere in detail (Carter and Wills 2017; Wills and Carter 2017), the actual evidence for such scenarios is remarkably thin. The catalytic repertoire of RNA is very limited relative to that of proteins (Wills 2016) and appears to be incapable of the sophistication required to synchronize metabolism with genetics. Evidence from ever more impressive ribozyme replicases (Horning and Joyce 2016; Sczepanski and Joyce 2014; Taylor et al. 2015; Attwater et al. 2013; Wochner et al. 2011) and the practical value of SELEX experiments (Tuerck and Gold 1990) would support only a highly limited version of the RNA World hypothesis unless phylogenetic relationships connected them to biological ancestry. So far as we know, all nucleic acids in contemporary biology are synthesized by protein enzymes, much as, reciprocally, the synthesis of proteins from activated amino acids is catalyzed by RNA at the peptidyl transferase center of the ribosome (Noller 2004; Noller et al. 1992; Bowman et al. 2015; Petrov et al. 2014). Thus no phylogenetic basis exists for ancestral ribozymal polymerases.

Moreover, the proposal that aminoacyl-tRNA synthetase enzymes arose as a single pair of ancestors coded bi-directionally on opposite strands of the same RNA gene decisively undermines the heart of the RNA World scenario by establishing that catalysis of aminoacylation by proteins emerged with fidelity scarcely better than random. Such lack of specificity would have been eliminated by purifying selection had there been any ribozymal system with higher fidelity.

Aminoacyl-tRNA synthetases (aaRS) also represent a unique group of enzymes. They are the only genes in the proteome that, once translated according to the rules of genetic coding, can then impose those rules; they therefore compose a unique, reflexive interface between genes and gene products. This special relationship to the proteome lends considerable significance to the evolutionary phylogenetics of aaRS gene sequences (Chandrasekaran et al. 2013; Cammer and Carter 2010), i.e., to how the aaRS came to be encoded. I will argue that studying the ancestral coding of contemporary aaRS leads directly to a deeper understanding of how the genetic code might have arisen far more rapidly as a collaboration between ancestral proteins and RNAs than would ever have been possible in a world based entirely on a single polymer type (Carter and Wills 2017; Wills and Carter 2017).

There is consensus on several important aspects of aaRS structural and sequence-derived phylogenetics. Notably, they form two utterly distinct superfamilies that, on several levels, are as distinct as possible from each other. Class I aaRS active sites all assume a Rossmann dinucleotide-binding fold, first observed in lactate dehydrogenase and flavodoxin (Buehner et al. 1973), in which the active site forms at the interface between parallel β-strands and the amino termini of two helices. In contrast, Class II aaRS active sites are formed from antiparallel β-strands.

These structural differences (Eriani et al. 1990a) motivated substantial effort to understand why and how nature would have divided the labor of tRNA aminoacylation in such a binary fashion (Delarue 2007; Delarue and Moras 1992; Ibba et al. 2005; Cusack 1995; Härtlein and Cusack 1995; Cusack 1993, 1994; Ribas de Pouplana and Schimmel 2001a, b, c; Schimmel and Ribas de Pouplana 2000; Schimmel 1991, 1996; Schimmel et al. 1993). Answers to these questions have emerged from supplementing phylogenetic analysis (O’Donoghue and Luthey-Schulten 2003) with experimental deconstruction by protein engineering (Martinez et al. 2015; Carter 2014; Li et al. 2011, 2013; Pham et al. 2007, 2010) and recapitulation (Weinreb et al. 2014; Li and Carter 2013), complemented by novel phylogenetic metrics (Chandrasekaran et al. 2013; Cammer and Carter 2010).

Conclusions emanating from these studies change how we view the proteome and origin of genetic coding in important ways:

(i)
Class I and II aaRS appear to have originated from complementary coding sequences on opposite strands of the same bi-directional ancestral gene (Martinez et al. 2015; Carter et al. 2014).
(ii)
That gene complementarity has profound implications for the origin of genetic information, some of which had already been suggested by others (Rodin et al. 2009, 2011; Rodin and Rodin 2006a, b, 2008).
(iii)
The inversion symmetry of complementary coding strands has recognizable consequences for protein secondary and tertiary structures, and the active site construction of the resulting Class I and II enzymes (Carter and Wills 2017; Carter et al. 2014), especially in light of the organization of the genetic code.
(iv)
Complementary studies of the modular aaRS architectures of both synthetase classes (Schimmel et al. 1993; Carter 2014) led to the discovery of how the organization of tRNA coding elements records how amino acids behave in water and in protein folding (Carter and Wolfenden 2015, 2016).
(v)
Although we can only begin to speculate on how such a gene emerged, it seems clear that it arose from a peptide-RNA partnership and not from an RNA World (Carter and Wills 2017; Carter 2015).

Thus, it now becomes possible to propose, in outline, a much more targeted program for studying how translation evolved.

1.2 The Hypothesis of Rodin and Ohno (1995)

Shortly after the aaRS Class division became apparent (Eriani et al. 1990a, b; Cusack 1993; Cusack et al. 1990, 1991), Rodin and Ohno published a remarkable hypothesis (Rodin and Ohno 1995). They used multi-family sequence alignments to establish consensus codons for the Class-defining motifs in the two superfamilies and found that codons for the Class I PxxxxHIGH and KMSKS active-site catalytic motifs were almost exactly anticodons for Class II Motifs 2 and 1, respectively. That statistically significant, in-frame complementarity, illustrated in Fig. 1a, suggested that the contemporary aaRS superfamilies had at one time been coded by a single ancestral gene, one strand of which was transcribed and translated to give the ancestral Class I synthetase, while the opposite strand encoded the ancestral Class II synthetase.

The authors actually understated the statistical support for their case by not citing the probabilities of the observed alignments under the null hypothesis (10⁻⁸ to 10⁻¹⁸), which are implied by their jumble-test z-scores. Perhaps for this reason, the hypothesis remained more or less dormant for almost a decade before it was revived by Carter and Duax (Duax et al. 2005; Carter and Duax 2002). Subsequent work has substantially strengthened the experimental and bioinformatic support for the hypothesis by articulating and testing predictions that it makes (Fig. 1b). Direct support for the hypothesis, discussed at length in this review, is made more relevant by related work on the origin of translation (Carter and Wills 2017; Wills 1993, 2004, 2016; Petrov and Williams 2015; Caetano-Anollés et al. 2013; Harish and Caetano-Anollés 2012) and the genetic code (Carter and Wolfenden 2015, 2016; Wolfenden et al. 2015).
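The probabilities quoted above follow from z-scores by a standard tail-area conversion. The sketch below is a minimal illustration and rests on two assumptions: that the jumble-test scores are approximately standard normal under the null hypothesis, and that a one-tailed test applies. The z values are chosen to span the quoted range and are not taken from Rodin and Ohno (1995).

```python
import math


def one_tailed_p(z: float) -> float:
    """Upper-tail probability P(Z >= z) for a standard normal variable Z.

    Assumes the null distribution of the jumble-test score is ~N(0, 1).
    math.erfc keeps precision for large z, where 1 - cdf would round to 0.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative z-scores only (hypothetical, not the published values).
for z in (5.6, 9.0):
    print(f"z = {z:4.1f}  ->  p = {one_tailed_p(z):.1e}")
```

Using `erfc` instead of `1 - cdf` matters here. At z near 9, the complement of the cumulative probability is smaller than double-precision rounding error near 1.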

1.3 The Origins of Symbolic Interpretation and Coding

Biological nucleic acid sequences represent an exquisite repository of information relevant to managing stimuli from the world at large. For our purposes, it is useful to distinguish between two types of information stored in nucleic acids. Information about the chemical environment of biology defines how amino acids behave in water; gene sequences exploit that information by configuring amino acid sequences capable of folding into functional proteins.

The information in genes furnishes the blueprints for assembling proteins via the ribosomal read-write apparatus. Genes constitute a set of programs, written in the language of the genetic code and expressed as a sequence of codons, or symbols consisting of three consecutive nucleotide bases, each with a specific meaning: start, a particular amino acid belongs here, or stop. Equally important is the translation table embedded in transfer RNA. This second type of information specifies the conversion of the symbolic information in codons into specific amino acids. It has recently become apparent that this conversion corresponds closely to the phase transfer equilibria that enable translated gene sequences to fold and function. It represents the programming language in which genes are written, and self-organization has embedded it efficiently and robustly into tRNA base sequences, primarily in the acceptor stem and anticodon.

The aaRS connect codons, and hence messages in mRNA, to amino acids via the translation table of the genetic code. Each synthetase performs specific tRNA aminoacylation, enforcing the rule that a particular amino acid is to be inserted whenever the anticodon of its cognate tRNA matches a codon in the message. As the synthetases are themselves made according to specific mRNA sequences, their connection to the genetic code is deeply reflexive, or self-referential: once translated, they themselves become sensitive to the impact that water has on their constituent amino acids and fold into active conformations. Those folded conformations subsequently execute the symbolic rules in the genetic coding table to make themselves and all other proteins (Fig. 2; Carter and Wolfenden 2016).
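A toy model can make the decoding rule enforced by the synthetases concrete. The sketch below is hypothetical and greatly simplified: it uses a four-entry table, Watson-Crick pairing only, and no wobble. Each synthetase is reduced to a rule that charges tRNAs carrying a given anticodon with one amino acid. Decoding then matches each codon to the anticodon that pairs with it. The names `CHARGING_RULES`, `anticodon_for`, and `translate` are illustrative.

```python
# Hypothetical toy "synthetases": each charges tRNAs carrying one anticodon
# (written 5'->3') with one amino acid. Only four entries, no wobble pairing.
CHARGING_RULES = {
    "CAU": "Met",  # pairs with codon 5'-AUG-3'
    "GAA": "Phe",  # pairs with codon 5'-UUC-3'
    "CCA": "Trp",  # pairs with codon 5'-UGG-3'
    "GUG": "His",  # pairs with codon 5'-CAC-3'
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Anticodon (5'->3') that base-pairs with a codon (5'->3'): reverse complement."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def translate(mrna: str) -> list[str]:
    """Decode mRNA codon by codon via the tRNA whose anticodon pairs with it."""
    peptide = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        codon = mrna[i:i + 3]
        peptide.append(CHARGING_RULES[anticodon_for(codon)])  # KeyError if no tRNA
    return peptide


print(translate("AUGUUCUGGCAC"))
```

The meaning of a codon lives only in the charging table. Reassigning an entry in `CHARGING_RULES` changes what every message means, which is the reflexive point made above.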

At the nucleic acid level, the molecular nature of this information, and how it is preserved from one generation to the next, are well understood in terms of base-pairing, as are the general structural mechanisms by which this base sequence information is read out by transcription, and converted first to protein sequences by the ribosome during translation and then to folded, active enzymes. Principles of protein folding are also beginning to be understood, at least in outline (Fersht 2000; Dill and MacCallum 2012; Baker 2000). Key to understanding the origin and evolution of each of these mechanisms is the element of “interpretation”—rephrasing the encoded information into a more flexible form capable of a greatly extended range of functionalities.

How the genetic code became embedded in tRNA sequences and how mRNA sequences originated, however, have been essentially blank pages. High-probability synthetase•tRNA complexes were essential to launching translation. The probability of implementing molecular recognition and interpretation via self-organization (Wills 1993; Johnson and Lam 2010; Füchslin and McCaskill 2001; Küppers 1979; Eigen and Schuster 1977) and natural selection (Dennett 1995) decreases sharply the more sophisticated the system. Thus, it seems likely that translation began with smaller, less specific complexes, hence a simpler, less precise, and probably redundant alphabet. The recent experimental work reviewed here points clearly to molecular models with just those properties. These and other arguments (Carter and Wills 2017; Wills and Carter 2017) imply that genetic coding arose in a flexible, rudimentary implementation that later underwent successive refinements that completed the code (Carter and Wills 2017).

Given that the aaRS are probably the first and only gene products able to interpret the genetic code translation table, it should come as no surprise that their molecular phylogenies contain potentially useful information concerning how they arose and began translating genetic messages. Retracing evolution is a unique subset of reconstructing the past, i.e., of history. It can be argued that the effort is fruitless, because one cannot run the tape in reverse, important and relevant witnesses are all extinct, and, in particular, there is no way to test hypotheses. Our work (Chandrasekaran et al. 2013; Cammer and Carter 2010; Martinez et al. 2015; Carter 2014, 2015; Li et al. 2011, 2013; Pham et al. 2007, 2010; Weinreb et al. 2014; Li and Carter 2013; Carter et al. 2014, 2016; Rodin et al. 2009; Carter and Wolfenden 2015, 2016; Carter and Duax 2002; Wolfenden et al. 2015; Sapienza et al. 2016), and that of others (O’Donoghue and Luthey-Schulten 2003; Petrov and Williams 2015; Caetano-Anollés et al. 2013; Harish and Caetano-Anollés 2012; Caetano-Anollés 2015; Sun and Caetano-Anollés 2008), tends to rebut this objection. In reality, molecular phylogenies of the Class I and II aaRS have proven to be a rich source of unexpected insights into how translation became possible, and robust tools now enable us to investigate, and even recapitulate, key events from the past history of life.

2 Evidence for Bi-directional Coding Ancestry: Molecular Phylogenies, Urzymes, and Protozymes

The broader evidence now supporting the hypothesis of Rodin and Ohno (1995) began with analysis of superpositions of the three-dimensional structures of Class I and II aaRS, as illustrated in Fig. 1 and described in Sect. 2.1. Section 2.2 reviews the experimental characterization of the parallel catalytic activities and amino acid specificities of the deconstructed hierarchies from both classes. Problems raised by experimental recapitulation of putative evolutionary events connecting the ancestral forms to the contemporary enzymes are discussed in Sect. 2.3. A new distance metric for phylogenetic analysis of protein superfamilies related by bi-directional coding ancestry is reviewed in Sect. 2.4, together with its possible use in identifying how synthetases for the 20 canonical amino acids may have diversified from a single ancestral gene.

2.1 Protein Engineering and Experimental Deconstruction

The overall strategy of these studies has been to deconstruct Class I and II aaRS into genes for their component modules, use enzyme kinetics to characterize their catalytic activities and specificities, and validate their authenticity. Recapitulation of putative evolutionary intermediates by partial reconstruction also has been carried out, although to a lesser extent, as described in Sect. 2.3.

Deconstruction

Genes coding for intermediate modules were made using molecular biological techniques. For Class II aaRS, in which the active site is formed by a continuous, uninterrupted coding sequence, deconstruction can be accomplished using PCR amplification of the desired region (Li et al. 2011). For Class I aaRS, however, the active site (and hence the Urzyme) is discontinuous, requiring more aggressive protein engineering (Pham et al. 2007, 2010). Two aspects of the fusion and solubilization of the Class I Urzymes required amino acid sequence modification: (i) the intervening insertion element had to be removed and the two ends fused together, and (ii) an extensive surface area of nonpolar side chains, exposed by the removal of entire domains, needed to be modified to enhance solubility. In constructing Urzymes for TrpRS and LeuRS, both operations were accomplished using the Design module in the Rosetta program (Dantas et al. 2003).

Urzymes

Multiple sequence and especially multiple structure alignments furnish the basic tools for constructing molecular phylogenies (O’Donoghue and Luthey-Schulten 2003) (Fig. 3). Superimposing three-dimensional structures of proteins within the same superfamily reveals that certain modules are shared by all family members, whereas others differ distinctly from member to member. The most conserved modules generally contain the active sites, and for that reason alone are likely candidates for evolutionary intermediates. For both aaRS classes, modules shared by all ten superfamily members contain essentially their intact active sites built from ~130 residues (Pham et al. 2007; Carter and Duax 2002). These modules have been expressed independently of the rest of the contemporary gene from two Class I and one Class II aaRS and shown to exhibit ~60% of the transition state stabilization of the full-length enzymes (Li et al. 2011; Pham et al. 2007, 2010). Their extensive conservation and enzymatic activities earned them the descriptor “Urzyme” from the German prefix Ur = primitive, authentic, original plus enzyme.

Protozymes

Mildvan published a series of studies in which he excerpted the ATP binding sites of three different P-loop ATPases, namely F1 ATPase (Chuang et al. 1992a, b), DNA polymerase (Mullen et al. 1993), and adenylate kinase (Fry et al. 1985, 1988), and demonstrated that they retained ligand-dependent structures similar to those observed in the full-length proteins. All three ATP binding sites consist of ~50-residue β-α-β secondary structures with a glycine-rich loop between the first strand and helix, and appear homologous in these respects to the Class I aaRS ATP binding sites. That precedent motivated further deconstruction of the Class I and II synthetase Urzymes, both of which contain ATP binding sites of approximately the length (46 residues) studied by Mildvan. Expression and fluorescence titration of ATP by these 46-mers established that they, too, bind ATP tightly, motivating investigation of their possible catalytic properties. ATP binding sites from both Class I and II aaRS accelerated amino acid activation 10⁶-fold (Martinez et al. 2015), leading to their designation as “protozymes”, from “proto” = first.

The hierarchy of monomer > catalytic domain > Urzyme > protozyme (Fig. 4) illustrates the parallel evolution of Class I and Class II aaRS, providing details abstracted in Fig. 3. Of particular interest are the following:

(i)
Red and blue modules are interrupted by an insertion (connecting peptide 1, CP1; Burbaum and Schimmel 1991) in the Class I Urzyme but are continuous in the Class II Urzyme.
(ii)
The protozyme module (blue) occurs at the amino terminus of the Class I and at the carboxy terminus of Class II aaRS Urzymes.
(iii)
Transition-state stabilization free energies for amino acid activation, assayed by PPi exchange for each catalyst and defined as $\Delta G_{(k_{\mathrm{cat}}/K_M)/k_{\mathrm{non}}} = -RT\ln\left[(k_{\mathrm{cat}}/K_M)/k_{\mathrm{non}}\right]$, are approximately linearly related to catalyst mass (Martinez et al. 2015). Note that transition-state stabilization free energies are logarithmically related to the rate enhancements.
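Item (iii) expresses catalysis as a free energy. The sketch below shows the conversion and makes two assumptions. First, it uses a temperature of 25 °C, since the assay temperature is not given here. Second, it treats the rate enhancement (k_cat/K_M)/k_non as a dimensionless number. The 10⁶- and 10⁹-fold inputs are the rate enhancements cited in this section for protozymes and Urzymes.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def ts_stabilization(rate_enhancement: float, temp_k: float = 298.15) -> float:
    """Transition-state stabilization free energy, -RT ln(enhancement), in kcal/mol.

    temp_k defaults to 25 degrees C, an assumption made for this illustration.
    """
    return -R_KCAL * temp_k * math.log(rate_enhancement)


for label, enhancement in (("10^6-fold", 1e6), ("10^9-fold", 1e9)):
    print(f"{label}: {ts_stabilization(enhancement):6.2f} kcal/mol")
```

Each additional tenfold rate enhancement adds about -1.36 kcal/mol at this temperature. A free energy that grows linearly with mass therefore corresponds to a rate enhancement that grows exponentially with mass.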

Catalytic activities of Class I (Pham et al. 2007, 2010) and II (Li et al. 2011, 2013) Urzymes were the first observations to substantively validate predictions implied by the Rodin and Ohno hypothesis for the bi-directional genetic coding of Class I and II aaRS. The third of the observations above establishes a crucial prerequisite for the evolution of catalytic activity in general. Insofar as catalysis is required to synchronize the rates of chemical reactions in the cell, different enzyme families across the proteome must evolve so as to preserve parallel increases in rate enhancement.

Characterization

Overexpressing Urzymes from both Classes leads to their accumulation in inclusion bodies. Washed inclusion bodies contain >50% Urzyme in such cases and therefore represent a significant purification. Inclusion bodies solubilized in 6 M guanidinium hydrochloride can be renatured by size-exclusion chromatography on Superdex 75, which also yields essentially pure Urzyme. Active-site titration (Fersht et al. 1975; Francklyn et al. 2008) shows that 35–70% of the molecules in various preparations contribute to the activity observed in pyrophosphate exchange assays (Francklyn et al. 2008). TrpRS and HisRS Urzymes accelerate the rates of amino acid activation (assayed by pyrophosphate exchange) and tRNA aminoacylation by 10⁹-fold and 10⁶-fold, respectively (Li et al. 2013). These values are consistent with measurements of the uncatalyzed rates for the two reactions. Spontaneous amino acid activation (Kirby and Younas 1970) is ~1000-fold slower than either spontaneous acylation (Wolfenden and Liang 1989) or peptide bond formation from activated amino acids (Schroeder and Wolfenden 2007; Sievers et al. 2004). Comparable Urzyme-catalyzed rates for the two reactions therefore imply rate enhancements that differ by roughly the same 1000-fold factor.
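Active-site titration converts a pre-steady-state burst into a concentration of catalytically competent sites. The sketch below shows the bookkeeping under two assumptions: each molecule has one active site, and the burst amplitude equals the concentration of active sites. The function name and the input values are hypothetical.

```python
def active_site_titration(burst_um: float, total_protein_um: float,
                          v_sat_um_per_s: float) -> tuple[float, float, float]:
    """Return (active fraction, naive k_cat, k_cat per active site), k_cat in s^-1.

    Assumes one active site per molecule and that the pre-steady-state burst
    amplitude equals the concentration of catalytically competent sites.
    """
    active_fraction = burst_um / total_protein_um
    naive_kcat = v_sat_um_per_s / total_protein_um  # counts inactive molecules too
    corrected_kcat = v_sat_um_per_s / burst_um      # counts only active sites
    return active_fraction, naive_kcat, corrected_kcat


# Hypothetical numbers chosen only for illustration.
frac, naive, corrected = active_site_titration(1.0, 2.0, 0.5)
print(f"active fraction = {frac:.0%}, k_cat: {naive:.2f} -> {corrected:.2f} s^-1")
```

Normalizing to active sites rather than total protein matters when, as reported here, only about a third to two thirds of the molecules are active.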

As protozymes isolated from the two aaRS Classes have only ~40% of the mass of Urzymes, they are substantially weaker catalysts, activating cognate amino acids 10⁶ times faster than the uncatalyzed rate. PPi exchange assays were incubated for 14 days and assayed at intervals of several days (Martinez et al. 2015). The specificities of amino acid activation by wild-type and bi-directionally coded protozymes, and their possible acylation activities have yet to be determined.

Validation

Establishing the authenticity of the catalytic activities observed for the aaRS Urzymes and protozymes is obviously of great importance, and is a matter to which considerable attention has been paid (Carter 2014; Li et al. 2011, 2013; Pham et al. 2007, 2010; Carter et al. 2014). They are much weaker catalysts than full length enzymes, and consequently, their activities can much more readily be attributed to very small amounts of various kinds of contaminating enzymes, including, of course, the full-length native homologs present in all cell extracts. In addition to the absence of activity in conventional controls carried out using extracts prepared from cells containing empty cloning vectors, authenticity was established by four controls:

(i)
Steady-state kinetic experiments show that Urzyme and protozyme activities saturate at amino acid concentrations several orders of magnitude higher than is required to saturate the full-length enzymes. This argument is strengthened by the specificity spectra determined for the Class I LeuRS and Class II HisRS Urzymes (Fig. 5; (Carter et al. 2014; Carter 2015)).
(ii)
Cryptic catalytic activity is released when Urzymes expressed as fusion proteins with maltose-binding protein are treated with TEV protease.
(iii)
Active-site mutations and modular variants containing minor additional mass at the N- and C-termini alter the measured activity.
(iv)
Active-site titration confirms that a major fraction of molecules contribute to the observed activity (aaRS Urzymes only; it is unclear that the protozymes would exhibit a pre-steady state burst, which is a requisite for active-site titration).

All these results implicate the actual genetic construct in the observed activity, and contaminating activities cannot account for any of (ii), (iii), or (iv).

2.2 Class I and II aaRS Deconstructions Exhibit Parallel Catalytic Hierarchies

Catalytic Rate Enhancements Correlate with Catalyst Mass

Deconstructions (Fig. 4) reveal surprisingly consistent increases in transition-state stabilization with additional mass in the ascending hierarchies (Carter et al. 2014; Carter 2015). Catalyst masses span a 70-fold range, from ~6.5 kDa to 450 kDa, and the constructs derived from each Class are distributed differently with respect to size. The Class I deconstructions provide a “low resolution” map because they include two modular hybrids (catalytic domain, and Urzyme plus anticodon-binding domain) between the Urzyme and the full-length monomers. The Class II constructs, on the other hand, include high-resolution divisions (increments of 6, 20, and 26 residues) at approximately the 126-residue size of the Urzyme. Across this entire range of deconstructions, transition-state stabilization energies increase linearly with the number of residues (Martinez et al. 2015). Moreover, the slopes for each Class are the same within 5%.
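The comparison between Classes amounts to two steps: fit a line of transition-state stabilization energy against residue count for each Class, then compare the slopes. The sketch below uses ordinary least squares. The data are synthetic values generated from made-up lines, not the published measurements. It also assumes that "within 5%" means the slope difference relative to the mean slope magnitude.

```python
def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_slope_difference(fit_a: tuple[float, float],
                              fit_b: tuple[float, float]) -> float:
    """|slope_a - slope_b| divided by the mean slope magnitude (assumed metric)."""
    a, b = fit_a[0], fit_b[0]
    return abs(a - b) / ((abs(a) + abs(b)) / 2)


# Synthetic, exactly linear data (NOT the published measurements):
# residue counts and free energies in kcal/mol generated from made-up lines.
class_i_res = [50, 130, 320, 500]
class_ii_res = [50, 120, 140, 160, 180, 300]
class_i = linear_fit(class_i_res, [-0.020 * n - 5.0 for n in class_i_res])
class_ii = linear_fit(class_ii_res, [-0.021 * n - 5.0 for n in class_ii_res])
print(f"slopes {class_i[0]:.4f}, {class_ii[0]:.4f}; "
      f"relative difference {relative_slope_difference(class_i, class_ii):.1%}")
```

With real data, each Class would also need a confidence interval on its slope before "the same within 5%" could be judged. This sketch omits that step.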

Class I, II Constructs at Each Stage Have the Same Catalytic Proficiencies

A second remarkable result of the aaRS deconstruction is that the linear relationships between transition-state stabilization free energy and the number of residues also have the same intercepts. This implies strongly that throughout the evolutionary history of the two synthetase Classes, they retained comparable catalytic proficiencies (Martinez et al. 2015). The importance of this observation is that the synthetase superfamilies form a tightly interdependent autocatalytic set coupled by the fact that each is required to operate with approximately the same throughput of aminoacylated tRNAs for translation of all amino acids within the current genetic alphabet (Carter and Wills 2017). Their enzymatic activities must, therefore, have remained quite comparable for all relevant amino acids over the duration of the synthetase superfamily growth from short peptides to long polypeptides. It was not obvious, however, that experiments would confirm that expectation. Nevertheless, aaRSs from both Classes appear to have been capable of parallel increases in both size and catalytic proficiency, consistent with continuously providing comparable quantities of aminoacyl-tRNAs for all amino acids throughout the evolutionary tuning of the genetic code.

Class I and II Urzymes Are Promiscuous Catalysts That Nonetheless Have Comparable Amino Acid Specificity

Essentially complete amino acid specificity spectra have been determined for amino acid activation by the Class I LeuRS and Class II HisRS Urzymes (Fig. 5) (Carter et al. 2014; Carter 2015). Remarkably, the two catalysts retain a significant preference for the class of amino acid substrates for which their parent enzymes were specific. The LeuRS Urzyme prefers not to activate Class II amino acids; the HisRS2 Urzyme prefers not to activate Class I amino acids. The degree of specificity, evaluated as the free energy of the specificity ratio, $\Delta G_{k_{\mathrm{cat}}/K_M}(\mathrm{I/II})$ and $\Delta G_{k_{\mathrm{cat}}/K_M}(\mathrm{II/I})$, is ~−1 kcal/mole for both Urzymes. This value is roughly 20% of that for the full-length enzymes. It means that, given equimolar concentrations of all 20 amino acids, the Class I Urzyme will activate an amino acid from the wrong class roughly one time in 5, whereas a native aaRS will typically activate an incorrect amino acid roughly one time in 5000. Thus, aaRS Urzymes are promiscuous with respect to amino acid recognition but retain the preference of their parent full-length enzymes for amino acids of their own Class.
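A Boltzmann factor, ratio = exp(−ΔG/RT), links a specificity free energy to an error frequency. The sketch below assumes 25 °C and equimolar substrates, so that the ratio of k_cat/K_M values equals the ratio of activation rates. It checks the one-in-5 and one-in-5000 figures and the ~20% comparison given above.

```python
import math

RT = 1.987204e-3 * 298.15  # kcal/mol at 25 degrees C (assumed temperature)


def misactivation_odds(ddg_kcal: float) -> float:
    """N such that the non-cognate substrate is activated ~1 time in N.

    ddg_kcal = -RT ln(ratio of k_cat/K_M, cognate over non-cognate);
    negative values favor the cognate substrate. Assumes equimolar substrates.
    """
    return math.exp(-ddg_kcal / RT)


def ddg_for_odds(n: float) -> float:
    """Specificity free energy (kcal/mol) that gives an error rate of 1 in n."""
    return -RT * math.log(n)


print(f"Urzyme-like, -1 kcal/mol:  1 error in {misactivation_odds(-1.0):.1f}")
print(f"native-like, 1 in 5000:    {ddg_for_odds(5000):.2f} kcal/mol")
print(f"Urzyme / native free energy: {-1.0 / ddg_for_odds(5000):.0%}")
```

Because the relation is exponential, a specificity free energy only about fivefold smaller corresponds to an error rate about a thousandfold higher.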

As noted below, the problem of evolving high amino acid specificity is more subtle than might appear from the initial studies in Fig. 5. An important possibility is that specificities were enhanced in the presence of cognate tRNAs, as is true for several contemporary aaRS (Farrow et al. 1999; Ibba and Soll 2004; Uter et al. 2005; Sherlin and Perona 2003). Work is in progress to characterize tRNA specificity in a similar fashion, and to determine whether or not the amino acid specificity spectra (Fig. 5) improve in the presence of cognate tRNAs.

Wild-Type and Bi-directional Protozyme Gene Products from Both Classes Have the Same Catalytic Proficiencies

Section 2.4 discusses in greater detail the extent to which experimental and bioinformatics results have confirmed the hypothesis of Rodin and Ohno that the original ancestral genes for Class I and II aaRS were fully complementary. It is worthwhile noting here that wild type Class I and II protozymes were excerpted directly from full-length TrpRS and HisRS genes. Although those coding sequences retain a strong trace of their bi-directional coding ancestry, they are distinctly not complementary. To test the prediction that a fully complementary protozyme gene could be achieved, the computer design program, Rosetta, already used extensively in the re-design of Class I Urzymes, was adapted to impose coding complementarity on the two protozyme genes, resulting in a single gene with two different functional translation products, one from each strand (Martinez et al. 2015).
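Reading a DNA sequence in frame on both strands with the standard genetic code shows what it means for one gene to have a translation product on each strand. The sketch below illustrates only the bookkeeping of bi-directional reading. It does not reproduce the Rosetta-based design, and the example sequence is arbitrary. When the length is a multiple of three, each antisense codon pairs with a sense codon (taken in reverse order), including the middle bases.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Opposite strand, written 5'->3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate in frame 1 (one-letter codes), ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def both_strands(dna: str) -> tuple[str, str]:
    """In-frame translation products of the sense and antisense strands."""
    return translate(dna), translate(reverse_complement(dna))


print(both_strands("ATGGCCAAA"))
```

Designing a complementary gene means choosing one sequence whose two readings both fold and function. This sketch only enumerates the two readings.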

Analysis of the peptides coded by the resulting bi-directional gene showed that the four gene products (Class I and II, designed and wild type) have nearly the same catalytic proficiency, $\Delta G_{k_{\mathrm{cat}}/K_M}$ = +3.5 ± 0.8 kcal/mole. The amino acid sequences of the designed protozymes are quite different from the WT sequences. This agreement therefore suggests that the catalytic activities of the Class I and II protozymes may be consistent with a very large number of different sequences that share only simple patterns based on a reduced alphabet of fewer amino acids, consistent with their possible emergence at a time when the amino acid alphabet was both smaller and less faithfully implemented.

Designed Protozymes Have High Turnover, Low Specificity

The steady-state kinetic parameters for WT and designed protozymes revealed another notable contrast. Although the overall second-order rate constants, k_cat/K_M (and hence ΔG(k_cat/K_M)), are very nearly the same, these similar values arise from quite different turnover numbers and amino acid substrate affinities. The WT protozymes, perhaps because they were excerpted from the full-length proteins, retain higher ground-state substrate affinities but have lower turnover numbers, whereas the designed protozymes have higher turnover numbers and weaker ground-state affinities (Martinez et al. 2015). The differences in the two parameters are each about 100-fold and compensate one another, leaving the k_cat/K_M ratios unchanged.
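A short calculation makes the compensation explicit: if k_cat and K_M both rise about 100-fold, their ratio is unchanged, even though either change alone corresponds to roughly 2.7 kcal/mole. The sketch below assumes the usual conversion ΔG = −RT ln(ratio) at 298 K; the rate constants are hypothetical placeholders, not the measured values of Martinez et al. (2015).

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def free_energy_of_ratio(ratio: float, temp_k: float = 298.15) -> float:
    """Convert a fold-change in a rate constant to kcal/mol via -RT ln(ratio)."""
    return -R_KCAL * temp_k * math.log(ratio)


# Hypothetical placeholder parameters (NOT measured values).
wt = {"kcat": 1.0e-3, "KM": 1.0e-4}  # lower turnover, tighter binding (low K_M)
designed = {"kcat": wt["kcat"] * 100, "KM": wt["KM"] * 100}  # both ~100-fold higher

spec_wt = wt["kcat"] / wt["KM"]
spec_designed = designed["kcat"] / designed["KM"]

print(f"k_cat/K_M, WT:       {spec_wt:.3g}")
print(f"k_cat/K_M, designed: {spec_designed:.3g}")
# Magnitude of the free-energy change for a 100-fold change in one parameter.
print(f"100-fold change = {abs(free_energy_of_ratio(100)):.2f} kcal/mol")
```

A higher K_M here stands for weaker ground-state affinity, which is why the designed variant can match the WT second-order rate constant while binding its amino acid less tightly.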

Without intending to do so, by enforcing genetic complementarity the design process also enhanced the turnover number while weakening amino acid affinity (Fig. 6). Specific binding of cognate versus non-cognate amino acids cannot be improved without increasing the binding affinity of the cognate complex, so increased ground-state amino acid affinities are a prerequisite for enhanced discrimination between competing substrates. Thus, deconstructions of both aaRS Classes exhibit parallel enhancements that improved fitness by increasing both catalysis and specificity. Notably, the higher turnover number and lower amino acid affinity of the designed, bi-directionally coded protozymes match properties expected for an emerging, rudimentary coding apparatus.

2.3 Recapitulation

Modular Engineering of TrpRS

One of the best ways to validate and utilize knowledge gained from the reconstruction of ancestral forms is to recapitulate putative evolutionary steps by reconstructing and testing intermediates (Bridgham et al. 2009; Dean and Thornton 2007; Thornton 2004). The deconstruction of the Class I and II aaRS has afforded such opportunities (Li et al. 2011; Li and Carter 2013). Those investigations cast new light on synthetase function and evolution.

Two putative TrpRS constructs intermediate between the Urzyme and the full-length enzyme were made, one by re-inserting the CP1 fragment to restore the catalytic domain and the other by covalently joining the anticodon-binding domain to the Urzyme (Li and Carter 2013). Comparison of these modular variants showed that, although both intermediate species exhibited modest increases in catalytic proficiency, neither was any better than the Urzyme, either in aminoacylation or in discriminating between cognate tryptophan and non-cognate tyrosine (Li and Carter 2013). There are two notable interpretations of this surprising result. First, because neither intermediate construct would have increased fitness enough to be selected, the apparently separate evolutionary enhancements must have occurred coordinately, either because one of the two modules had already begun to function in trans, or because one or both modules could “grow” by smaller modular additions that did enhance fitness (Li and Carter 2013). Second, the modular thermodynamic cycle involving full-length TrpRS, the two distinct intermediates, and the Urzyme allowed measurement of a ΔΔG‡ ~ −5 kcal/mole coupling energy between the CP1 and anticodon-binding domains in the transition state of the amino acid activation reaction by full-length TrpRS (Li and Carter 2013), shedding new light on the general problem of intramolecular signaling, or allostery (Carter et al. 2017; Chandrasekaran and Carter 2017; Chandrasekaran et al. 2016).
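The text does not spell out how a coupling energy is extracted from a modular thermodynamic cycle. A minimal sketch, assuming it follows the familiar double-mutant-cycle form ΔΔG‡_int = ΔG‡(full) − ΔG‡(+CP1) − ΔG‡(+ABD) + ΔG‡(Urzyme), with each ΔG‡ taken as −RT ln(k_cat/K_M), is shown below. The function names and the input rate constants are hypothetical; any constant offset in ΔG‡ cancels because the four terms have coefficients +1, −1, −1, +1.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def dg(kcat_over_km: float, temp_k: float = 298.15) -> float:
    """Apparent transition-state free energy (up to a constant) from k_cat/K_M."""
    return -R_KCAL * temp_k * math.log(kcat_over_km)


def coupling_energy(urzyme, plus_cp1, plus_abd, full, temp_k=298.15):
    """Non-additive (coupling) free energy of a two-module thermodynamic cycle.

    Zero if the two modules act independently (their rate effects multiply);
    negative if together they stabilize the transition state more than the
    sum of their separate effects.
    """
    return (dg(full, temp_k) - dg(plus_cp1, temp_k)
            - dg(plus_abd, temp_k) + dg(urzyme, temp_k))


# Independent modules (hypothetical values): effects multiply, coupling ~0.
print(round(coupling_energy(1.0, 10.0, 20.0, 200.0), 6))
# Hypothetical synergy: neither module helps alone, but together they give a
# ~4600-fold gain, i.e. a coupling energy of about -5 kcal/mol.
print(round(coupling_energy(1.0, 1.0, 1.0, 4.6e3), 2))
```

The second case mirrors the qualitative situation described above, where the intermediates are no better than the Urzyme yet the full enzyme is much more proficient.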

Mechanistic studies on intact TrpRS had previously identified a profound intramolecular coupling, ΔΔG‡ ~ −5 kcal/mole, necessary for catalytic assistance by the active-site Mg²⁺ ion (Weinreb and Carter 2008) and for full catalytic proficiency of the full-length enzyme (Weinreb et al. 2012, 2014; Carter et al. 2017; Weinreb and Carter 2008). A five-way coupling interaction, ΔΔG‡ ~ −5 kcal/mole, was also measured between four residues in an allosteric switching region 20 Å from the active-site metal; this region mediates the shear involved in domain movement during catalysis (Kapustina et al. 2007). A related study (Weinreb et al. 2014) confirmed that the same coupling energy is used in the transition state to enforce the specific selection of cognate tryptophan over non-cognate tyrosine. Thus, the modular thermodynamic cycle directly linked the previously observed long-range coupling to domain movement: the four switching side chains (I4, F26, Y33, and F37), the Mg²⁺ ion, and both domains all move coordinately in the transition state (Carter et al. 2017; Carter 2017).

Modular Engineering of HisRS

Three conserved motifs are recognized in Class II aaRS: Motif 1 and Motif 2 compose the HisRS Urzyme. The third, Motif 3, however, lies well outside the Urzyme. It is separated by a long and variable insertion domain, C-terminal to the Urzyme, much as the long and variable CP1 insertion interrupts the Class I Urzyme. Exploratory modular engineering of interactions in the Class II HisRS yielded several intriguing observations (Li et al. 2011). (i) Motif 3 could be fused together with the HisRS Urzyme to produce a module whose catalytic activity is intermediate between that of the Urzyme and that of Ncat, the HisRS catalytic domain containing both Motif 3 and the insertion domain (Augustine and Francklyn 1997). (ii) Catalytic proficiency of the Motif 3-supplemented HisRS Urzyme is further enhanced by adding six additional residues N-terminal to the Urzyme (Li et al. 2011). (iii) The six-residue N-terminal fragment functions synergistically with Motif 3. Effects of the five modules estimated by regression methods from all of the measurements (Fig. 7) distinctly resemble those evaluated on the basis of more thorough investigations of Class I constructs.

2.4 Middle Codon-Base Pairing: A New Phylogenetic Distance Metric

Bi-directional Genetic Coding Left a Detectable Trace in Contemporary Sequences

Sense/antisense alignment of coding sequences from different protein families introduced a new phylogenetic distance metric: the percentage of middle codon bases that are complementary in all-by-all, in-frame, bi-directional alignments of the multiple sequence alignments from the two families. For example, aligning the TrpRS Class I Urzyme against the HisRS Motif 2 (Pham et al. 2007) revealed that the region of quite extensive codon-anticodon complementarity identified by Rodin and Ohno could be extended to include ~75% of both Urzyme sequences, provided that the first and third codon bases were excluded. Outside regions of very high conservation, such as the Class-defining signatures of the aaRS, a transient ancestral use of dual-strand coding followed by an extended period of adaptive radiation would rapidly degrade the complementarity of the two strands. The highly conservative nature of the genetic code, together with the wobble property of the third codon base (Crick 1966), means that middle-base pairing is lost much more slowly as sequences diverge than is pairing of the first codon bases on each strand, each of which lies opposite a wobble base on the other strand (Fig. 8a). The trace of bi-directional coding ancestry can thus be recovered by structurally informed middle codon-base alignments of sufficient numbers of contemporary sequences (Fig. 8b) (Chandrasekaran et al. 2013). Specifically, if the number of sequences aligned is sufficiently high (~10⁴ comparisons in Chandrasekaran et al. 2013), the standard error of the mean is reduced to a tiny fraction of the differences between the pairing frequencies of Class I vs Class II alignments and the frequency (0.25) expected under the null hypothesis that one base in four would be complementary.
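The metric itself is simple to compute once two coding sequences are aligned in frame. The sketch below is a minimal illustration, not the pipeline of Chandrasekaran et al. (2013): it assumes a gap-free, in-frame alignment, counts only Watson-Crick pairs, and uses hypothetical function names. Because the two putative strands run antiparallel, codon i of one sequence faces codon i counted from the 3′ end of the other. A binomial standard error shows why ~10⁴ comparisons shrink the uncertainty far below the gap between observed pairing and the 0.25 null.

```python
import math

# Watson-Crick pairs only; G-U wobble pairs are not counted (an assumption).
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def codons(seq: str) -> list[str]:
    """Split a 5'->3' coding sequence into codons (RNA U treated as T)."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def middle_base_pairing(sense: str, other: str) -> float:
    """Fraction of codon middle bases that pair in an antiparallel, in-frame alignment."""
    a, b = codons(sense), codons(other)
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same number of codons")
    paired = sum((x[1], y[1]) in WC_PAIRS for x, y in zip(a, reversed(b)))
    return paired / len(a)


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a pairing fraction estimated from n comparisons."""
    return math.sqrt(p * (1 - p) / n)


print(middle_base_pairing("ATGGCT", "AGCCAT"))  # second is the reverse complement
print(standard_error(0.25, 10_000))
```

A perfect reverse complement gives a fraction of 1.0, while unrelated sequences should hover near 0.25.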

Elevated Codon Middle-Base Pairing in Multiple Antiparallel Alignments Between Different Class I and II aaRS Coding Sequences Occurs Generally Throughout all aaRS Superfamilies

The generality of the published study of middle codon-base pairing (Chandrasekaran et al. 2013) is open to question because the statistics were accumulated only for alignments of a Class Ic (TrpRS) with a Class IIa (HisRS) synthetase, and neither of these synthetases is likely to have been among the earliest to appear. To establish the significance of the middle codon-base pairing distance metric more generally, we therefore extended this analysis to eleven aaRS, balancing the three subclasses by including six from Class I (TrpRS, TyrRS, LeuRS, IleRS, GluRS, GlnRS) and five from Class II (HisRS, ProRS, AspRS, AsnRS, PheRS; N. Chandrasekaran, personal communication). To enhance confidence, the alignments included the 64 amino acids surrounding the PxxxxHIGH and KMSKS sequences in Class I aaRS and the Motif 1 and 2 sequences in Class II aaRS, as in Chandrasekaran et al. (2013). The trace of ancestral bi-directional coding remains significant.

This extended database samples multiple comparisons between all subclasses within each Class, and hence includes pairs of aaRS that presumably appeared at different times during the evolution of the code. The statistical structure of this new database is shown in Fig. 9. Alignments between Classes add an average of ~0.07 ± 0.007 to the fraction of codon middle-base pairing over alignments within the same aaRS Class. The difference between within- and between-Class alignments accounts for ~60% of the variance in observed pairing (R² = 0.60), and the Student t-test probability of a slope-to-standard-error ratio as large as 10 is ~10⁻¹⁴.
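The statistics quoted above come from regressing pairing fraction on a within- versus between-Class indicator. A minimal sketch of that kind of regression is below; it is not the authors' analysis code, and the pairing fractions are made-up toy values chosen only to illustrate the calculation. With a 0/1 regressor, the slope is just the difference between group means, and t is the slope divided by its standard error.

```python
import math
from statistics import mean


def indicator_regression(y, x):
    """OLS of y on a 0/1 indicator x. Returns (slope, slope SE, R^2, t)."""
    n = len(y)
    xm, ym = mean(x), mean(y)
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    slope = sxy / sxx  # equals mean(y | x=1) - mean(y | x=0)
    intercept = ym - slope * xm
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    se = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, se, 1 - ss_res / ss_tot, slope / se


# Toy pairing fractions (made up): x=0 within-Class, x=1 between-Class.
y = [0.30, 0.32, 0.31, 0.29, 0.37, 0.39, 0.38, 0.36]
x = [0, 0, 0, 0, 1, 1, 1, 1]
slope, se, r2, t = indicator_regression(y, x)
print(f"slope={slope:.3f}  SE={se:.4f}  R^2={r2:.2f}  t={t:.1f}")
```

Converting t to a p-value requires the t distribution with n − 2 degrees of freedom, which depends on the number of comparisons in the real dataset.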

Ancestral Sequence Reconstruction Extends the Phylogenetic Evidence for Bi-directional Coding Significantly Backward in Time

Ancestral gene reconstruction (Benner et al. 2007; Stackhouse et al. 1990) has become broadly used to resurrect ancestral enzymes (Gaucher et al. 2008) and signaling proteins (Dean and Thornton 2007; Thornton 2004; Hanson-Smith et al. 2010; Andreini et al. 2008; Ortlund et al. 2007; Bridgham et al. 2006; Thornton et al. 2003). Given the evidence for significant residual codon middle-base pairing in contemporary Class I and Class II sense/antisense alignments (Fig. 8b), it was of interest to extend the technique to the quantitative comparison of distinct gene families whose evolutionary descent might have been tightly coupled by bi-directional genetic coding at the origins of translation. That hypothesis led to the prediction that reconstructed node sequences of both superfamilies should exhibit increasing codon middle-base pairing as progressively older reconstructed nodes from each family are aligned in opposite directions. This test is distinct from the construction of phylogenetic trees from multiple sequence alignments of related proteins, because it compares separate reconstructions of distinct families carried out independently and aligned only after the nodes have been reconstructed. The observed increase in codon middle-base pairing (Fig. 8c) is therefore a significant, orthogonal verification of the Rodin-Ohno hypothesis (Chandrasekaran et al. 2013).

Codon Middle-Base Pairing May Contain Evidence for Very Early Stages of Genetic Coding

The breadth of the histograms in Fig. 9 and the consensus subdivision of the two aaRS Classes into parallel subclasses, one large and two small, suggest that further examination of the middle-base pairing metric may eventually provide clues about the order in which pairs of aaRS speciated, and hence the order in which amino acids entered coding relationships. We constructed putative phylogenetic trees from the aligned amino acid sequences of eleven aaRS (six Class I and five Class II; Fig. 10a) and from the middle base-pairing distance metric (Fig. 10b) to illustrate this possibility. Only one comparison among the all-by-all comparisons of the subclasses shows a significantly lower middle codon-base pairing metric: subclasses Ib and IIc appear to be more distantly related than all other subclass pairs. Otherwise, the distance metric implies comparable distances between each subclass of one Class and the subclasses of the other. Although based on partial data, this analysis is nevertheless interesting because it suggests ancestral genes in which the two principal subclasses are swapped (Fig. 10b): the strands of the presumptive ancestral gene encoded ancestral Class Ia Ile-like and Class IIb Asp-like protozymes. Similarly, the next most prominent middle-base pairing metric relates sequences of Class IIa ProRS to those of Class Ib GlnRS. Further work in this direction is in progress and will require improved analytical tools for using the new distance metric, along with ways to deal with ancestral sequence reconstructions built from amino acid alphabets of decreasing size (Wills 2016; Markowitz et al. 2006; Wills et al. 2015).

These data suggest that we now potentially have the tools to address directly the question of which stepwise bifurcations were actually involved, and in which order, leading to the universal genetic code. That code is one of a tiny number of near-optimal codes that have the dual properties of high redundancy and resistance to mutation (Freeland and Hurst 1998). It therefore must have been discovered by a process of feedback-constrained symmetry-breaking phase transitions, or “boot-strapping”. The underlying necessity for these transitions is discussed in detail elsewhere (Carter and Wills 2017; Wills and Carter 2017). Among the first of these symmetry-breaking transitions relevant to the genetic code was the aaRS Class division that divided the amino acids into two distinct classes, as discussed in Sect. 4.

3 Structural Biology of Ancestral Synthetases

Most of what we know about the structures of Urzyme and Protozyme models for ancestral aaRS has been inferred from crystal structures of full-length enzymes. Thus, for example, all structures depicted in figures herein were prepared by excerpting relevant coordinates from the corresponding pdb files. However, work has begun on the challenging task of providing more reliable and detailed structural data.

TrpRS Urzyme Is a Catalytically Active Molten Globule

Attempts to crystallize Urzymes, either alone or as maltose-binding-protein fusions, have not yet succeeded. It has, however, been possible to prepare isotopically labeled samples of active TrpRS Urzyme. Preliminary ¹⁵N-¹H HSQC spectra from those samples supported an unexpected, but not unprecedented, conclusion: the TrpRS Urzyme is not a folded protein, but has many of the properties expected of a catalytically active molten globule (Sapienza et al. 2016). The HSQC spectrum has a reasonable dispersion of values in the ¹⁵N dimension, but, even in the presence of ATP and a non-reactive tryptophan analog, all ~80 peaks lie between 8 and 8.5 ppm in the ¹H dimension, which is characteristic of proteins that are not fully folded. The conclusion that it is probably a molten globule is reinforced by two further observations: the temperature dependence of its CD spectrum exhibits cold denaturation, and Thermofluor measurements of thermal melting in the presence of Sypro Orange dye show high fluorescence at all temperatures below ~45 °C, above which fluorescence decreases non-cooperatively over a range of ~30 °C. Both cold denaturation and high fluorescence over broad temperature ranges in the presence of Sypro Orange are characteristic of limited tertiary structure, as in molten globules (Sapienza et al. 2016).

The likely possibility that the TrpRS Urzyme is a catalytically active molten globule has considerable significance in light of the work of Hilvert (Pervushin et al. 2007) and of Hu (2014), showing that the native chorismate mutase dimer and an engineered monomeric form that is a molten globule achieve comparable rate accelerations, and hence comparable transition-state stabilization, by distinctly different strategies. The large entropic penalty paid by the molten globule on reaching the transition state (a high, positive −TΔS‡) implies an even greater enthalpic contribution to transition-state stabilization. Thus, the molten globular catalyst achieves a substantially more negative ΔH‡ to overcome the entropic cost of restricting its molten conformation when it binds the transition state. The additional flexibility of the molten globular ensemble appears to enable it to wrap more tightly around the transition-state configuration of the substrate than can the properly folded native enzyme.
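The enthalpy-entropy argument follows directly from the Gibbs relation for activation parameters. Writing it for the molten globule (MG) and the native enzyme (N) and subtracting gives the identity below; nothing beyond the standard definitions is assumed.

```latex
\Delta G^{\ddagger} = \Delta H^{\ddagger} - T\Delta S^{\ddagger}
\quad\Longrightarrow\quad
\Delta H^{\ddagger}_{\mathrm{MG}} - \Delta H^{\ddagger}_{\mathrm{N}}
  = \left(\Delta G^{\ddagger}_{\mathrm{MG}} - \Delta G^{\ddagger}_{\mathrm{N}}\right)
  + \left(T\Delta S^{\ddagger}_{\mathrm{MG}} - T\Delta S^{\ddagger}_{\mathrm{N}}\right)
```

If the two catalysts give comparable rate accelerations, the first bracket is close to zero. If the molten globule pays a larger entropic cost, so that its ΔS‡ is more negative, the second bracket is negative. The molten globule's ΔH‡ must then be correspondingly more negative, which is the sense in which it forms tighter interactions with the transition state.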

The Potential Manifold for Catalytic Activity by Poorly Structured Polypeptides May Thus Be Much Larger Than Was Thought Possible

Two molten globular polypeptides therefore exhibit high rate accelerations. At least one of these does so by forming substantially tighter bonds in the transition state than are possible with the native enzyme. This plasticity opens the possibility that many similar structural ensembles might act catalytically, and hence that a wider range of polypeptides might exhibit catalytic activity.

Peptide Catalysts Are Far Superior to Ribozymes

The superiority of peptide catalysts is widely recognized. However, because it is so much easier to generate and select catalytic aptamers from RNA than from protein combinatorial libraries, it is unclear whether this wide recognition comes with an appreciation of just how superior polypeptide catalysts are, in principle. Wills (2016) has compared the combinatorial possibilities for making an active site from proteins with those available to ribozymes. The combinatorial advantage of protein active sites arises because amino acids are only about half the volume of nucleotide bases, so that a greater number of amino acids can contact a given transition state. This advantage is compounded by the fact that there are five times as many choices of amino acid (20) as of nucleotide (4) at each position. As a result, proteins have an advantage somewhere between a million- and a billion-fold over RNA, which is what is observed experimentally (Carter and Wills 2017).
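One simple way to see how the two factors compound, offered as a toy model and not necessarily the calculation in Wills (2016), is to count sequence choices for a single site. Assume, hypothetically, that an RNA site uses k nucleotide contacts with 4 choices each, and that the same site accommodates 2k amino acids (half the volume) with 20 choices each. The ratio is then 20^(2k)/4^k = 100^k, so k between 3 and 4.5 spans the million- to billion-fold range quoted above.

```python
def toy_site_ratio(nucleotide_contacts: int) -> int:
    """Toy count of protein vs RNA sequence choices for one active site.

    Hypothetical assumptions, for illustration only:
      * an RNA site uses `nucleotide_contacts` residues, 4 choices each;
      * amino acids are ~half the volume, so the same site holds twice as
        many residues, 20 choices each.
    """
    rna_choices = 4 ** nucleotide_contacts
    protein_choices = 20 ** (2 * nucleotide_contacts)
    return protein_choices // rna_choices  # exactly 100 ** nucleotide_contacts


for k in (3, 4):
    print(k, f"{toy_site_ratio(k):.0e}")
```

The model ignores geometry, chemistry, and fold stability; it only shows that a twofold density advantage and a fivefold alphabet advantage multiply per contact rather than add.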

Hecht’s work (Patel et al. 2009; Moffet et al. 2003; Kamtekar et al. 1993) has demonstrated that a large proportion of molecules within patterned combinatorial libraries actually do form molten globules. The relatively low free energy barriers associated with assuming catalytically competent conformations, together with the vastly enhanced abilities of amino acid side chains to engineer nanoscale chemistry argue that catalytic activity is likely to arise and evolve much more rapidly from populations of peptides than from libraries of RNA. Thus, demonstrating that catalytic proficiency does not require the evolution of properly folded proteins represents a considerable expansion in their potential catalytic repertoire.

4 The Basis for the AARS Class Division

Because they activate amino acids by catalyzing adenylation by ATP, the aminoacyl-tRNA synthetases are arguably among the earliest enzymes to emerge during the origin of life. Absent catalysts, amino acid activation is both the slowest kinetically and most irreversible thermodynamically of the chemical reactions necessary for protein synthesis (Carter et al. 2014). The former distinction means that activation is ~1000 times slower than acyl transfer to tRNA or peptide bond formation from activated intermediates and represents the principal kinetic barrier to making peptides in a pre-biotic context. The latter means that amino acid activation is one of the hardest reactions in biology to drive to completion. It is probably not accidental that it became driven by ATP hydrolysis, which can deliver an additional free energy pulse once the pyrophosphate liberated by amino acid activation is subsequently hydrolyzed, assuring that activation goes to completion.

That two distinct protein superfamilies emerged to couple amino acid activation to ATP hydrolysis represented a conundrum that remained unanswered until a quite recent investigation connecting coding properties of tRNA bases with the physical chemistry of amino acid side-chain phase transfer and protein folding equilibria (Carter and Wolfenden 2015, 2016; Wolfenden et al. 2015) provided the first clues to a possible answer (see Fig. 2). One puzzling aspect of dividing the 20 canonical amino acids into two distinct groups is that the resulting classes appear to have quite similar diversity in their representation of the various physical chemical properties. Subclass B activates Glu, Gln, and Lys in Class I and Asp, Asn, and Lys in Class II. Subclass C activates Trp and Tyr in Class I and Phe in Class II. The similar diversity within each class leaves open the possibility that the two synthetase Classes appeared sequentially and not simultaneously. Although most authors have been reluctant to comment on their order of appearance (Woese et al. 2000), several have argued for a sequential appearance (Safro and Klipcan 2013; Klipcan and Safro 2004; Smith and Hartman 2015).

4.1 Class I, II aaRS Have Highly Interdependent Active-Site Constructions

Sequence conservation within the synthetase active sites furnishes the strongest evidence that the two Classes appeared simultaneously rather than sequentially (Carter et al. 2014; Carter 2015). Functional residues in each site (those whose functional groups directly influence the chemistry of the two substrates, as opposed to side chains within the conserved signatures that interact with the rest of the protein) are drawn entirely from the set of amino acids activated by the opposite aaRS Class (Fig. 11). This phenomenon is especially conspicuous for the Class-defining signature residues. For example, seven residues from the HIGH and KMSKS motifs of Class I active sites interact with the ATP substrate, whereas the remaining two, the hydrophobic Class I residues I (in HIGH) and M (in KMSKS), coordinate movement of the two signatures because they are embedded in a hydrophobic core of the anticodon-binding domain.

Although further work on this question is certainly worthwhile, there appear to be functional reasons why active-site residues are deployed quite differently in each Class. Although these differences have not been delineated systematically, they appear to produce dissimilar transition-state stabilization mechanisms (Perona and Gruic-Sovulj 2013; Zhang et al. 2006). With some exceptions—TrpRS residue D132 is a Class II amino acid conserved in the amino acid substrate binding sites of several Class I aaRS—the residues that obey this particular asymmetry are located in the respective ATP binding sites, and their functional differentiation is likely to be related to functional differentiation in the mechanism for ATP activation. Class I aaRS appear to use a dynamic Mg²⁺ ion that moves with the PPi leaving group bound by the KMSKS loop during the transition state (Carter et al. 2017) and appear to stabilize additional negative charge in a dissociative transition state involving an α-metaphosphate (Carter et al. 2017). Class II aaRS appear to stabilize a pentavalent α-phosphoryl group transition state with multiple Mg²⁺ ions that are bound directly by protein residues in a manner reminiscent of the two-metal transition state stabilization of polymerases (Steitz and Steitz 1993).

Some variation in these patterns occurs in eukaryotic synthetases, which are much more highly differentiated than bacterial aaRS and have adapted their catalytic mechanisms to accommodate and perhaps to sustain the accumulation of modules, called physiocrines (Guo and Schimmel 2013; Guo et al. 2010), with additional functions, whose selective advantages have been proposed to require modifying the active-site configurations (Yang et al. 2004, 2007).

These putative mechanistic differences are consistent with the differential use of active-site residues. Class I aaRS use the Class II amino acids histidine, asparagine, lysine, and serine to stabilize a metaphosphate (PO₃⁻) transition state, whereas Class II aaRS use the Class I amino acid arginine, together with Mg²⁺ ions coordinated by (Class I) glutamic acid, to stabilize a pentavalent phosphoryl transition state. Because active sites are almost certainly the oldest parts of an enzyme, and each Class depends on the other to supply activated amino acids from the opposite set for its own translation, it seems highly unlikely that either aaRS Class could ever have managed without the other.

Mechanistic differentiation as outlined here may have deeper evolutionary significance in light of a series of asymmetries at primary, secondary, and tertiary structural levels. These are described in Sects. 4.2, 4.3 and 4.4.

4.2 Amino Acid Side Chain Volume May Underlie the Class Distinction

It seems reasonable to seek a basis for the striking division between the two amino acid classes among the various physical descriptors that differentiate amino acid chemical behaviors. The obvious diversity within each class complicates the question. Given a matrix with one amino acid per row and columns giving its class and candidate descriptors, one can compare the various linear models relating properties to amino acid Class. Most proposed predictors are compromised because they were constructed to correlate with the buried (or exposed) surface areas of residues in proteins. Three predictors stand apart from this difficulty: the “polar requirement” (Woese et al. 1966), and the phase transfer free energies for water-to-cyclohexane (Wolfenden 2007; Gibbs et al. 1991; Wolfenden et al. 1979a; Wolfenden et al. 1979b) and vapor-to-cyclohexane (Radzicka and Wolfenden 1988) transfer. The first of these resulted from an ingenious attempt to test a hypothetical correlation between an amino acid property and the codon table. The resulting scale was derived from paper chromatography of the amino acids in solvents of varying dimethylpyridine content, which alters mobility in accordance with phase transfer behavior. However, that scale is highly idiosyncratic and cannot be recapitulated without an extensive investigation into the properties of paper chromatography, which is rarely if ever used today. In contrast, the latter two measures both represent pure physico-chemical equilibria unrelated to possible interactions either with carbohydrates in the paper support or with nucleic acids (Fig. 12).

The three properties are not mutually independent. The polar requirement is uncorrelated with the vapor-to-cyclohexane equilibria (R² = 0.06; P = 0.26) but well correlated with the water-to-cyclohexane equilibria (R² = 0.62; P < 0.0001). However, the two phase transfer equilibria themselves are uncorrelated (R² = 0.01; P = 0.69). Thus, they are distinctly different metrics, whereas the polar requirement resembles the hydrophobicity measured by the water-to-cyclohexane transfer equilibrium.

The second relevant point is how the three values correlate with the degree of side chain exposure in folded proteins measured by the solvent accessible surface area, ASA. That metric is, itself, subject to uncertainty as discussed elsewhere (Carter and Wolfenden 2015; Wolfenden et al. 2015), where it is argued that the values published by Moelbert (Moelbert et al. 2004) represent the least ambiguous values. Moreover, amino acids cysteine and proline violate all characterizations of this sort, owing to alternative influences—exposure in turn segments, coordination of metals and disulfide linkages—that lead to highly variant distributions between surface and core. Given those assumptions, and by a substantial margin, the best model for the ASA values is achieved by a linear combination of water-to-cyclohexane and vapor-to-cyclohexane free energies (R² = 0.94; all P < 0.001). The polar requirement performs less well in combinations (Fig. 13), indicating that the information in the polar requirement is redundant with that in the transfer free energy from water to cyclohexane. Thus, the vapor-to-cyclohexane transfer free energy provides new, complementary information, allowing a nearly complete prediction of ASA.
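The comparison of linear models above amounts to ordinary least squares with two predictors. A minimal, dependency-free sketch is below; it solves the centered normal equations by Cramer's rule and reports R². The function name is hypothetical, and the arrays are toy numbers constructed to lie exactly on a plane, not the published transfer free energies or the Moelbert ASA values.

```python
from statistics import mean


def fit_two_predictors(y, x1, x2):
    """OLS fit y ~ b0 + b1*x1 + b2*x2. Returns (b0, b1, b2, R^2)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear")
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule on the 2x2 system
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    ss_res = sum((c - b1 * a - b2 * b) ** 2 for a, b, c in zip(c1, c2, cy))
    ss_tot = sum(c * c for c in cy)
    return b0, b1, b2, 1 - ss_res / ss_tot


# Toy data (made up): "water_to_chx", "vapor_to_chx", and an "asa" response
# built as asa = 2 + 0.5*water_to_chx - 0.3*vapor_to_chx.
water_to_chx = [1, 2, 3, 4, 5]
vapor_to_chx = [2, 1, 4, 3, 6]
asa = [1.9, 2.7, 2.3, 3.1, 2.7]
b0, b1, b2, r2 = fit_two_predictors(asa, water_to_chx, vapor_to_chx)
print(f"b0={b0:.3f} b1={b1:.3f} b2={b2:.3f} R^2={r2:.3f}")
```

Fitting each predictor alone and then both together, and comparing the R² values, is the kind of comparison summarized in Fig. 13.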

The original purpose in developing the polar requirement was to account for regularities in the genetic code. It appears, however, that tRNA identity elements are more reliably related in detail to the phase transfer free energies (Carter and Wolfenden 2015, 2016) than to the polar requirement value.

4.3 tRNA Acceptor Stem and Anticodon Have Independent Coding Properties

In a paper that now appears increasingly prescient, Schimmel et al. (1993) argued that the dual domain structures of aaRS and tRNAs and experimental demonstrations that many aaRS could specifically aminoacylate tRNA acceptor stems in the absence of the anticodon stem loops implied an earlier phase of genetic coding. They proposed that aaRS catalytic domains and tRNA acceptor stems may have implemented an “operational RNA code” that enabled them to begin to align aminoacyl-acceptor stems according to an ancestral messenger RNA (Henderson and Schimmel 1997). The anticodon stem-loop and corresponding binding domains in the synthetases were assimilated later.

The more recent demonstration that aaRS Urzymes could acylate tRNA complemented the earlier experimental acylation of acceptor stems (Francklyn et al. 1992; Francklyn and Schimmel 1989, 1990), reinforcing the suggestion of Schimmel et al. and raising the question of what, specifically, that operational code might have consisted of. To answer this question, various potential properties of the 20 canonical amino acids were assembled into a table with one row per amino acid and each property in a separate column. Separate tables included additional columns representing the acceptor stem and anticodon bases that form identity elements (compiled by Giegé et al. 1998), encoded with one bit for whether a base is a purine (−1) or a pyrimidine (1) and another bit for whether its Watson-Crick base pairing forms three (1) or two (−1) hydrogen bonds. Regression models were then constructed for each property as a dependent variable, using the coding-element columns as independent variables (Carter and Wolfenden 2015).
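The two-bit encoding described above maps each base to a pair of ±1 values, so a set of identity-element bases becomes a row of regression predictors. A minimal sketch follows; the function name is hypothetical, and which tRNA positions supply the bases is left to the caller rather than reproducing the Giegé et al. (1998) compilation.

```python
# Two-bit encoding described in the text:
#   bit 1: purine = -1, pyrimidine = +1
#   bit 2: three Watson-Crick hydrogen bonds = +1, two = -1
BASE_CODE = {
    "A": (-1, -1),  # purine, pairs with U via 2 H-bonds
    "G": (-1, +1),  # purine, pairs with C via 3 H-bonds
    "C": (+1, +1),  # pyrimidine, pairs with G via 3 H-bonds
    "U": (+1, -1),  # pyrimidine, pairs with A via 2 H-bonds
}


def encode_identity_elements(bases: str) -> list[int]:
    """Flatten a string of identity-element bases into regression predictors."""
    row: list[int] = []
    for base in bases.upper().replace("T", "U"):
        row.extend(BASE_CODE[base])
    return row


print(encode_identity_elements("GCA"))
```

Stacking one such row per amino acid, next to a column of a physical property, yields the design matrix for the regressions described in the text.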

Not surprisingly, all such models provided excellent correlations with each physical property if sufficiently many coefficients were used. To differentiate “predictive” models from models using sufficiently many coefficients that they overfitted the noise, two non-canonical amino acids, selenocysteine (Sec) and pyrrolysine (Pyl), outside the training set were used for cross-validation. There was a clear distinction between models capable of predicting the properties of the two amino acids in the test set, and those that did so poorly. “Predictive” models had a small variance in predicting the test set (Sec, Pyl). They differed distinctly, depending on whether the identity elements used as independent variables came from the acceptor stem or from the anticodon (Fig. 14). Bases in the acceptor stem provide uniquely predictive codes for the side-chain size, whether or not it is branched at the β-carbon, and whether or not the side chain has a carboxyl group. Most other side-chain properties, notably including the hydrophobicity, are specified by the anticodon (Carter and Wolfenden 2015).

The unique and restricted coding properties of the acceptor stem provide substantive support for the proposal that an “operational RNA code” preceded the universal genetic code carried by the anticodon bases. Details of the acceptor stem code suggest in addition that the properties most important for that code were size, β-branching, and carboxylate side chains. In turn, those properties argue that genetic coding began before it became useful to encode side chains necessary to form hydrophobic cores, and hence before coding specified folded tertiary structures (Carter and Wolfenden 2015). In fact, the central features of the acceptor stem code specify requirements for forming extended chain β-structures, like those identified by modeling interactions between peptide β-structures and RNA (Carter 1975; Carter and Kraut 1974). These requirements suggest a selective advantage that could have favored the emergence of such an operational code from a pre-existing population of oligopeptides and oligonucleotides that interacted according to a direct, stereochemical code based upon mutual structural complementarity. Moreover, it is consistent with the notion, elaborated in Sects. 2.4 and 5.B, that ancestral synthetases, and especially protozymes were coded using a reduced alphabet, leading to “statistical proteins” (Wills 2016; Vestigian et al. 2006; Woese 1967, 1969).

4.4 Bi-directional Coding Implies Two Interpretations of the Same Genetic Information

To test the Rodin-Ohno hypothesis directly, we adapted the Rosetta multistate design algorithm (Leaver-Fay et al. 2011) to craft polypeptide sequences that simultaneously stabilize two alternative backbone configurations, the Class I and Class II protozymes, subject to the constraint that amino acids selected to stabilize one backbone have codons complementary to those of the amino acids at the corresponding positions on the other strand. These constraints enforced bi-directional coding and produced a single gene from which we could express a Class I Protozyme in one orientation and a Class II Protozyme in the other (Fig. 15) (Martinez et al. 2015).
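What "expressing one gene in both orientations" means can be made concrete with the standard genetic code. The sketch below is only illustrative: it reads a hypothetical three-codon toy sequence on both strands and does not reproduce the Rosetta multistate design, which is what chooses sequences so that both readings fold and function.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The opposite strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate in frame from the first base; '*' marks stop codons."""
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def read_both_strands(gene: str) -> tuple[str, str]:
    """Peptides encoded by one gene read in each orientation."""
    return translate(gene), translate(reverse_complement(gene))


# Hypothetical toy gene (not the designed protozyme gene).
print(read_both_strands("ATGGCCAAA"))  # ('MAK', 'FGH')
```

Note that the two readings pair codons in antiparallel register: the first codon of one product lies opposite the last codon of the other.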

Contemporary aaRS genes are obviously coded uni-directionally, although there is evidence that bi-directional coding might have survived in contemporary organisms in isolated cases (Carter and Duax 2002; LéJohn et al. 1994a, b; Yang and LéJohn 1994). The degree of middle codon-base pairing in sufficiently detailed, all-by-all comparisons may therefore allow two different mechanisms to be distinguished: (i) strand specialization, in which the two strands of daughter genes acquired mutations that eliminated bi-directional coding in exchange for sufficiently improved fitness (Fig. 16a), and (ii) adaptive radiation of bi-directional genes, which would have preserved high middle codon-base pairing until more recent times (Fig. 16b). The point along the sequence of bifurcations during code expansion at which strand specialization actually occurred should have left distinguishable signatures in the patterns of the middle codon-base pairing distance metric.
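The middle codon-base pairing metric can be sketched as a simple count over an existing alignment. This is a minimal version under an assumed input convention (two coding sequences, 5'->3', already aligned codon-for-codon in antiparallel register, with "-" for gaps); it is not the published pipeline of Chandrasekaran et al. (2013), which also handles the alignment itself.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def middle_base_pairing(sense: str, other: str) -> float:
    """Fraction of aligned codon pairs whose middle bases are Watson-Crick partners.

    Assumed input convention: `sense` and `other` are 5'->3' coding sequences of
    equal length, aligned codon-for-codon in antiparallel register, with '-'
    marking gap positions. Codon pairs containing a gap are skipped.
    """
    if len(sense) != len(other) or len(sense) % 3:
        raise ValueError("sequences must be equal-length whole codons")
    facing = other[::-1]  # lay the second gene 3'->5' beneath the first
    paired = compared = 0
    for i in range(0, len(sense), 3):
        a, b = sense[i:i + 3], facing[i:i + 3]
        if "-" in a or "-" in b:
            continue
        compared += 1
        paired += (a[1], b[1]) in WATSON_CRICK
    return paired / compared if compared else float("nan")


# A perfectly bi-directional pair (second gene = reverse complement of the first).
print(middle_base_pairing("ATGGCCAAA", "TTTGGCCAT"))  # 1.0
```

Unrelated sequences pair at their middle bases only at chance levels, so values well above chance in Class I/Class II comparisons are the signal the text describes.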

Bi-directional coding means that the two aaRS Classes are derived from alternative interpretations of the same ancestral genetic information, much as in visual puzzles with complementary interpretations of figure and ground (Fig. 17). The Watson-Crick base-pairing rules and the repeating twofold symmetry relating the backbones of opposite nucleic acid strands mean that the two strands of the designed Protozyme gene contain only one set of unique information, represented in complementary forms by either strand. The opposite strand has no additional information! Yet that information can support two entirely different interpretations with similar functions, depending on how it is read. This unexpected duality unifies the two aaRS superfamilies. Unification is commonly sought by physicists, but is very unusual in structural biology.

5 Inversion Symmetries in Structure and Function Maximally Differentiate the Two aaRS Classes

Anomalies described in Sect. 4 appeared at first to be unrelated curiosities. In fact, however, they all have coherent interpretations as inversion symmetries in the structural and functional implications of bi-directional coding at each organizational level (primary, secondary, tertiary) of the familiar Linderstrøm-Lang hierarchy (Linderstrøm-Lang 1952). Moreover, these inversion symmetries may have significantly impacted the emergence of genetic coding (Carter and Wills 2017). This section incorporates these relationships into a unified framework: multi-level molecular disambiguation.

The Genes of Bi-directionally Coded Ancestral Class I and Class II aaRS Were as Different as Possible from Each Other

Although the sugar-phosphate backbones of two complementary strands of a nucleic acid can be interconverted by twofold symmetry operations, their sequences can be interconverted only via the complementarity operation of base-pairing. This means that the mutational path from one strand to its complement is as long as possible: bi-directionally coded genes are maximally differentiated with respect to mutation, and it is essentially inconceivable that random mutational events could achieve that interconversion.
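The claim that the mutational path between complementary strands is as long as possible follows from the fact that no base is its own Watson-Crick complement. A minimal check, counting only point substitutions in the register in which the strands pair (insertions and deletions are ignored):

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitutions_to_complement(strand: str) -> int:
    """Point substitutions needed to turn `strand` into its base-by-base
    Watson-Crick complement, position for position."""
    target = strand.translate(COMPLEMENT)
    return sum(a != b for a, b in zip(strand, target))


rng = random.Random(42)
for _ in range(5):
    s = "".join(rng.choice("ACGT") for _ in range(30))
    # No base pairs with itself, so every position must change.
    assert substitutions_to_complement(s) == len(s)
```

The distance therefore always equals the full sequence length, the maximum possible for substitutions alone.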

Primary Amino Acid Sequences of Ancestral Class I and Class II aaRS Were as Different as Possible from Each Other

Primary structures of proteins are intimately related via the genetic code to their nucleic acid (RNA or DNA) coding sequences. Because the two strands of an ancestral bi-directional gene are maximally differentiated, the primary structures of the Class I and Class II aaRS ancestors they encoded are likewise maximally differentiated from one another.
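One way to see this at the level of single codons is to ask whether any codon and its antisense partner (its reverse complement) encode the same amino acid in the standard genetic code. The short enumeration below is a sketch of that check; it says nothing about longer-range sequence similarity.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(codon: str) -> str:
    """The codon read from the opposite strand (reverse complement)."""
    return codon.translate(COMPLEMENT)[::-1]


# Codons whose antisense partner encodes the same amino acid (or both are stops).
same_meaning = [c for c in CODE if CODE[c] == CODE[antisense(c)]]
print(len(same_meaning), "codons share a meaning with their antisense codon")
```

For the standard code the list is empty: every aligned position of a bi-directionally coded pair carries different amino acids, because the middle bases of paired codons are complementary and the code is organized largely by middle base.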

Secondary Structures of Class I and II aaRS Were Similar to Each Other

Unlike primary and tertiary structures, the ancestral Class I and II aaRS secondary structures were very likely similar, with crucial differences outlined in the next paragraph. Secondary structural similarity arises from the fact that formation of α-helical and extended β-structures is driven largely by periodic patterns of similar side-chain properties: heptapeptide repeats of non-polar side chains form α-helices, and alternating dipeptide patterns form extended β-structure. Such periodicities are preserved on passing from one coding strand to the other [See Fig. 6b of Ref. (Chandrasekaran et al. 2013)].
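A toy example shows an alternating (β-strand-like) polarity pattern surviving translation from the opposite strand. Assumptions: the gene is a hypothetical repeat chosen by hand, and residues are classed as hydrophobic or polar by the sign of their Kyte-Doolittle hydropathy, a standard scale used here only for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Kyte-Doolittle hydropathy; only the sign is used below.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def polarity_pattern(protein):
    """+1 for hydrophobic (KD > 0), -1 for polar residues."""
    return [1 if KD[aa] > 0 else -1 for aa in protein]


def lag_correlation(pattern, lag):
    """Mean product of pattern values `lag` residues apart (-1 = perfect alternation at lag 1)."""
    pairs = list(zip(pattern, pattern[lag:]))
    return sum(a * b for a, b in pairs) / len(pairs)


gene = "ATCAAC" * 3                          # hypothetical toy gene: Ile-Asn repeats
sense = translate(gene)                      # "INININ"
anti = translate(reverse_complement(gene))   # "VDVDVD"
for name, peptide in (("sense", sense), ("antisense", anti)):
    print(name, peptide, lag_correlation(polarity_pattern(peptide), 1))
```

Both readings alternate hydrophobic and polar residues with period two; a heptad pattern would be carried across the strands in the same way.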

Tertiary Structures of Proteins Coded by Bi-directional Genes Have the Interesting Property of Being in a Real Sense Inside Out, One to the Other

A curious feature in the organization of the genetic code (Zull and Smith 1990) means that amino acids that are confined to cores of folded proteins have codons whose anticodons encode residues invariably found on the surface [See Fig. 6a of Ref. (Chandrasekaran et al. 2013)]. Thus, whereas secondary structures in bi-directionally coded pairs of proteins are largely reflected across the two coding strands, solvation patterns of their respective side chains are inverted. Surfaces of helices and strands that are exposed in folded structures of one Class are buried in those of the other.
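The inversion can be quantified across the whole code by correlating the hydropathy of each codon's amino acid with that of its antisense codon's amino acid. This sketch uses the Kyte-Doolittle scale as an assumed stand-in for the solvation measures used in the cited work, and skips any pair involving a stop codon.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Kyte-Doolittle hydropathy (assumed proxy for burial versus exposure).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def antisense(codon):
    """The codon read from the opposite strand (reverse complement)."""
    return codon.translate(COMPLEMENT)[::-1]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


xs, ys = [], []
for codon, aa in CODE.items():
    partner = CODE[antisense(codon)]
    if "*" in (aa, partner):
        continue  # skip pairs involving a stop codon
    xs.append(KD[aa])
    ys.append(KD[partner])

r = pearson(xs, ys)
print(f"{len(xs)} codon/antisense pairs, hydropathy correlation r = {r:.2f}")
```

The correlation is negative: codons with middle U (mostly apolar residues) face antisense codons with middle A (polar residues), which is the solvation inversion described above.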

Use of Side Chains from Opposite Classes Maximizes Differentiation of Catalytic Mechanisms

The active-site differentiation (Fig. 11) results in a significant mechanistic disambiguation of the active-site chemistries of Class I and II aaRS. Mutations that might interconvert one mechanism and the other would therefore most likely be lethal, adding a measure of security to the mechanistic integrity of each Class.

Substrate Recognition Differentiates Large from Small Amino Acid Side Chains

The only property of the canonical 20 amino acids for which there is a statistically significant difference between the substrates of the two classes is side-chain size (Carter and Wolfenden 2015; Wolfenden et al. 2015). It would seem more than coincidental that amino acid size is also the primary source of differentiation in the tRNA acceptor stem operational code (Carter and Wolfenden 2015, 2016). Side-chain volume thus appears to underlie the initial differentiation between aaRS substrates that eventually enabled the development of the current genetic code.
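A size difference of this kind can be tested without distributional assumptions by an exact permutation test over all 10/10 splits of the 20 amino acids. In the sketch below the residue volumes are approximate literature values (after Zamyatnin 1972) quoted for illustration and worth checking against the original table; LysRS is assigned to Class II; and the test is not necessarily the statistic used in the cited work.

```python
from itertools import combinations

# Approximate residue volumes in cubic angstroms (after Zamyatnin 1972).
VOLUME = {"G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1, "P": 112.7,
          "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0, "Q": 143.8, "H": 153.2,
          "M": 162.9, "I": 166.7, "L": 166.7, "K": 168.6, "R": 173.4, "F": 189.9,
          "Y": 193.6, "W": 227.8}
# Conventional class assignments of the cognate synthetases (LysRS taken as Class II).
CLASS_I = set("RCQEILMWYV")
CLASS_II = set(VOLUME) - CLASS_I


def mean(values):
    return sum(values) / len(values)


observed = mean([VOLUME[a] for a in CLASS_I]) - mean([VOLUME[a] for a in CLASS_II])

# Exact one-sided permutation test: every way of splitting the 20 amino acids 10/10.
vols = list(VOLUME.values())
total = sum(vols)
n = len(CLASS_I)
as_extreme = count = 0
for group in combinations(vols, n):
    s = sum(group)
    diff = s / n - (total - s) / (len(vols) - n)
    as_extreme += diff >= observed - 1e-9
    count += 1
p_value = as_extreme / count
print(f"mean volume difference {observed:.1f} A^3, one-sided p = {p_value:.4f}")
```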

6 Evolutionary Implications

The genetic code represents one of the deepest unsolved puzzles associated with the origin of life. An important possible reason why it has remained such an outstanding problem is that much of the interdisciplinary community concerned with the origin of life has been preoccupied by the RNA World hypothesis, under which the code remains an even more challenging enigma than it needs to be. Under that hypothesis, the code was “discovered” by ribozymes seeking to improve their severely limited catalytic potential, perhaps assisted by stereochemical complementarity between nucleotide triplets and the amino acids for which they eventually became either codons or anticodons (Yarus 2011a; Yarus et al. 2009). The typical argument (Koonin 2011) is that amino acids were first recruited by ribozymes as “co-factors” that enhanced the diversity and proficiency of ribozymes. As their superior catalytic functions became manifest, increasingly polymerized forms of amino acids arose through some form of selection, leading to the translation system now used throughout biology. One of many substantial problems with this narrative is that there really is no rational mechanism for bootstrapping the code into existence in an RNA world (Carter and Wills 2017).

This review promised a dialectical survey of ways in which recent advances in understanding the evolution of the aminoacyl-tRNA synthetases have tended to refute the contention that these enzymes “…did not shape the codon assignments”. A substantial number of unexpected results have come to light since that judgment was formulated. Do those observations contribute to “a credible scenario for the evolution of the coding principle itself and the translation system”? Remarkably, the answer appears to be substantively affirmative, as outlined in greater detail elsewhere (Carter and Wills 2017; Wills and Carter 2017). In brief, each of the unexpected oddities arising from recent aaRS research can now be seen as a necessary and sufficient vestige of a bootstrapping process that began deep in a very primitive RNA/peptide world that prefigured, at an elemental level, the relationships in Fig. 2. The built-in and robust reflexivity of that process enabled a massive acceleration of the search for a near-optimal genetic code. Further, these results were obtained using a variety of effective new tools and ways of thinking, with which to explore the many questions that remain open.

Assembling the pieces reviewed here into a coherent refutation of the RNA World narrative begins with the conclusion that contemporary protein structures represent an archive of their origins and evolutionary expansion.

6.1 Protein Structures Were Probably Not Completely Overwritten But Left an Interpretable Archive of Their Evolutionary Origin in Bi-directional Genetic Coding

Successive experimental deconstruction of both Class I and Class II aaRS reveals parallel structural and catalytic hierarchies. Catalytic domains, Urzymes, and Protozymes represent increasingly deep and broadly conserved motifs. The relative ease of identifying them and the robustness of their experimental characterization constitute substantial evidence that the hierarchies apparent in Figs. 3 and 4 represent not a palimpsest of an RNA World (Benner et al. 1989) but a legitimate archive (Danchin 2007), evident in the contemporary structures, from which we have been able to read out and reconstruct reasonable models for successive stages in their evolution.

Evolutionary reconstruction is akin to other scientific problems, such as elucidating chemical and enzyme kinetic mechanisms, that face formidable barriers imposed by time. Kinetic mechanisms imply limiting structures, transition states, which cannot normally be detected directly because of their extremely short lifetimes. The inability to replay the tape of evolution creates an analogous problem. In both cases, direct study of the objects of greatest interest is difficult or impossible. These time-related barriers, in turn, dictate the logical framework necessary to make progress. Koch’s postulates concerning infectious disease were an early expression of this framework, and Fersht (1999) reformulated them to define the evidence needed to establish a particular intermediate on a reaction path. This logical process entails three elements: characterization of the putative intermediate, demonstration that it can be formed from preceding structures fast enough to lie on the path, and demonstration that it can react fast enough to lie on the path. By analogy, a legitimate evolutionary intermediate must be identified and prepared. Then plausible evolutionary changes must be outlined (and, if possible, tested) to account for its initial appearance and for its subsequent conversion to the next intermediate in a reasonable timeframe.
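The kinetic half of this analogy can be written down for the simplest case, a two-step irreversible path A → I → P. The sketch below uses hypothetical rate constants; the effective rate is taken as the reciprocal of the mean time to product (1/k_form + 1/k_decay), and the competence test is the criterion stated above: the intermediate must both form and react at least as fast as the overall conversion.

```python
def overall_rate(k_form: float, k_decay: float) -> float:
    """Effective first-order rate for A -> I -> P with irreversible steps,
    taken as the reciprocal of the mean time to product (1/k_form + 1/k_decay)."""
    return 1.0 / (1.0 / k_form + 1.0 / k_decay)


def kinetically_competent(k_form: float, k_decay: float, k_obs: float) -> bool:
    """The putative intermediate lies on the path only if it forms and
    breaks down at least as fast as the observed overall conversion."""
    return k_form >= k_obs and k_decay >= k_obs


# Hypothetical rate constants (s^-1).
k_obs = overall_rate(10.0, 5.0)                  # 10/3 s^-1
print(kinetically_competent(10.0, 5.0, k_obs))   # True: both steps outpace the overall rate
print(kinetically_competent(10.0, 1.0, 3.0))     # False: breakdown is too slow
```

The evolutionary counterpart replaces rate constants with plausible mutational steps and timescales, which is why the text asks for changes that can be outlined and, where possible, tested.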

The aaRS superfamilies behave consistently with this logical framework. Like Matryoshka dolls, the elemental Class I and II active sites both lie within their protozymes, which form the ATP binding sites. Protozymes themselves make up a bit less than half of the Urzymes, which contain the nuts and bolts of the catalytic domain. The function of the TrpRS Urzyme within the full-length enzyme is, however, entangled with coordinated motions of the CP1 and ABD domains (Kapustina et al. 2006, 2007; Li et al. 2015; Budiman et al. 2007; Kapustina and Carter 2006). Recent combinatorial perturbation studies have implicated these motions in long-range allosteric effects crucial to both catalysis (Weinreb et al. 2009, 2012) and specificity (Weinreb et al. 2014; Li and Carter 2013; Carter et al. 2017) in TrpRS. The synthetase phylogenies therefore represent rich and, as yet, only minimally tapped resources of information about the evolutionary development of catalysis, specificity, and allostery.

Beginning with the audacious proposal of Rodin and Ohno (1995) and culminating with the experimental demonstration of a bi-directional gene for the Class I and II Protozymes (Martinez et al. 2015), the trajectory of aaRS research has produced diverse experimental and bioinformatic confirmations of the predictions of bi-directional coding (Fig. 1). Substantive validation of the unification of the two synthetase Classes implies a need to re-annotate proteomic databases. In particular, if, as appears to be the case, a bi-directional synthetase protozyme gene preceded most of the proteome, then phylogenies (Wolf and Koonin 2007; Aravind et al. 1998, 2002; Leipe et al. 2002; Wolf et al. 1999) implying that the Class I synthetase superfamily branched off a different root quite late in the emergence of the proteome cannot also be correct unless modified in important ways to reflect the substantially finer modularity of proteins in general and the staged appearance of different modules.

For the moment, suffice it to say that the direct evidence for the bi-directional coding hypothesis appears exceptionally strong both for Protozymes (Martinez et al. 2015) and for Urzymes (Li et al. 2013). Section 3 developed the implications of bi-directional coding for the differentiation of the genes themselves and their implications for primary, secondary, and tertiary structures of the corresponding synthetases, and thence for their catalytic and coding functions. In the following section, we see how these aspects of synthetase phylogeny and structural biology, viewed coherently as outlined in Sect. 5, also in fact fulfill important requirements for bootstrapping complexity from simplicity.

6.2 Bi-directional Coding Was Probably Essential to Stabilize the Emergence of Translation

From the standpoint of information processing, genetic coding differs fundamentally from replication. It is, arguably, one of the most significant and puzzling among the transformations that produced biology from chemistry. It is perhaps the key event that enabled the creation of a multiplicity of sufficiently tunable catalytic activities to synchronize the rates necessary for cellular metabolism (Sect. 2.1). It therefore seems surprising that Woese et al. (2000) summarily dismissed the possibility that the evolution of the aminoacyl-tRNA synthetases, the executors of the genetic code, had much to do with the development of the code itself.

The alternative argument, that synthetase evolution actually was the sine qua non of what generated the code, has been articulated in considerable detail elsewhere [(Carter and Wills 2017; Wills and Carter 2017) and refs cited therein]. Key aspects of that argument are as follows:

(i)
Primary structures generated by bi-directional coding are maximally different from one another, hence cannot be “fused” via functional mutants.
(ii)
Coding relationships in tRNA acceptor stems and anticodons are consistent with at least two distinct stages during which synthetase-tRNA recognition participated in indirect (i.e., “genetic”) coding (Carter and Wolfenden 2015, 2016). The earlier stage is eminently consistent with a very small amino acid alphabet of one or at most two bits. Moreover, this result implies the tetrahedral network (Fig. 2) connecting four nodes: two in the nucleic acid world, tRNA (the programming language) and mRNA (the programs), and two in the protein world, amino acid physical chemistry and protein folding (Carter and Wolfenden 2015, 2016; Wolfenden et al. 2015).
(iii)
A two-bit alphabet competing with any more sophisticated pre-existing RNA coding system would have been rapidly eliminated by purifying selection, because it would degrade the more sophisticated messages.
(iv)
Items (i)–(iii) imply that genetic coding must have been built from scratch, in an implementation executed by protein aaRSs.
(v)
Bi-directionally coded aaRS Protozymes and Urzymes represent a credible origin of the reflexivity necessary for efficiently bootstrapping the full genetic code into existence.

This numbered list correlates with the elements of inversion symmetry relating the two aaRS Classes, suggesting that the fundamentals of aaRS evolution (Sect. 3) closely match the requirements for the emergence of genetic coding (Sect. 5.B(a)–(e)).

Bi-directional Coding Is Unexpected Because It Limits Genetic Diversity

An indeterminate but presumably significant fraction of all possible mutations that might enhance the fitness of both products of a bi-directional gene is forbidden by the complementarity constraint. It also appears likely that many mutations that improve one product would decrease the fitness of the product translated from the opposite strand. Recent unpublished computational analysis (Silvert and Simonson 2016) suggests, however, that the cost of bi-directional coding may be smaller than previously thought. Computational construction of bi-directional genes for all pairwise combinations of 500 Pfam domains revealed an unexpectedly high number of pairs whose bi-directional genes, in all 6 relative reading frames, were homologous to consensus sequences from contemporary sequence alignments. Thus, the inversion symmetry of the universal genetic code observed by Zull and Smith (1990) may actually have played a key role in the eventual selection of the universal genetic code by optimizing the number of potentially functional bi-directional genes.
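One crude way to see the constraint is to count, over single-base substitutions, how many are synonymous on the sense strand and how many are synonymous on both strands at once. This is a sketch under a loud simplification: "synonymous" stands in for "tolerated", so it measures neutrality rather than fitness gains, and it is not the Silvert and Simonson analysis.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense(codon):
    """The codon read from the opposite strand (reverse complement)."""
    return codon.translate(COMPLEMENT)[::-1]


def single_substitutions(codon):
    """All nine codons one point substitution away."""
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                yield codon[:pos] + b + codon[pos + 1:]


silent_sense = silent_both = total = 0
for codon, aa in CODE.items():
    partner = antisense(codon)
    if aa == "*" or CODE[partner] == "*":
        continue  # consider only codons that are sense codons on both strands
    for mutant in single_substitutions(codon):
        total += 1
        if CODE[mutant] == aa:
            silent_sense += 1
            if CODE[antisense(mutant)] == CODE[partner]:
                silent_both += 1  # e.g. CAA->CAG (Gln) with TTG->CTG (Leu) opposite

print(f"silent on the sense strand: {silent_sense / total:.3f}")
print(f"silent on both strands:     {silent_both / total:.3f}")
```

Most substitutions that are neutral for one product change the other, because third-position (wobble) changes on one strand fall at first positions on the opposite strand.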

In any case, co-linear bi-directional coding does exact a price: the quest for diversity is severely limited by constraining both strands of a gene to have functional interpretations. That price must have been offset by strong selective advantages at the time. Two definitive properties, gene linkage (Pham et al. 2007) and gene differentiation (Carter and Wills 2017; Wills and Carter 2017), together with the interdependence of all aaRS genes, may have compensated for this limitation while error rates were very high, and consequently made bi-directional coding an inevitable requirement for the emergence of coded protein synthesis.

Bi-directional Coding Ensures Coexpression of Both Gene Products

It seems reasonable that the two classes of amino acid substrates differed sufficiently that coded protein synthesis could not have been launched without (at least) two specialized kinds of aaRS. The distinction between the amino acid substrates of Class I and II aaRS is closely related to the roles those amino acids play in protein folding, and the apparently earlier distinctions based on size, β-branching, and carboxylate side chains (Carter and Wolfenden 2015, 2016) are conspicuously appropriate for defining secondary structures in coded peptides. Bi-directional coding of amino acid activating enzymes with two specificities would have created a durable and ready-made basis for beginning to define coded peptides with rudimentary functional distinctions. If so, then before amino acid activation and acylation reactions became compartmentalized, bi-directional coding would have linked the two gene products, ensuring that both kinds of aaRS were produced in the same places and at the same times.

Bi-directional Coding Assures the Stability Necessary to Initiate and Sustain Genetic Diversity

More generally, the inversion symmetry of bi-directional aaRS gene coding fulfills a number of criteria necessary for the stability and survival of emerging quasispecies in the face of what have come to be called “error catastrophes” (Eigen and Schuster 1977; Koonin 2011; Orgel 1963; Eigen 1971). The relatively low fidelity of the Urzymes characterized thus far and the evidence that they themselves are highly evolved mean that their origins lie in populations of molecules, called “quasispecies”, that achieve similar functions but have multiple sequences (Eigen et al. 1988). The centroids of quasispecies are powerful “attractors” because they are sufficiently isolated in sequence space that variants with lower function will eventually be eliminated unless they “revert” toward the centroid. It is therefore difficult for a quasispecies to “bifurcate”, because clusters of functional sequences are separated by large regions of inactive species. The most important barrier to generating multiple aaRS was therefore to establish two species with similar catalytic function that were sufficiently differentiated to form stable, independent quasispecies [See Fig. 2 of (Carter and Wills 2017)]. Figure 18 summarizes how bi-directional coding provides the requisite differentiation to establish a “boot block” for the self-organization of genetic coding (Carter and Wills 2017; Wills and Carter 2017) in a peptide-RNA collaboration.
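The error threshold behind the term "error catastrophe" can be illustrated with Eigen's approximate criterion, ν_max ≈ ln σ / (1 − q), where q is the per-nucleotide copying fidelity and σ the selective superiority of the master sequence (back mutations ignored). The numbers below are hypothetical, chosen only to show the scale involved for a gene of roughly Urzyme size (~130 codons, 390 nt).

```python
import math


def max_length(fidelity: float, superiority: float) -> float:
    """Eigen's approximate error threshold: longest sequence (nt) whose master
    copy persists, nu_max ~ ln(sigma) / (1 - q)."""
    return math.log(superiority) / (1.0 - fidelity)


def required_fidelity(length: float, superiority: float) -> float:
    """Per-nucleotide fidelity needed to keep a sequence of this length below
    the error threshold (the inverse of max_length)."""
    return 1.0 - math.log(superiority) / length


# Hypothetical parameters: sigma = 10.
print(round(max_length(0.99, 10)))           # 230
print(round(required_fidelity(390, 10), 4))  # 0.9941
```

Sequences longer than this threshold drift away from their master sequence, which is why the independent survival of two differentiated quasispecies, rather than catalytic activity alone, is the central requirement in the argument above.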

The Genetic Code Is Much too Unusual to Have Been Discovered by Chance

Among the challenges associated with genetic coding is to understand how so much amino acid physical chemistry became embedded into both tRNA and mRNA sequences. The combinatorics of genetic coding have been analyzed in multiple ways by multiple investigators. These studies concluded that for every code with the properties of the universal genetic code, there are perhaps a million other potential codes that are less optimal (Freeland and Hurst 1998). That estimate should probably be revised, because both the apparent temporal order in which encoded amino acid physical chemistry appeared and bi-directional coding impose even more stringent requirements, making the universal code even more special than Freeland and Hurst appreciated.
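The style of analysis behind such estimates can be sketched as follows: keep the block structure and stop codons of the standard code, reassign the 20 amino acids among the blocks at random, and compare each random code's sensitivity to point mutations with that of the standard code. Freeland and Hurst used polar requirement with further refinements; this sketch substitutes Kyte-Doolittle hydropathy and a simple unweighted cost, so its numbers will differ and should not be read as reproducing their one-in-a-million figure.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons enumerated in the order TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# Kyte-Doolittle hydropathy, substituted here for polar requirement.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})


def mutation_cost(code):
    """Mean squared hydropathy change over all single-base substitutions that
    turn one sense codon into another sense codon."""
    total, n = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue
                other = code[codon[:pos] + b + codon[pos + 1:]]
                if other == "*":
                    continue
                total += (KD[aa] - KD[other]) ** 2
                n += 1
    return total / n


def shuffled_code(rng):
    """Random alternative code: same synonymous blocks and stop codons as the
    standard code, with the 20 amino acids reassigned among blocks at random."""
    perm = AMINO_ACIDS[:]
    rng.shuffle(perm)
    relabel = dict(zip(AMINO_ACIDS, perm))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in CODE.items()}


def fraction_better(samples=1000, seed=1):
    """Fraction of random codes with a lower mutation cost than the standard code."""
    rng = random.Random(seed)
    reference = mutation_cost(CODE)
    return sum(mutation_cost(shuffled_code(rng)) < reference
               for _ in range(samples)) / samples


print(f"standard code cost {mutation_cost(CODE):.2f}; "
      f"fraction of random codes doing better {fraction_better():.4f}")
```

Adding the temporal and bi-directional constraints discussed above would further shrink the set of acceptable codes; this sketch applies neither.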

In any RNA world, selecting amino acid sequences that fold into functional proteins depends entirely on natural selection. This implies an essentially trial-and-error search. As Koonin has pointedly noted by invoking multiple universes, such a specialized code is inaccessible by random processes in our universe (Koonin 2011). The alternative, which thus appears mandatory, is to provide a bootstrapping algorithm, by which a simple process can be endowed with the necessary characteristics to build complexity using its own resources.

The bootstrapping metaphor is quite intimately embedded into the framework of genetic coding. As noted (Carter and Wolfenden 2015, 2016), the code comprises a programming language and an associated set of programs written using that language. In this sense, it closely resembles a computer operating system, as Williams has noted elsewhere (Bowman et al. 2015; Petrov and Williams 2015; Anton et al. 2015). The key here is that computer operating systems are built around a simple set of instructions sufficient to enable the hardware, by executing those instructions, to build successively more sophisticated levels of functionality using a very limited set of alternatives. We believe that genetic coding must have arisen from an analogous “boot block” (Carter and Wills 2017), some of whose key relationships are illustrated in Fig. 19.

As shown in Fig. 19, bi-directionally coded protein aaRS are uniquely equipped to implement the crucial feedback loop necessary to bootstrap genetic coding, namely sensing the impacts of the local nano-environment on component amino acids that lead to protein folding rules. This feedback loop assures that amino acid sequences incapable of folding or whose functions are inferior are rapidly eliminated because the “rule executors” are themselves governed by the same phase transfer equilibria as all proteins. It cannot operate in any system using ribozymal aaRS. Bi-directional synthetase genes and the coding rules (Fig. 19) therefore together compose an existence proof that genetic coding could have evolved from humble origins by discovering both foldable sequences and optimal coding relationships much more rapidly than would have been possible in a pre-existing RNA world.

Interdependence Helped Assure Survival of Both Synthetase Classes

The list at the beginning of this section differentiates the products of a bi-directional gene at all levels. This multi-level differentiation sustains their underlying functional separation. The probability that mutations could eventually fuse their functions is minimized by the decisive mechanistic disambiguation [see Fig. 2 of Ref. (Carter and Wills 2017)]. Further, the dependence of the functions coded by each strand on the gene products of both strands defines a hypercycle-like coupling (Eigen and Schuster 1977) that enhances the ability of the two gene products to survive high error rates, as in Figs. 17 and 18. In this sense, the liability we see today in bi-directional coding, tight genetic linkage, was probably a significant strength before chemistry became localized in cells.
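The difference between mutual dependence and independent competition can be illustrated with Eigen-Schuster replicator equations for two species under a constant-total-population constraint. In this sketch the rate constants are hypothetical and the model omits mutation entirely; it shows only that hypercycle-like coupling lets both members coexist, whereas independent replicators with unequal rates end with one excluded.

```python
def simulate(k1, k2, coupled, x1=0.5, dt=0.01, t_end=50.0):
    """Euler integration of a two-species replicator system with x1 + x2 = 1.

    coupled=True : each species' growth is catalysed by the other (two-member hypercycle).
    coupled=False: independent replicators competing for the same resources.
    """
    x2 = 1.0 - x1
    for _ in range(round(t_end / dt)):
        if coupled:
            g1, g2 = k1 * x2, k2 * x1
        else:
            g1, g2 = k1, k2
        phi = g1 * x1 + g2 * x2  # dilution flux that keeps the total constant
        x1, x2 = x1 + dt * x1 * (g1 - phi), x2 + dt * x2 * (g2 - phi)
    return x1, x2


# Hypothetical rate constants: species 1 replicates somewhat faster.
print("hypercycle: ", simulate(1.0, 0.8, coupled=True))   # both persist, x1 -> k1/(k1+k2)
print("competition:", simulate(1.0, 0.8, coupled=False))  # species 2 is driven out
```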

6.3 Catalysis Arose from Simple, Promiscuous Molten Globules

The progression of transition-state stabilization free energies illustrated in Fig. 4 already suggests that catalytic proficiency developed progressively during the evolutionary maturation of the aaRS. Protozymes with ~50 amino acid residues produce >40% and Urzymes with ~130 amino acid residues produce ~60% of the transition state stabilization of modern enzymes. The specificity spectra in Fig. 5 suggest that aaRS Urzymes had achieved only 20% of their contemporary specificity. That distinction between catalysis and specificity sharply delineates discrete events in their evolutionary history (Fig. 6). Thus, most of the specificity appears to have evolved after the synthetases had developed most of their catalytic proficiency. Preliminary published experiments suggest that amino acid specificity can be achieved only by invoking allosteric interactions between domains in the contemporary enzymes (Weinreb et al. 2014; Li and Carter 2013; Carter et al. 2017) (vide infra; Sect. 2.3).
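Because transition-state stabilization free energy is proportional to the logarithm of the rate enhancement (ΔG = RT ln 10 × log₁₀ enhancement), percentages of free energy translate into exponents, not fractions, of the rate enhancement. The worked sketch below assumes a hypothetical full-length enzyme enhancement of 10¹⁴-fold purely for arithmetic; the fractions 0.4 and 0.6 are the protozyme and Urzyme values quoted above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def stabilization_kcal(log10_enhancement, temperature=298.15):
    """Transition-state stabilization free energy: RT ln(10) * log10(rate enhancement)."""
    return R_KCAL * temperature * math.log(10) * log10_enhancement


def retained_log10_enhancement(fraction, full_log10_enhancement):
    """log10 rate enhancement left when a fragment recovers `fraction` of the
    full enzyme's stabilization free energy (free energy is linear in log rate)."""
    return fraction * full_log10_enhancement


FULL = 14.0  # hypothetical: full-length enzyme accelerates the reaction 10^14-fold
for name, f in (("protozyme", 0.40), ("urzyme", 0.60), ("full length", 1.00)):
    log_k = retained_log10_enhancement(f, FULL)
    print(f"{name:12s} {stabilization_kcal(log_k):5.1f} kcal/mol  ~10^{log_k:.1f}-fold")
```

On this assumption, 40% of the free energy corresponds to about a 10^5.6-fold acceleration, a tiny fraction of the 10^14-fold total in linear terms but a large fraction of the catalytic "distance" in free energy.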

The surprising catalytic proficiency of the aaRS protozymes suggests that the earliest transition-state stabilization mechanisms arose by positioning backbone binding determinants and only later made use of active-site side chains. A pertinent example is the N-terminal array of four unsatisfied hydrogen bond donors in α-helices, which became a foundation for stabilizing phosphate and pyrophosphate binding (Hol et al. 1978) and which is an important part of the Class I protozyme. It is not yet known whether specific active-site side chains in the protozymes function in the same manner as they appear to do in the Urzymes and full-length enzymes. The loss of activity in the active-site mutant protozymes suggests that these side chains do contribute to transition-state stabilization, but more detailed functional studies will be necessary to delineate their precise functional role.

Similarly, it appears likely that the Urzyme level of evolutionary development may utilize tight transition-state binding associated with a large, unfavorable negative entropy change, as suggested for the TrpRS Urzyme (Sapienza et al. 2016) and a molten-globular chorismate mutase variant (Hu 2014). This possibility, combined with the discussion in Sect. 3 of the superiority of peptide catalysts, strongly suggests that the emergence and selection of catalytic activity itself was vastly more efficient, and hence more rapid, for peptide than for ribozymal catalysts. Moreover, if tight transition-state binding is common among peptide molten globules, selection would recruit catalysts from a much larger manifold of sequences. The large negative TΔS term may ultimately limit the transition-state stabilization free energy attainable by molten globules, and hence also create a selective advantage for evolving sequences that stabilize folded structures with more rigid transition-state complementarity (Amyes and Richard 2013). Finally, as noted above, enhancing substrate specificity likely required the emergence of allosteric effects that cannot develop within the Urzyme framework alone.

7 Outstanding Questions

Finally, it remains to outline unresolved areas where future experimental efforts are likely to be most productive. We pose some of them as questions in this section.

Do the Coding Relationships Identified in tRNA Acceptor Stems and Anticodons Preclude Evolutionary Inclusion of any Non-canonical Amino Acids into the Genetic Code?

This insightful question, posed by a reviewer, opens up the possibility that the selection of the canonical 20 amino acids might also have a physical basis if, for example, norleucine was excluded because it had an inappropriate combination of size and polarity. None of the possible tests of this idea appears to be feasible without comparable experimental measurements of the phase transfer free energies of the non-canonical amino acids. A vexing related question is how the distinction first emerged between tRNA acceptor stems recognized by ancestral Class I and II aaRS. Unpublished efforts to answer this question have not yet been fruitful.

What Was the Scale of Modularity in the Earliest Evolution of Proteins?

The most serious challenge to the various scenarios described in this review is the significant evidence from Koonin’s group (Koonin and Novozhilov 2009; Koonin 2011; Aravind et al. 2002; Leipe et al. 2002; Wolf et al. 1999) and others (Caetano-Anollés et al. 2007, 2013) that speciation of the Class I aaRS did not occur until relatively late in the generation of the proteome. Those studies are based on the most current thinking on phylogenetic reconstruction, yet they appear to be inconsistent with a fundamental role for objects like the bi-directional Protozyme gene products near the root of the proteome. Our view is that this conundrum arises from a failure to recognize fine-scale modularity when constructing trees for domains. Specifically, we believe that the putative bi-directional Protozyme gene was ancestral not only to the aaRSs but to many other families of related function, and that horizontal modular transfers may have been important before molecular biology created cellular species (Wolf et al. 1999; Soucy et al. 2015). Under this alternative hypothesis, the late branching of Class I aaRS represents a subsequent process associated with refinement of the aaRS specificities, once the proteome as a whole had emerged from a simpler alphabet.

Although the evidence implies that we can infer the modular outlines of evolutionary succession from the structural hierarchies, the assembly of contemporary enzymes from these reconstructed ancestral modules necessarily overwrote some details, which therefore remain speculative. In our view, the clearest guide to the actual ancestry remains experimental characterization of the functionality of modules clearly related by phylogenetic methods to contemporary proteins. The ability now to recapitulate intermodular interactions experimentally, both in cis and in trans, along lines developed with the TrpRS (Li and Carter 2013) and HisRS (Li et al. 2011) Urzymes, is therefore a key direction for further research. This includes both the assembly of Urzymes from protozymes and the other modules present in the Urzymes (Pham et al. 2007) and the assembly of modern aaRS from Urzymes in the presence of homologs of CP1, the Class II insertion domains, and the anticodon-binding domains of both classes.

Recent work appears to be moving toward a consensus consistent with our view. In particular, Caetano-Anollés (Caetano-Anollés and Caetano-Anollés 2016) now recognizes that the P-loop hydrolase domain genes are among the more ancient genes, though without acknowledging that the Class I protozyme appears to be a reasonable ancestral form not only of the Class I synthetases but also of the P-loop hydrolases themselves.

Was Specific Aminoacylation of tRNA Originally Catalyzed by Ribozymes?

The aaRS protozymes appear to be efficient catalysts of amino acid activation. Among other properties still to be tested, their amino acid preferences remain to be characterized. More important, however, is to determine whether they can also accelerate the transfer of activated amino acids to tRNA. The Class I protozyme lacks even the rudiments of a surface with which to bind the tRNA acceptor stem, whereas the Class II protozyme does retain such rudiments. Thus, the ancestral protozymes may not have been entirely symmetrical in their catalytic repertoire, and the Class II protozyme may have been uniquely able to acylate the tRNA acceptor stem. In any case, it appears much more straightforward to use SELEX procedures to isolate RNA aptamers that use activated amino acids to acylate tRNA than to find comparable aptamers capable of activating amino acids by reaction with ATP (Niwa et al. 2009; Suga et al. 1998). It is therefore both conceivable and worth testing whether such aptamers, if accompanied by protozymes, might have accelerated tRNA acylation from amino acids and ATP.

How Simple an Amino Acid Alphabet Can Still Support an Active Bi-directional Protozyme Gene?

Answers to this question appear to be fundamental to understanding the origins of genetic coding. Fortunately, the tools needed to answer it appear to be in place. The middle codon-base pairing metric of bi-directional coding appears to vary sufficiently between pairs of Class I and II aaRS Ur-genes (Figs. 9 and 10) to support a semi-quantitative map of the bifurcations that best account for the order in which the amino acids were assimilated into the growing genetic code (Chandrasekaran et al. 2013; Chandrasekaran, personal communication). The BEAST computer program has been adapted to use transition probability matrices of decreasing order, consistent with identifying nodes in the elaboration of the code (Markowitz et al. 2006; Wills, personal communication). The multistate algorithm by which Rosetta imposes gene complementarity can be modified to test models of code development by generating bi-directional protozyme genes whose catalytic activities can then be compared. These tools will become even more useful if and when it becomes possible to design a bona fide Ur-gene having all three of the modules originally identified by Pham et al. (2007).
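To make the middle codon-base pairing metric concrete, the following is a minimal Python sketch. It is not the statistical procedure of Chandrasekaran et al. (2013). It assumes, hypothetically, that a Class I and a Class II Ur-gene coding sequence have already been aligned codon-by-codon in antiparallel register, each written 5′→3′, with gap codons written as `---`. The function names `codons` and `middle_base_pairing` are illustrative only, and only Watson-Crick pairs are counted (G·U wobble is ignored).

```python
"""Illustrative sketch: middle-codon-base pairing in a sense/antisense alignment.

Hypothetical assumptions (not taken from the published method):
- both inputs are DNA coding sequences written 5'->3';
- they are already aligned so that codon i of `sense` pairs with
  codon i counted from the 3' end of `antisense`;
- gap codons are written "---" and are skipped;
- only Watson-Crick pairs count (no G.U wobble).
"""

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive codons."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def middle_base_pairing(sense: str, antisense: str) -> float:
    """Fraction of aligned codon pairs whose middle bases are complementary."""
    s_codons = codons(sense.upper())
    # Reversing the antisense codon order places each antisense codon opposite
    # its antiparallel partner; the middle base of a codon stays in the middle.
    a_codons = codons(antisense.upper())[::-1]
    if len(s_codons) != len(a_codons):
        raise ValueError("aligned sequences must have the same number of codons")
    paired = compared = 0
    for s, a in zip(s_codons, a_codons):
        if "-" in s or "-" in a:
            continue  # skip codon positions gapped in either sequence
        compared += 1
        if COMPLEMENT.get(s[1]) == a[1]:
            paired += 1
    return paired / compared if compared else 0.0


if __name__ == "__main__":
    print(middle_base_pairing("ATGGCT", "AACCAT"))
```

Here `middle_base_pairing("ATGGCT", "AACCAT")` returns 0.5. The first codon pair (ATG opposite CAT) has complementary middle bases (T·A), but the second (GCT opposite AAC) does not (C opposite A). With uniform base composition, unrelated sequences would pair at about one quarter of middle positions by chance, so the raw fraction is best read against that baseline.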

How Did Cognate tRNAs Evolve to Distinguish Between the Two aaRS Classes?

This question appears to pose a much more difficult problem. The aaRS class distinction was straightforward to identify from the initial observation of a group of aaRSs that lacked the HIGH and KMSKS catalytic signatures characteristic of all aaRS sequences available at that time (Eriani et al. 1990a, b). Notwithstanding the evidence accumulated to date by a much larger tRNA research community, it is not yet possible to identify sequence signatures, outside the identity elements that define the amino acid specificities of the synthetases, that distinguish recognition by one synthetase Class from recognition by the other. Thus, whereas it is possible in principle, even in the face of horizontal gene transfer (Ardell and Andersson 2006), to attempt to establish a tree along which the aaRS genes radiated to enlarge the amino acid alphabet, no such exercise yet appears possible for their cognate tRNAs. The approaches used by Caetano-Anollés (Caetano-Anollés and Caetano-Anollés 2016) may be capable of identifying appropriate patterns in tRNA multiple sequence alignments, but thus far that does not appear to have been a goal. A full understanding of the “tree” of amino acid acylation to tRNA in the emergence of translation, even in principle, therefore still appears to lie ahead.

Abbreviations

aaRS:: aminoacyl-tRNA synthetase(s)
TrpRS:: tryptophanyl-tRNA synthetase
LeuRS:: leucyl-tRNA synthetase
HisRS:: histidyl-tRNA synthetase
ATP:: adenosine 5′-triphosphate
PPi:: inorganic pyrophosphate
ASA:: solvent-accessible surface area
HSQC:: heteronuclear single-quantum correlation
BEAST:: Bayesian Evolutionary Analysis by Sampling Trees

References

Amyes TL, Richard JP (2013) Specificity in transition state binding: the Pauling model revisited. Biochemistry 52:2021–2035. doi:10.1021/bi301491r
Andreini C, Bertini I, Cavallaro G, Holliday GL, Thornton JM (2008) Metal ions in biological catalysis: from enzyme databases to general principles. J Biol Inorg Chem 13:1205–1218. doi:10.1007/s00775-008-0404-5
Anton SP, Gulen B, Norris AM, Kovacs NA, Bernier CR, Lanier KA, Fox GE, Harvey SC, Wartell RM, Hud NV, Williams LD (2015) History of the ribosome and the origin of translation. PNAS 112(50):15396–15401
Aravind L, Leipe DD, Koonin EV (1998) Toprim—a conserved catalytic domain in type IA and II topoisomerases, DnaG-type primases, OLD family nucleases and RecR proteins. Nucleic Acids Res 26(18):4205–4213
Aravind L, Anantharaman V, Koonin EV (2002) Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implication for protein evolution in the RNA world. Proteins Struct Funct Genet 48:1–14
Ardell DH, Andersson SGE (2006) TFAM detects co-evolution of tRNA identity rules with lateral transfer of histidyl-tRNA synthetase. Nucleic Acids Res 34(3):893–904. doi:10.1093/nar/gkj449
Attwater J, Wochner A, Holliger P (2013, December) In-ice evolution of RNA polymerase ribozyme activity. Nat Chem 5:1011–1018. doi:10.1038/NCHEM.1781
Augustine J, Francklyn C (1997) Design of an active fragment of a class II aminoacyl-tRNA synthetase and its significance for synthetase evolution. Biochemistry 36:3473–3482
Baker D (2000, May 4) A surprising simplicity to protein folding. Nature 405:39–42
Benner SA, Ellington AD, Tauer A (1989, September) Modern metabolism as a palimpsest of the RNA world. Proc Natl Acad Sci USA 86:7054–7058
Benner SA, Sassi SO, Gaucher EA (2007) Molecular paleoscience: systems biology from the past. Adv Enzymol Relat Areas Mol Biol 75:9–140
Bernhardt HS (2012) The RNA world hypothesis: the worst theory of the early evolution of life (except for all the others). Biol Direct 7:23
Bowman JC, Hud NV, Williams LD (2015) The ribosome challenge to the RNA world. J Mol Evol 80:143–161. doi:10.1007/s00239-015-9669-9
Breaker RR (2012) Riboswitches and the RNA world. Cold Spring Harb Perspect Biol 4:a003566. doi:10.1101/cshperspect.a003566
Bridgham JT, Carroll SM, Thornton JW (2006) Evolution of hormone-receptor complexity by molecular exploitation. Science 312:97–101
Bridgham JT, Ortlund EA, Thornton JW (2009) An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461:515–519
Budiman M, Knaggs MH, Fetrow JS, Alexander RW (2007) Using molecular dynamics to map interaction networks in an aminoacyl-tRNA synthetase. Proteins Struct Funct Bioinf 68:670–689
Buehner M, Ford GC, Moras D, Olsen KW, Rossmann MG (1973) D-Glyceraldehyde 3-phosphate Dehydrogenase: three dimensional structure and evolutionary significance. Proc Nat Acad Sci USA 70:3052–3054
Burbaum J, Schimmel P (1991) Structural relationships and the classification of aminoacyl-tRNA synthetases. J Biol Chem 266(26):16965–16968
Caetano-Anollés G (2015) Ancestral insertions and expansions of rRNA do not support an origin of the ribosome in its peptidyl transferase center. J Mol Evol 80:162–165. doi:10.1007/s00239-015-9677-9
Caetano-Anollés D, Caetano-Anollés G (2016) Piecemeal buildup of the genetic code, ribosomes, and genomes from primordial tRNA building blocks. Life 6:43. doi:10.3390/life6040043
Caetano-Anollés G, Kim HS, Mittenthal JE (2007) The origin of modern metabolic networks inferred from phylogenomic analysis of protein architecture. Proc Nat Acad Sci USA 104(22):9358–9363. doi:10.1073/pnas.0701214104
Caetano-Anollés G, Wang M, Caetano-Anollés D (2013) Structural phylogenomics retrodicts the origin of the genetic code and uncovers the evolutionary impact of protein flexibility. PLoS One 8(8):e72225. doi:10.1371/journal.pone.0072225
Cammer S, Carter CW Jr (2010) Six rossmannoid folds, including the class I aminoacyl-tRNA synthetases, share a partial core with the anticodon-binding domain of a class II aminoacyl-tRNA synthetase. Bioinformatics 26(6):709–714. doi:10.1093/bioinformatics/btq039
Carter CW Jr (1975, March) Cradles for molecular evolution. New Sci 27:784–787
Carter CW Jr (2014) Urzymology: experimental access to a key transition in the appearance of enzymes. J Biol Chem 289(44):30213–30220. doi:10.1074/jbc.R114.576495
Carter CW Jr (2015) What RNA world? Why a peptide/RNA partnership merits renewed experimental attention. Life 5:294–320. doi:10.3390/life5010294
Carter CW Jr (2017) High-dimensional mutant and modular thermodynamic cycles, molecular switching, and free energy transduction. Annu Rev Biophys 46:433–453. doi:10.1146/annurev-biophys-070816-033811
Carter CW Jr, Duax WL (2002) Did tRNA synthetase classes arise on opposite strands of the same gene? Mol Cell 10:705–708
Carter CW Jr, Kraut J (1974) A proposed model for interaction of polypeptides with RNA. Proc Nat Acad Sci USA 71(2):283–287
Carter CW Jr, Wills PR (2017) Interdependence, reflexivity, fidelity, and impedance matching: the need for an alternative to the RNA world. BioRxiv. doi:10.1101/139139
Carter CW Jr, Wolfenden R (2015) tRNA acceptor-stem and anticodon bases form independent codes related to protein folding. Proc Nat Acad Sci USA 112(24):7489–7494. doi:10.1073/pnas.1507569112
Carter CW Jr, Wolfenden R (2016) Acceptor-stem and anticodon bases embed amino acid chemistry into tRNA. RNA Biol 13(2):145–151. doi:10.1080/15476286.2015.1112488
Carter CW Jr, Li L, Weinreb V, Collier M, Gonzales-Rivera K, Jimenez-Rodriguez M, Erdogan O, Chandrasekharan SN (2014) The Rodin-Ohno hypothesis that two enzyme superfamilies descended from one ancestral gene: an unlikely scenario for the origins of translation that will not be dismissed. Biol Direct 9:11
Carter CW Jr, Chandrasekaran SN, Weinreb V, Li L, Williams T (2016) Combining multi-mutant and modular thermodynamic cycles to measure energetic coupling networks in enzyme catalysis. In: Pearson A, Benedict J (eds) Structural dynamics, American Crystallographic Association annual meeting, 2016. American Crystallographic Association
Carter CW Jr, Chandrasekaran SN, Weinreb V, Li L, Williams T (2017) Combining multi-mutant and modular thermodynamic cycles to measure energetic coupling networks in enzyme catalysis. Struct Dyn 4:032101
Chandrasekaran SN, Carter CW Jr (2017) Adding torsional interaction terms to the anisotropic network model improves the PATH performance, enabling detailed comparison with experimental rate data. Struct Dyn 4:032103
Chandrasekaran SN, Yardimci G, Erdogan O, Roach JM, Carter CW Jr (2013) Statistical evaluation of the Rodin-Ohno hypothesis: sense/antisense coding of ancestral class I and II aminoacyl-tRNA synthetases. Mol Biol Evol 30(7):1588–1604. doi:10.1093/molbev/mst070
Chandrasekaran SN, Das J, Dokholyan NV, Carter CW Jr (2016) A modified PATH algorithm rapidly generates transition states comparable to those found by other well established algorithms. Struct Dyn 3:012101. doi:10.1063/1.4941599
Chuang W-J, Abeygunawardana C, Pedersen PL, Mildvan AS (1992a) Two-dimensional NMR, circular dichroism, and fluorescence studies of PP-50, a synthetic ATP-binding peptide from the β-subunit of mitochondrial ATP synthase. Biochemistry 31:7915–7921
Chuang W-J, Abeygunawardana C, Gittis AG, Pedersen PL, Mildvan AS (1992b) Solution structure and function in trifluoroethanol of PP-50, an ATP-binding peptide from F₁-ATPase. Arch Biochem Biophys 319(1):110–122
Crick FHC (1966) Codon-anticodon pairing: the wobble hypothesis. J Mol Biol 19:548–555
Cusack S (1993) Sequence, structure and evolutionary relationships between class 2 aminoacyl-tRNA synthetases: an update. Biochimie 75:1077–1081
Cusack S (1994) Evolutionary implications. Nat Struct Mol Biol 1:760
Cusack S (1995) Eleven down and nine to go. Nat Struct Biol 2:824–831
Cusack S, Berthet-Colominas C, Härtlein M, Nassar N, Leberman R (1990) A second class of synthetase structure revealed by X-ray analysis of Escherichia coli seryl-tRNA synthetase at 2.5 Å. Nature 347(6290):249–255
Cusack S, Härtlein M, Leberman R (1991) Sequence, structural and evolutionary relationships between class 2 aminoacyl-tRNA synthetases. Nucleic Acids Res 19(13):3489–3498
Danchin A (2007) Archives or palimpsests? Bacterial genomes unveil a scenario for the origin of life. Biol Theor 2(1):1–10
Dantas G, Kuhlman B, Callender D, Wong M, Baker D (2003) A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J Mol Biol 332(2):449–460
Dean AM, Thornton JW (2007) Mechanistic approaches to the study of evolution: the functional synthesis. Nat Rev Genet 8:675
Delarue M (2007) An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13:1–9
Delarue M, Moras D (1992) Aminoacyl-tRNA synthetases: partition into two classes. In: Eckstein F, Lilley DMJ (eds) Nucleic acids and molecular biology, vol 6. Springer, Berlin/Heidelberg, pp 203–224
Dennett DC (1995) Darwin’s dangerous idea: evolution and the meanings of life. Simon and Schuster, New York
Dill KA, MacCallum JL (2012) The protein-folding problem, 50 years on. Science 338:1042–1046
Duax WL, Huether R, Pletnev V, Langs D, Addlagatta A, Connare S, Habegger L, Gill J (2005) Rational genomes: antisense open reading frames and codon bias in short chain oxido reductase enzymes and the evolution of the genetic code. Proteins Struct Funct Bioinf 61:900–906
Eigen M (1971) Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58:465–523
Eigen M, Schuster P (1977) The hypercycle: a principle of natural self-organization. Part A: emergence of the hypercycle. Naturwissenschaften 64:541–565
Eigen M, McCaskill JS, Schuster P (1988) Molecular quasi-species. J Phys Chem 92:6881–6891
Eriani G, Delarue M, Poch O, Gangloff J, Moras D (1990a) Partition of tRNA synthetases into two classes based on mutually exclusive sets of sequence motifs. Nature 347:203–206
Eriani G, Dirheimer G, Gangloff J (1990b) Aspartyl-tRNA synthetase from Escherichia coli: cloning and characterisation of the gene, homologies of its translated amino acid sequence with asparaginyl- and lysyl-tRNA synthetases. Nucleic Acids Res 18:7109–7117
Farrow MA, Nordin BE, Schimmel P (1999) Nucleotide determinants for tRNA-dependent amino acid discrimination by a Class I tRNA synthetase. Biochemistry 38:16898–16903
Fersht AR (1999) Structure and mechanism in protein science. W. H. Freeman and Company, New York
Fersht AR (2000) Transition-state structure as a unifying basis in protein-folding mechanisms: contact order, chain topology, stability, and the extended nucleus mechanism. Proc Nat Acad Sci USA 97(4):1525–1529
Fersht AR, Ashford JS, Bruton CJ, Jakes R, Koch GLE, Hartley BS (1975) Active site titration and aminoacyl adenylate binding stoichiometry of aminoacyl-tRNA synthetases. Biochemistry 14(1):1–4
Fournier GP, Alm EJ (2015) Ancestral reconstruction of a pre-LUCA aminoacyl-tRNA synthetase ancestor supports the late addition of Trp to the genetic code. J Mol Evol 80:171–185. doi:10.1007/s00239-015-9672-1
Fournier GP, Andam CP, Alm EJ, Gogarten JP (2011) Molecular evolution of aminoacyl tRNA synthetase proteins in the early history of life. Orig Life Evol Biosph 41:621–632
Francklyn C, Schimmel P (1989, February 2) Aminoacylation of RNA Minihelices with Alanine. Nature 337:478–481
Francklyn C, Schimmel P (1990, November) Enzymatic aminoacylation of an eight-base-pair microhelix with histidine. Proc Natl Acad Sci USA 87:8655–8659
Francklyn C, Musier-Forsyth K, Schimmel P (1992) Small RNA helices as substrates for aminoacylation and their relationship to charging of transfer RNAs. Eur J Biochem 206:315–321
Francklyn CS, First EA, Perona JJ, Hou Y-M (2008) Methods for kinetic and thermodynamic analysis of aminoacyl-tRNA synthetases. Methods 44:100–118
Freeland SJ, Hurst LD (1998) The genetic code is one in a million. J Mol Evol 47:238–248
Fry DC, Kuby SA, Mildvan AS (1985) NMR studies of the MgATP binding site of adenylate kinase and of a 45-residue peptide fragment of the enzyme. Biochemistry 24:4680–4694
Fry DC, Byler DM, Sisu H, Brown EM, Kuby SA, Mildvan AS (1988) Solution structure of the 45-residue MgATP-binding peptide of adenylate kinase as examined by 2-D NMR, FTIR, and CD spectroscopy. Biochemistry 27:3588–3598
Füchslin RM, McCaskill JS (2001) Evolutionary self-organization of cell-free genetic coding. Proc Natl Acad Sci USA 98:9185–9190
Gaucher EA, Govindarajan S, Ganesh OK (2008) Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Nature 451:704–707
Gibbs PR, Radzicka A, Wolfenden R (1991) The anomalous hydrophilic character of proline. J Am Chem Soc 113:4714–4715
Giegé R, Sissler M, Florentz C (1998) Universal rules and idiosyncratic features in tRNA identity. Nucleic Acids Res 26(22):5017–5035
Gilbert W (1986) The RNA world. Nature 319:618
Guo M, Schimmel P (2013, March) Essential nontranslational functions of tRNA synthetases. Nat Chem Biol 9:145–153
Guo M, Yang X-L, Schimmel P (2010, September) New functions of aminoacyl-tRNA synthetases beyond translation. Nat Rev Mol Cell Biol 11:668–674
Hanson-Smith V, Kolaczkowski B, Thornton JW (2010) Robustness of ancestral sequence reconstruction to phylogenetic uncertainty. Mol Biol Evol 27(9):1988–1999
Harish A, Caetano-Anollés G (2012) Ribosomal history reveals origins of modern protein synthesis. PLoS One 7(3):e32776. doi:10.1371/journal.pone.0032776
Härtlein M, Cusack S (1995) Structure, function and evolution of Seryl-tRNA synthetases: implications for the evolution of aminoacyl-tRNA synthetases and the genetic code. J Mol Evol 40:519–530
Henderson BS, Schimmel P (1997) RNA-RNA interactions between oligonucleotide substrates for aminoacylation. Bioorg Med Chem 5(6):1071–1079
Hofstadter DR (1979) Gödel, Escher, Bach: an eternal golden braid. Basic Books, Inc, New York
Hol WJG, van Duijnen PT, Berensen HJC (1978) The α-helix dipole and the properties of proteins. Nature 273:443–446
Horning DP, Joyce GF (2016) Amplification of RNA by an RNA polymerase ribozyme. Proc Nat Acad Sci USA 113(35):9786–9791
Hu H (2014) Wild-type and molten globular chorismate mutase achieve comparable catalytic rates using very different enthalpy/entropy compensations. Sci China 57(1):156–164. doi:10.1007/s11426-013-5021-7
Ibba M, Soll D (2004) Aminoacyl-tRNAs: setting the limits of the genetic code. Genes Dev 18:731–738
Ibba M, Francklyn C, Cusack S (2005) Aminoacyl-tRNA synthetases. MBIU, Landesbioscience, Georgetown
Johnson BR, Lam SK (2010) Self-organization, natural selection, and evolution: cellular hardware and genetic software. Bioscience 60:879–885. doi:10.1525/bio.2010.60.11.4
Kamtekar S, Schiffer JM, Xiong H, Babik JM, Hecht MH (1993) Protein design by binary patterning of polar and non-polar amino acids. Science 262:1680–1685
Kapustina M, Carter CW Jr (2006) Computational studies of tryptophanyl-tRNA synthetase ligand binding and conformational stability. J Mol Biol 362:1159–1180
Kapustina M, Hermans J, Carter CW Jr (2006) Potential of mean force estimation of the relative magnitude of the effect of errors in molecular mechanics approximations. J Mol Biol 362:1177–1180
Kapustina M, Weinreb V, Li L, Kuhlman B, Carter CW Jr (2007) A conformational transition state accompanies tryptophan activation by B. stearothermophilus tryptophanyl-tRNA synthetase. Structure 15:1272–1284
Kirby AJ, Younas M (1970) The reactivity of phosphate esters. Reactions of diesters with nucleophiles. J Chem Soc B 418:1165–1172
Klipcan L, Safro M (2004) Amino acid biogenesis, evolution of the genetic code and aminoacyl-tRNA synthetases. J Theor Biol 228:389–396
Koonin EV (2011) The logic of chance: the nature and origin of biological evolution. Pearson Education/FT Press Science, Upper Saddle River
Koonin EV, Novozhilov AS (2009) Origin and evolution of the genetic code: the Universal Enigma. IUBMB Life 61(2):99–111. doi:10.1002/iub.146
Küppers B (1979) Towards an experimental analysis of molecular self-organization and precellular Darwinian evolution. Naturwissenschaften 66:228–243
Leaver-Fay A, Jacak R, Stranges PB, Kuhlman B (2011) A generic program for multistate protein design. PLoS One 6(7):e20937
Leipe DD, Wolf YI, Koonin EV, Aravind L (2002) Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol 317:41–72
LéJohn HB, Cameron LE, Yang B, MacBeath G, Barker DS, Willams SA (1994a, February 11) Cloning and analysis of a constitutive heat shock (cognate) protein 70 gene inducible by L-glutamine. J Biol Chem 269:4513–4522
LéJohn HB, Cameron LE, Yang B, Rennie SL (1994b, February 11) Molecular characterization of an NAD-specific glutamate dehydrogenase gene inducible by L-glutamine: antisense gene pair arrangement with L-glutamine-inducible heat shock 70-like protein gene. J Biol Chem 269:4523–4531
Li L, Carter CW Jr (2013) Full implementation of the genetic code by tryptophanyl-tRNA synthetase requires intermodular coupling. J Biol Chem 288:34736–34745. doi:10.1074/jbc.M113.510958
Li L, Weinreb V, Francklyn C, Carter CW Jr (2011) Histidyl-tRNA synthetase urzymes: class I and II aminoacyl-tRNA synthetase urzymes have comparable catalytic activities for cognate amino acid activation. J Biol Chem 286:10387–10395. doi:10.1074/jbc.M110.198929
Li L, Francklyn C, Carter CW Jr (2013) Aminoacylating urzymes challenge the RNA world hypothesis. J Biol Chem 288:26856–26863. doi:10.1074/jbc.M113.496125
Li R, Macnamara LM, Leuchter JD, Alexander RW, Cho SS (2015) MD simulations of tRNA and aminoacyl-tRNA synthetases: dynamics, folding, binding, and allostery. Int J Mol Sci 16:15872–15902. doi:10.3390/ijms160715872
Linderstrøm-Lang KU (1952) The lane medical lectures. Stanford University Press, Stanford
Markowitz S, Drummond A, Nieselt K, Wills PR (2006) Simulation model of prebiotic evolution of genetic coding. In: Rocha LM, Yaeger LS, Bedau MA, Floreano D, Goldstone RL, Vespignani A (eds) Artificial Life, vol 10. MIT Press, Cambridge, MA, pp 152–157
Martinez L, Jimenez-Rodriguez M, Gonzalez-Rivera K, Williams T, Li L, Weinreb V, Niranj Chandrasekaran S, Collier M, Ambroggio X, Kuhlman B, Erdogan O, Carter CW Jr (2015) Functional class I and II amino acid activating enzymes can be coded by opposite strands of the same gene. J Biol Chem 290(32):19710–19725. doi:10.1074/jbc.M115.642876
Moelbert S, Emberly E, Tang C (2004) Correlation between sequence hydrophobicity and surface-exposure pattern of database proteins. Protein Sci 13:752–762
Moffet DA, Foley J, Hecht MH (2003) Midpoint reduction potentials and heme binding stoichiometries of de novo proteins from designed combinatorial libraries. Biophys Chem 105:231–239
Mullen GP, Vaughn JB Jr, Mildvan AS (1993) Sequential proton NMR resonance assignments, circular dichroism, and structural properties of a 50-residue substrate-binding peptide from DNA polymerase I. Arch Biochem Biophys 301(1):174–183
Niwa N, Yamagishi Y, Murakami H, Suga H (2009) A flexizyme that selectively charges amino acids activated by a water-friendly leaving group. Bioorg Med Chem Lett 19:3892–3894
Noller H (2004) The driving force for molecular evolution of translation. RNA 10:1833–1837
Noller HF, Hoffarth V, Zimniak L (1992, June 5) Unusual resistance of peptidyl transferase to protein extraction procedures. Science 256:1416–1419
O’Donoghue P, Luthey-Schulten Z (2003) On the evolution of structure in aminoacyl-tRNA synthetases. Microbiol Mol Biol Rev 67(4):550–573
Orgel LE (1963) The maintenance of the accuracy of protein synthesis and its relevance to ageing. Proc Nat Acad Sci USA 49:517–521
Ortlund EA, Bridgham JT, Redinbo MR, Thornton JW (2007) Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317:1544–1548
Patel SC, Bradley LH, Jinadasa SP, Hecht MH (2009) Cofactor binding and enzymatic activity in an unevolved superfamily of de novo designed 4-helix bundle proteins. Protein Sci 18:1388–1400
Perona JJ, Gruic-Sovulj I (2013) Synthetic and editing mechanisms of aminoacyl-tRNA synthetases. Top Curr Chem. doi:10.1007/128_2013_456
Pervushin K, Vamvaca K, Vogeli B, Hilvert D (2007) Structure and dynamics of a molten globular enzyme. Nat Struct Mol Biol 14:1202–1206
Petrov AS, Williams LD (2015) The ancient heart of the ribosomal large subunit: a response to Caetano-Anolles. J Mol Evol 80:166–170. doi:10.1007/s00239-015-9678-8
Petrov AS, Bernier CR, Hsiao C, Norris AM, Kovacs NA, Waterbury CC, Stepanov VG, Harvey SC, Fox GE, Wartell RM, Hud NV, Williams LD (2014) Evolution of the ribosome at atomic resolution. Proc Nat Acad Sci USA 111(28):10251–10256
Pham Y, Li L, Kim A, Erdogan O, Weinreb V, Butterfoss G, Kuhlman B, Carter CW Jr (2007) A minimal TrpRS catalytic domain supports sense/antisense ancestry of class I and II aminoacyl-tRNA synthetases. Mol Cell 25:851–862
Pham Y, Kuhlman B, Butterfoss GL, Hu H, Weinreb V, Carter CW Jr (2010) Tryptophanyl-tRNA synthetase urzyme: a model to recapitulate molecular evolution and investigate intramolecular complementation. J Biol Chem 285:38590–38601. doi:10.1074/jbc.M110.136911
Radzicka A, Wolfenden R (1988) Comparing the polarities of the amino acids: side-chain distribution coefficients between the vapor phase, cyclohexane, 1-octanol, and neutral aqueous solution. Biochemistry 27(5):1664–1670
Ribas de Pouplana L, Schimmel P (2001a) Two classes of tRNA synthetases suggested by sterically compatible dockings on tRNA acceptor stem. Cell 104:191–193
Ribas de Pouplana L, Schimmel P (2001b) Operational RNA code for amino acids in relation to genetic code in evolution. J Biol Chem 276:6881–6884
Ribas de Pouplana L, Schimmel P (2001c) Aminoacyl-tRNA synthetases: potential markers of genetic code development. TIBS 26(10):591–596
Robertson MP, Joyce GF (2012) The origins of the RNA world. Cold Spring Harb Perspect Biol 4:a003608. doi:10.1101/cshperspect.a003608
Rodin SN, Ohno S (1995) Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig Life Evol Biosph 25:565–589
Rodin SN, Rodin A (2006a) Partitioning of aminoacyl-tRNA synthetases in two classes could have been encoded in a strand-symmetric RNA world. DNA Cell Biol 25:617–626
Rodin SN, Rodin A (2006b) Origin of the genetic code: first aminoacyl-tRNA synthetases could replace isofunctional ribozymes when only the second base of codons was established. DNA Cell Biol 25:365–375
Rodin SN, Rodin AS (2008) On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100:341–355
Rodin A, Rodin SN, Carter CW Jr (2009) On primordial sense-antisense coding. J Mol Evol 69:555–567
Rodin AS, Szathmáry E, Rodin SN (2011) On origin of genetic code and tRNA before translation. Biol Direct 6:14
Safro M, Klipcan L (2013) The mechanistic and evolutionary aspects of the 2′- and 3′-OH paradigm in biosynthetic machinery. Biol Direct 8:17
Sapienza PJ, Li L, Williams T, Lee AL, Carter CW Jr (2016) An ancestral tryptophanyl-tRNA synthetase precursor achieves high catalytic rate enhancement without ordered ground-state tertiary structures. ACS Chem Biol 11:1661–1668. doi:10.1021/acschembio.5b01011
SAS (2015) JMP: the statistical discovery software, 10 edn. SAS Institute, Cary
Schimmel P (1991) Classes of aminoacyl-tRNA synthetases and the establishment of the genetic code. Trends Biochem Sci 16(1):1–3
Schimmel P (1996) Origin of genetic code: a needle in the haystack of tRNA sequences. Proc Natl Acad Sci USA 93:4521–4522
Schimmel P, Ribas de Pouplana L (2000) Footprints of aminoacyl-tRNA synthetases are everywhere. Trends Biochem Sci 25(5):207–209
Schimmel P, Giegé R, Moras D, Yokoyama S (1993) An operational RNA code for amino acids and possible relationship to genetic code. Proc Natl Acad Sci USA 90:8763–8768
Schroeder GK, Wolfenden R (2007) The rate enhancement produced by the ribosome: an improved model. Biochemistry 46:4037–4044
Sczepanski JT, Joyce GF (2014) A cross-chiral RNA polymerase ribozyme. Nature 515:440–442. doi:10.1038/nature13900
Shepherd J, Ibba M (2014) Relaxed substrate specificity leads to extensive tRNA mischarging by Streptococcus pneumoniae class I and class II aminoacyl-tRNA synthetases. mBio 5(5):e01656-14. doi:10.1128/mBio.01656-14
Sherlin LD, Perona JJ (2003) tRNA-dependent active site assembly in a class I aminoacyl-tRNA synthetase. Structure 11:591–603. doi:10.1016/S0969-2126(03)00074-1
Sievers A, Beringer M, Rodnina MV, Wolfenden R (2004) The ribosome as an entropy trap. Proc Natl Acad Sci USA 101:7897–7901
Silvert M, Simonson T (2016) Creation and analysis of an algorithm creating overlapping genes. Laboratoire de Biochimie – École Polytechnique, Palaiseau
Smith TF, Hartman H (2015) The evolution of class II aminoacyl-tRNA synthetases and the first code. FEBS Lett 589(23):3499–3507
Soucy SM, Huang J, Gogarten JP (2015) Horizontal gene transfer: building the web of life. Nat Rev Genet 16:472
Stackhouse J, Presnell SR, McGeehan GM, Nambiar KP, Benner SA (1990) The ribonuclease from an extinct bovid ruminant. FEBS Lett 262(1):104–106
Steitz TA, Steitz JA (1993) A general two-metal-ion mechanism for catalytic RNA. Proc Natl Acad Sci USA 90:6498–6502
Suga H, Lohse PA, Szostak JW (1998) Structural and kinetic characterization of an acyl transferase ribozyme. J Am Chem Soc 120:1151–1156
Sun F-J, Caetano-Anollés G (2008) Evolutionary patterns in the sequence and structure of transfer RNA: a window into early translation and the genetic code. PLoS One 3(7):e2799
Taylor AI, Pinheiro VB, Smola MJ, Morgunov AS, Peak-Chew S, Cozens C, Weeks KM, Herdewijn P, Holliger P (2015) Catalysts from synthetic genetic polymers. Nature 518:427–430
Thornton JW (2004) Resurrecting ancient genes: experimental analysis of extinct molecules. Nat Rev Genet 5:366–375
Thornton JW, Need E, Crews D (2003) Resurrecting the ancestral steroid receptor: ancient origin of estrogen signaling. Science 301:1714–1717
Tuerk C, Gold L (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249:505–510
Uter NT, Gruic-Sovulj I, Perona JJ (2005) Amino acid-dependent transfer RNA affinity in a class I aminoacyl-tRNA synthetase. J Biol Chem 280(25):23966–23977. doi:10.1074/jbc.M414259200
Van Noorden R (2009) RNA world easier to make. Nature News, published online 13 May 2009. doi:10.1038/news.2009.471
Vetsigian K, Woese CR, Goldenfeld N (2006) Collective evolution and the genetic code. Proc Natl Acad Sci USA 103:10696–10701
Weinreb V, Carter CW Jr (2008) Mg²⁺-free B. stearothermophilus tryptophanyl-tRNA synthetase activates tryptophan with a major fraction of the overall rate enhancement. J Am Chem Soc 130:1488–1494
Weinreb V, Li L, Kaguni LS, Campbell CL, Carter CW Jr (2009) Mg²⁺-assisted catalysis by B. stearothermophilus TrpRS is promoted by allosteric effects. Structure 17:1–13
Weinreb V, Li L, Carter CW Jr (2012) A master switch couples Mg²⁺-assisted catalysis to domain motion in B. stearothermophilus tryptophanyl-tRNA synthetase. Structure 20
Weinreb V, Li L, Chandrasekaran SN, Koehl P, Delarue M, Carter CW Jr (2014) Enhanced amino acid selection in fully-evolved tryptophanyl-tRNA synthetase, relative to its urzyme, requires domain movement sensed by the D1 switch, a remote, dynamic packing motif. J Biol Chem 289:4367–4376. doi:10.1074/jbc.M113.538660
Wills PR (1993) Self-organization of genetic coding. J Theor Biol 162:267–287
Wills PR (2004) Stepwise evolution of molecular biological coding. In: Pollack J, Bedau M, Husbands P, Ikegami T, Watson RA (eds) Artificial life IX. MIT Press, Cambridge, pp 51–56
Wills PR (2016) The generation of meaningful information in molecular systems. Phil Trans R Soc A 374:20150066. doi:10.1098/rsta.2015.0066
Wills PR, Carter CW Jr (2017) Insuperable problems of an initial genetic code emerging from an RNA world. BioRxiv. doi:10.1101/140657
Wills PR, Nieselt K, McCaskill JS (2015) Emergence of coding and its specificity as a physico-informatic problem. Orig Life Evol Biosph, published online. doi:10.1007/s11084-015-9434-5
Wochner A, Attwater J, Coulson A, Holliger P (2011) Ribozyme-catalyzed transcription of an active ribozyme. Science 332:209–212
Woese C (1967) The genetic code. Harper & Row, New York
Woese C (1969) Models for the evolution of codon assignments. J Mol Biol 43:235–240
Woese CR, Dugre DH, Saxinger WC, Dugre SA (1966) The molecular basis for the genetic code. Proc Natl Acad Sci USA 55:966–974
Woese CR, Olsen GJ, Ibba M, Söll D (2000) Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol Mol Biol Rev 64(1):202–236
Wolf YI, Koonin EV (2007) On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization. Biol Direct 2:14
Wolf YI, Aravind L, Grishin NV, Koonin EV (1999) Evolution of aminoacyl-tRNA synthetases—analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events. Genome Res 9:689–710
Wolfenden R (2007) Experimental measures of amino acid hydrophobicity and the topology of transmembrane and globular proteins. J Gen Physiol 129:357–362. doi:10.1085/jgp.200709743
Wolfenden R, Liang Y-L (1989) Contributions of solvent water to biological group-transfer potentials: mixed anhydrides of phosphoric and carboxylic acids. Bioorg Chem 17:486–489
Wolfenden R, Cullis PM, Southgate CCF (1979a) Water, protein folding, and the genetic code. Science 206:575–577
Wolfenden R, Andersson L, Cullis PM, Southgate CCF (1979b) Affinities of amino acid side chains for solvent water. Biochemistry 20:849–855
Wolfenden R, Lewis CA, Yuan Y, Carter CW Jr (2015) Temperature dependence of amino acid hydrophobicities. Proc Natl Acad Sci USA 112(24):7484–7488. doi:10.1073/pnas.1507565112
Yang B, LéJohn HB (1994) NADP⁺-activable, NAD⁺-specific glutamate dehydrogenase: purification and immunological analysis. J Biol Chem 269:4506–4512
Yang X-L, Schimmel P, Ewalt KL (2004) Relationship of two human tRNA synthetases used in cell signaling. Trends Biochem Sci 29(5):250–256
Yang X-L, Guo M, Kapoor M, Ewalt KL, Otero FJ, Skene RJ, McRee DE, Schimmel P (2007) Functional and crystal structure analysis of active site adaptations of a potent anti-angiogenic human tRNA synthetase. Structure 15:793–805
Yarus M (2011a) Life from an RNA world: the ancestor within. Harvard University Press, Cambridge, MA
Yarus M (2011b) Getting past the RNA world: the initial Darwinian ancestor. Cold Spring Harb Perspect Biol 3:a003590. doi:10.1101/cshperspect.a003590
Yarus M, Widmann J, Knight R (2009) RNA-amino acid binding: a stereochemical era for the genetic code. J Mol Evol 69:406–429
Zhang C-M, Perona JJ, Ryu K, Francklyn C, Hou Y-M (2006) Distinct kinetic mechanisms of the two classes of aminoacyl-tRNA synthetases. J Mol Biol 361:300–311. doi:10.1016/j.jmb.2006.06.015
Zull JE, Smith SK (1990) Is genetic code redundancy related to retention of structural information in both DNA strands? Trends Biochem Sci 15:257–261

Acknowledgments

Research from the Carter laboratory was supported by the National Institute of General Medical Sciences, grants GM78228 and GM40906. I am grateful for the comments of an anonymous referee, and for similar input from E. First.

Conflict of Interest

The author is unaware of any conflicts of interest.

Author information

Authors and Affiliations

Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599-7260, USA
Charles W. Carter Jr

Corresponding author

Correspondence to Charles W. Carter Jr.

Editor information

Editors and Affiliations

Biochem and Mol Biol, Baylor College of Medicine, Houston, Texas, USA
M. Zouhair Atassi

About this chapter

Cite this chapter

Carter, C.W. (2017). Coding of Class I and II Aminoacyl-tRNA Synthetases. In: Atassi, M. (eds) Protein Reviews. Advances in Experimental Medicine and Biology, vol 966. Springer, Singapore. https://doi.org/10.1007/5584_2017_93

DOI: https://doi.org/10.1007/5584_2017_93
Published: 22 August 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6921-5
Online ISBN: 978-981-10-6922-2
eBook Packages: Biomedical and Life Sciences (R0)
